FlightControl-Master / MOOSE

Mission Object Oriented Scripting Environment (MOOSE) for lua mission scripting design in DCS World
http://flightcontrol-master.github.io/MOOSE/
GNU General Public License v3.0
290 stars 96 forks source link

Ship Sunk = Crash DCS #1256

Closed Ironwulf2000 closed 4 years ago

Ironwulf2000 commented 4 years ago

Been having a few issues with missions running moose (including currently published SP/coop versions of "Through the Inferno" where a ship that is an AIRBASE is sunk, DCS crashes. It does not instantly happen on sinking but may be up to 20-30 minutes later, which suggests some sort of scheduled process happens that perhaps refreshes a database.

Occasionally - not always but perhaps this is because it crashed before writing it to the log, I get an error similar to the following.

`2020-01-10 11:11:17.359 INFO SCRIPTING: E: BASE00000({[1]=Cannot GetCoordinate,[Positionable]={[AirbaseZone]={[Vec2]={[y]=-32353.916403161,[x]=60243.521801727,},[ClassName]=ZONE_RADIUS,[Radius]=2500,},[AirbaseName]=Qeshm Defence Moscow,[Normandy]={[Sainte_Croix_sur_Mer]=Sainte-Croix-sur-Mer,[Lantheuil]=Lantheuil,[Meautis]=Meautis,[Evreux]=Evreux,[Rucqueville]=Rucqueville,[Needs_Oar_Point]=Needs Oar Point,[Deux_Jumeaux]=Deux Jumeaux,[Chippelle]=Chippelle,[Biniville]=Biniville,[Tangmere]=Tangmere,[Sainte_Laurent_sur_Mer]=Sainte-Laurent-sur-Mer,[Le_Molay]=Le Molay,[Azeville]=Azeville,[Lessay]=Lessay,[Cricqueville_en_Bessin]=Cricqueville-en-Bessin,[Funtington]=Funtington,[Chailey]=Chailey,[Carpiquet]=Carpiquet,[Cardonville]=Cardonville,[Picauville]=Picauville,[Longues_sur_Mer]=Longues-sur-Mer,[Cretteville]=Cretteville,[Saint_Pierre_du_Mont]=Saint Pierre du Mont,[Beuzeville]=Beuzeville,[Maupertus]=Maupertus,[Sommervieu]=Sommervieu,[Brucheville]=Brucheville,[Bazenville]=Bazenville,[Lignerolles]=Lignerolles,[Ford]=Ford,[Beny_sur_Mer]=Beny-sur-Mer,},[TerminalType]={[FighterAircraft]=244,[Shelter]=68,[HelicopterUsable]=216,[OpenMed]=72,[HelicopterOnly]=40,[OpenBig]=104,[Runway]=16,[OpenMedOrBig]=176,},[Nevada]={[Boulder_City_Airport]=Boulder City Airport,[Lincoln_County]=Lincoln County,[Mellan_Airstrip]=Mellan Airstrip,[Tonopah_Test_Range_Airfield]=Tonopah Test Range Airfield,[Pahute_Mesa_Airstrip]=Pahute Mesa Airstrip,[Nellis_AFB]=Nellis AFB,[Mesquite]=Mesquite,[Groom_Lake_AFB]=Groom Lake AFB,[North_Las_Vegas]=North Las Vegas,[McCarran_International_Airport]=McCarran International Airport,[Mina_Airport_3Q0]=Mina Airport 3Q0,[Echo_Bay]=Echo Bay,[Beatty_Airport]=Beatty Airport,[Tonopah_Airport]=Tonopah Airport,[Creech_AFB]=Creech AFB,[Henderson_Executive_Airport]=Henderson Executive Airport,[Laughlin_Airport]=Laughlin Airport,[Jean_Airport]=Jean Airport,},[PersianGulf]={[Al_Bateen_Airport]=Al-Bateen Airport,[Khasab]=Khasab,[Shiraz_International_Airport]=Shiraz International Airport,[Al_Minhad_AB]=Al Minhad AB,[Tunb_Kochak]=Tunb Kochak,[Sharjah_Intl]=Sharjah Intl,[Sir_Abu_Nuayr]=Sir Abu Nuayr,[Bandar_Abbas_Intl]=Bandar Abbas Intl,[Jiroft_Airport]=Jiroft Airport,[Al_Dhafra_AB]=Al Dhafra AB,[Kerman_Airport]=Kerman Airport,[Ras_Al_Khaimah_International_Airport]=Ras Al Khaimah International Airport,[Al_Maktoum_Intl]=Al Maktoum Intl,[Sas_Al_Nakheel_Airport]=Sas Al Nakheel Airport,[Havadarya]=Havadarya,[Bandar_e_Jask_airfield]=Bandar-e-Jask airfield,[Fujairah_Intl]=Fujairah Intl,[Dubai_Intl]=Dubai Intl,[Sirri_Island]=Sirri Island,[Abu_Dhabi_International_Airport]=Abu Dhabi International Airport,[Al_Ain_International_Airport]=Al Ain International Airport,[Lar_Airbase]=Lar Airbase,[Qeshm_Island]=Qeshm Island,[Lavan_Island_Airport]=Lavan Island Airport,[Tunb_Island_AFB]=Tunb Island AFB,[Abu_Musa_Island_Airport]=Abu Musa Island Airport,[Bandar_Lengeh]=Bandar Lengeh,[Kish_International_Airport]=Kish International Airport,[Liwa_Airbase]=Liwa Airbase,},[Caucasus]={[Gudauta]=Gudauta,[Mineralnye_Vody]=Mineralnye Vody,[Sukhumi_Babushara]=Sukhumi-Babushara,[Maykop_Khanskaya]=Maykop-Khanskaya,[Anapa_Vityazevo]=Anapa-Vityazevo,[Vaziani]=Vaziani,[Mozdok]=Mozdok,[Gelendzhik]=Gelendzhik,[Senaki_Kolkhi]=Senaki-Kolkhi,[Tbilisi_Lochini]=Tbilisi-Lochini,[Batumi]=Batumi,[Krasnodar_Center]=Krasnodar-Center,[Sochi_Adler]=Sochi-Adler,[Kobuleti]=Kobuleti,[Kutaisi]=Kutaisi,[Krasnodar_Pashkovsky]=Krasnodar-Pashkovsky,[Novorossiysk]=Novorossiysk,[Krymsk]=Krymsk,[Beslan]=Beslan,[Soganlug]=Soganlug,[Nalchik]=Nalchik,},[ClassName]=AIRBASE,[CategoryName]={[1]=Helipad,[2]=Ship,[0]=Airdrome,},},[Alive]=false,})

2020-01-10 11:11:17.359 INFO SCRIPTING: Error in timer function: [string "C:\Users\Iblog\AppData\Local\Temp\DCS.openbeta\/~mis000062D2.lua"]:19083: attempt to index local 'TargetCoordinate' (a nil value)`

I first noticed this issue on a mission I ran where it would crash almost reliably at the 4hr mark. Turned out this was when 2 AI naval units got close enough to shoot each other, and as the routes and speeds were static, of course it happened fairly reliably each time. I then looked at the coordinates in the first parameters of the error above, and noticed it was the same coordinates as a moscow cruiser, and it had been sunk about 15 minutes prior to the error.

When I removed the ship that was repeatedly sunk, it no longer crashed. We noted that other ships, when sunk, did not crash DCS in the same way but noticed the common element was the lack of a helipad. This would mean the ship was not regarded as an AIRBASE. I noted in the documentation for AIRBASE that it says the following:

IMPORTANT: ONE SHOULD NEVER SANATIZE these AIRBASE OBJECT REFERENCES! (make the AIRBASE object references nil).

I can only assume this is important as it might cause a crash?

I am wondering if there is some sort of scheduler that perhaps cleans up a database or refreshes it in some way, and when it discovers that the ship is gone, it sets the airbase object to nil, and hence crashes DCS.

I am not the best coder, so in thousands of lines of code it would take me a long time to work out how it all works, but I figure by posting here someone might know where to look.

funkyfranky commented 4 years ago

Thanks for the report! This is a tricky one.

Couple of things 1.) The error "Cannot Get Coordinate" can happen when a function tries to get a coordinate of a group/unit or whatever, but the unit is not alive any more. This causes an error in the script but no crash of DCS. 2.) Similar with the error "attempt to index local 'TargetCoordinate' (a nil value)`". That bug was fixed already IIRC. But this would also not cause a DCS crash. 3.) If one would sanitize an AIRBASE object, that would cause all sorts of scripting errors in MOOSE. But again, not a crash of DCS.

Depending on your mission it could be that at some point Moose wants to access the airbase/helipad of a ship that is already sunk. Again, it would cause the script to break and produce an error in the log. But not a DCS crash.

It is very difficult to cause a crash of DCS via scripting. Doable but difficult. For example if you create an infinite loop or create giant tables that cause a memory overflow.

DCS crashes are 99% of the time bugs in DCS. But of course, it could be that Moose triggers them. There were reports that DCS crashes, when an AI unit fires a certain missile or the like. Of course, Moose could trigger firing the missile and "cause" the crash but it would still be a DCS bug and not something that could/should be "fixed" in the Moose code.

I assume this is a similar thing here.

At least I have no idea where I should look for a bug in Moose. Best I can do is to create a simple mission with some ships and destroy them and see if I can reproduce the crash.

Ironwulf2000 commented 4 years ago

Hello Franky Thank you for the detailed response.

I think all the errors are symptomatic rather than the problem itself.

I am running a fairly recent 2.5.0pre1 version of MOOSE, I am not sure if your fix is supposed to be in that.

The santization of the AIRBASE object was just a guess, based off gut feel. Clearly it feels like something is trying to do something with this object, and it makes a call to DCS and that crashes DCS. As you said, MOOSE could just happen to be triggering a rarely used function in DCS, or one that doesnt have exception handlers for if it is passed a NULL object, or something.

I have worked around this problem by scripting the ship to go 'dead' and invincible (AI/Emitters off, immortal on) so it wont die, but it is effectively out of the mission, which is what they want. With this set on all helo-capable ships, I can run this mission for 20 hours. If however I miss one (as happened 2 days ago) and someone sinks it, it WILL crash within 20 minutes of the event. I also know if I take out the DISPATCHER and other moose elements and just use MIST to clone units for example... I can sink any damn thing I want, and no issue at all.

I dont doubt it is something in DCS. I equally dont doubt it is MOOSE triggering it. There has to be a way of finding the offending process - some sort of database refresh or something that happens on a schedule. Again this is anecdotal evidence but today we didnt crash TTI with it, but it was an older Ship model (Neutrashimy) and that mission does not use any sort of dispatcher that might look at airbases, but rather just uses SPAWN to generate things.

If you can think of a way I can turn up the logging - at least in functions that you suspect might be the cause, so we might have a better idea of what is going on, that would be great I get we can't fix DCS, but at least we can avoid, fix, or work around the trigger.

Any help would greatly be appreciated so we can get some naval action happening - those Viggen guys just want to sink stuff :)

Ironwulf2000 commented 4 years ago

What data would you like?

thebgpikester commented 4 years ago

OK, here's the thing. I think this is an important issue. But we aren't going to make progress with what we currently know. The basic steps to get to resolution is this: We have to reproduce on demand. locally, not on your server, with a minimum code use. We can't solve this until we can simply cause the crash with the least amount of things going on. I know it appeared like it was something to do with a ship going dead and a dispatcher but a proper "steps to reproduce" would include a miz with code that caused this to happen almost immediately. Currently we have a 4hr repro on a closed server we cannot do anything with that, despite you having tried things. So far you said Dispatcher...which one? AIr to air, air to ground, Task? Anything else known about what code is running, can it be stripped out and the miz supplied with just the dispatcher code? We have to actually do something and not discuss this otherwise a good opportunity for understanding the crash will just end up closed.

Hope that makes sense, happy to work on soemthing but I have nothing to investigate at this point.

thebgpikester commented 4 years ago

No further information provided or work has been done on this for 6 weeks, so closing.

Ironwulf2000 commented 4 years ago

Hello,

Sorry for the delay. I had been waiting for 2.5.6 to somewhat stabilise, and the problems with moose to be sorted before trying this all out again - plus the poor performance of 2.5.6 made most of my moose orientated missions run like a powerpoint presentation.

We had a crash overnight and it would seem there is now(?) some more debug information in moose, written to the logs. Please see below. Unit #348 is a Neutrashimy ship, which had been sunk.

`2020-04-17 20:38:17.989 INFO SCRIPTING: 40557( 20105)/E: AIRBASE04024.GetAirbaseCategory(ERROR: Cannot get category of airbase Unit #348 due to DCS 2.5.6 bug! Assuming it is an AIRDROME for now...) 2020-04-17 20:38:17.990 INFO SCRIPTING: 30078( 20109)/E: BASE00000.GetCoordinate({[1]=Cannot GetCoordinate,[Positionable]={[PersianGulf]={[Al_Bateen_Airport]=Al-Bateen Airport,[Khasab]=Khasab,[Shiraz_International_Airport]=Shiraz International Airport,[Al_Minhad_AB]=Al Minhad AB,[Ras_Al_Khaimah]=Ras Al Khaimah,[Tunb_Kochak]=Tunb Kochak,[Sharjah_Intl]=Sharjah Intl,[Sir_Abu_Nuayr]=Sir Abu Nuayr,[Liwa_Airbase]=Liwa Airbase,[Bandar_Abbas_Intl]=Bandar Abbas Intl,[Al_Dhafra_AB]=Al Dhafra AB,[Kerman_Airport]=Kerman Airport,[Al_Maktoum_Intl]=Al Maktoum Intl,[Sas_Al_Nakheel_Airport]=Sas Al Nakheel Airport,[Havadarya]=Havadarya,[Bandar_e_Jask_airfield]=Bandar-e-Jask airfield,[Fujairah_Intl]=Fujairah Intl,[Dubai_Intl]=Dubai Intl,[Sirri_Island]=Sirri Island,[Abu_Dhabi_International_Airport]=Abu Dhabi International Airport,[Al_Ain_International_Airport]=Al Ain International Airport,[Lar_Airbase]=Lar Airbase,[Qeshm_Island]=Qeshm Island,[Lavan_Island_Airport]=Lavan Island Airport,[Tunb_Island_AFB]=Tunb Island AFB,[Abu_Musa_Island_Airport]=Abu Musa Island Airport,[Bandar_Lengeh]=Bandar Lengeh,[Kish_International_Airport]=Kish International Airport,[Jiroft_Airport]=Jiroft Airport,},[ClassName]=AIRBASE,[CategoryName]={[1]=Helipad,[2]=Ship,[0]=Airdrome,},[TerminalType]={[FighterAircraft]=244,[Shelter]=68,[HelicopterUsable]=216,[OpenMed]=72,[HelicopterOnly]=40,[OpenBig]=104,[Runway]=16,[OpenMedOrBig]=176,},[AirbaseZone]={[Vec2]={[y]=301348.40625,[x]=-50261.06640625,},[ClassName]=ZONE_RADIUS,[Radius]=2500,},[AirbaseID]=-1126,[AirbaseName]=Unit #348,[Normandy]={[Sainte_Croix_sur_Mer]=Sainte-Croix-sur-Mer,[Lantheuil]=Lantheuil,[Meautis]=Meautis,[Evreux]=Evreux,[Deux_Jumeaux]=Deux Jumeaux,[Biniville]=Biniville,[Barville]=Barville,[Azeville]=Azeville,[Cricqueville_en_Bessin]=Cricqueville-en-Bessin,[Funtington]=Funtington,[Carpiquet]=Carpiquet,[Cardonville]=Cardonville,[Picauville]=Picauville,[Vrigny]=Vrigny,[Cretteville]=Cretteville,[Argentan]=Argentan,[Lignerolles]=Lignerolles,[Essay]=Essay,[Rucqueville]=Rucqueville,[Needs_Oar_Point]=Needs Oar Point,[Longues_sur_Mer]=Longues-sur-Mer,[Sainte_Laurent_sur_Mer]=Sainte-Laurent-sur-Mer,[Chailey]=Chailey,[Lessay]=Lessay,[Beny_sur_Mer]=Beny-sur-Mer,[Le_Molay]=Le Molay,[Bazenville]=Bazenville,[Sommervieu]=Sommervieu,[Ford]=Ford,[Saint_Pierre_du_Mont]=Saint Pierre du Mont,[Beuzeville]=Beuzeville,[Maupertus]=Maupertus,[Chippelle]=Chippelle,[Brucheville]=Brucheville,[Conches]=Conches,[Tangmere]=Tangmere,[Goulet]=Goulet,[Hauterive]=Hauterive,},[Nevada]={[Boulder_City_Airport]=Boulder City Airport,[Lincoln_County]=Lincoln County,[Tonopah_Test_Range_Airfield]=Tonopah Test Range Airfield,[Pahute_Mesa_Airstrip]=Pahute Mesa Airstrip,[Nellis_AFB]=Nellis AFB,[Mesquite]=Mesquite,[Groom_Lake_AFB]=Groom Lake AFB,[North_Las_Vegas]=North Las Vegas,[Beatty_Airport]=Beatty Airport,[Mina_Airport_3Q0]=Mina Airport 3Q0,[Echo_Bay]=Echo Bay,[McCarran_International_Airport]=McCarran International Airport,[Tonopah_Airport]=Tonopah Airport,[Creech_AFB]=Creech AFB,[Henderson_Executive_Airport]=Henderson Executive Airport,[Laughlin_Airport]=Laughlin Airport,[Jean_Airport]=Jean Airport,},[Caucasus]={[Gudauta]=Gudauta,[Mineralnye_Vody]=Mineralnye Vody,[Sukhumi_Babushara]=Sukhumi-Babushara,[Maykop_Khanskaya]=Maykop-Khanskaya,[Anapa_Vityazevo]=Anapa-Vityazevo,[Vaziani]=Vaziani,[Mozdok]=Mozdok,[Gelendzhik]=Gelendzhik,[Senaki_Kolkhi]=Senaki-Kolkhi,[Tbilisi_Lochini]=Tbilisi-Lochini,[Batumi]=Batumi,[Krasnodar_Center]=Krasnodar-Center,[Sochi_Adler]=Sochi-Adler,[Kobuleti]=Kobuleti,[Kutaisi]=Kutaisi,[Krasnodar_Pashkovsky]=Krasnodar-Pashkovsky,[Novorossiysk]=Novorossiysk,[Krymsk]=Krymsk,[Beslan]=Beslan,[Soganlug]=Soganlug,[Nalchik]=Nalchik,},},[Alive]=false,}) 2020-04-17 20:38:17.990 INFO SCRIPTING: Error in timer function: [string "C:\Users\gradmin\AppData\Local\Temp\2\DCS.openbeta3\/~mis00001772.lua"]:19467: attempt to index local 'TargetCoordinate' (a nil value) 2020-04-17 20:38:17.990 INFO SCRIPTING: stack traceback:

[string "C:\Users\gradmin\AppData\Local\Temp\2\DCS.openbeta3\/~mis00001772.lua"]:19467: in function 'Get2DDistance'
[string "C:\Users\gradmin\AppData\Local\Temp\2\DCS.openbeta3\/~mis00001772.lua"]:20109: in function 'GetClosestAirbase'
[string "C:\Users\gradmin\AppData\Local\Temp\2\DCS.openbeta3\/~mis00001772.lua"]:36868: in function <[string "C:\Users\gradmin\AppData\Local\Temp\2\DCS.openbeta3\/~mis00001772.lua"]:36862>
(tail call): ?
[C]: in function 'xpcall'
[string "C:\Users\gradmin\AppData\Local\Temp\2\DCS.openbeta3\/~mis00001772.lua"]:5834: in function <[string "C:\Users\gradmin\AppData\Local\Temp\2\DCS.openbeta3\/~mis00001772.lua"]:5770>

2020-04-17 20:38:25.189 INFO EDCORE: try to write dump information 2020-04-17 20:38:25.190 INFO EDCORE: # -------------- 20200417-203825 -------------- 2020-04-17 20:38:25.190 INFO EDCORE: DCS/2.5.6.47224 (x86_64; Windows NT 10.0.17763) 2020-04-17 20:38:25.191 INFO EDCORE: D:\Program Files\Eagle Dynamics\DCS World OpenBeta\bin\DCS.exe 2020-04-17 20:38:25.191 INFO EDCORE: # C0000005 ACCESS_VIOLATION at 3D0775E6 00:00000000 2020-04-17 20:38:25.192 INFO EDCORE: SymInit: Symbol-SearchPath: '.;D:\Program Files\Eagle Dynamics\DCS World OpenBeta;D:\Program Files\Eagle Dynamics\DCS World OpenBeta\bin;C:\Windows;C:\Windows\system32;SRVC:\websymbolshttp://msdl.microsoft.com/download/symbols;', symOptions: 530, UserName: 'gradmin' 2020-04-17 20:38:25.192 INFO EDCORE: OS-Version: 10.0.17763 () 0x110-0x3 2020-04-17 20:38:25.711 INFO EDCORE: 0x00000000004B75E6 (DCS): woLA::loadNewInputLayout + 0xDCCB6 2020-04-17 20:38:25.711 INFO EDCORE: 0x0000000000496663 (DCS): woLA::loadNewInputLayout + 0xBBD33 2020-04-17 20:38:25.712 INFO EDCORE: 0x000000000042A94F (DCS): woLA::loadNewInputLayout + 0x5001F 2020-04-17 20:38:25.712 INFO EDCORE: 0x0000000000003AE4 (World): wSimTrace::CommandsTraceDiscreteIsOn + 0x504 2020-04-17 20:38:25.712 INFO EDCORE: 0x0000000000003F8D (World): wSimCalendar::DoActionsUntil + 0x1FD 2020-04-17 20:38:25.713 INFO EDCORE: 0x00000000006E0D8F (DCS): woLA::loadNewInputLayout + 0x30645F 2020-04-17 20:38:25.713 INFO EDCORE: 0x00000000006F1BE3 (DCS): dDispatcher::InitDatabase + 0xDBB3 2020-04-17 20:38:25.714 INFO EDCORE: 0x00000000006C8264 (DCS): woLA::loadNewInputLayout + 0x2ED934 2020-04-17 20:38:25.714 INFO EDCORE: 0x00000000006C8629 (DCS): woLA::loadNewInputLayout + 0x2EDCF9 2020-04-17 20:38:25.715 INFO EDCORE: 0x00000000016969F6 (DCS): AmdPowerXpressRequestHighPerformance + 0xB099F2 2020-04-17 20:38:25.715 INFO EDCORE: 0x00000000008AAB2E (DCS): uiAsyncNet::onGameStop + 0x67DCE 2020-04-17 20:38:25.716 INFO EDCORE: 0x0000000000017974 (KERNEL32): BaseThreadInitThunk + 0x14 2020-04-17 20:38:25.716 INFO EDCORE: 0x000000000006A261 (ntdll): RtlUserThreadStart + 0x21 2020-04-17 20:38:25.852 INFO EDCORE: Minidump created.

`

thebgpikester commented 4 years ago

The crash is 8 seconds after the script errored, there's no "direct" relation. However it looks like your moose is out of date from that, as in its a mid 2.5.6 hotfix one because of the comments because that's the workaround text. For the script error I'll have a look at ships and GetCategory() as it might be an issue anyway. It's probably a Tasking heading to land something at the ship, because badly formatted tasks can crash DCS. I'm not sure what is tasking that though, have you got any ideas what code is running to do that?

funkyfranky commented 4 years ago

I actually did a quick test destroying a ship with a heliport. That caused no crash.

The Moose errors actually make sense if there is a dispatcher trying to attack the ship that is sunk and cannot get the coordinates any more. Or something similar in that line.

But this line 2020-04-17 20:38:25.191 INFO EDCORE: # C0000005 ACCESS_VIOLATION at 3D0775E6 00:00000000 is some C++ memory allocation in DCS that is buggy.

DCS produces some crash log/dump files. Did you post this on the DCS forums? Because only ED can analyse them and find where the error is.

thebgpikester commented 4 years ago

Yeah, I'm convinced it's nothing to do with the sinking, but something after that that might involve that unit. Does anyone know if a ship gets purged from the airbase database list in DCS after it's died? That could possibly, with some imagination, enable a situation to occur in the future, I reckon? So, using a lot of imagination, ship dies, coalition.getAirbases() still has it in there, or the moose database, because... reasons, then [slightly different stuff happens] until a task attempts to RTB/access to a non existent airbase and causes a proper crash. The actual script error as Frank says is completely expected and not causative, it's stopped doing whatever it tried lightyears before the crash. 8 seconds is years in scripting.

The problem is, without any steps to reproduce consistently and without having access to the ED symbols set, you can't do a debug to find out what the stack trace has in it or what was going on and it's all a matter of guessing and can never never produce a data-led finding. For the above hypothesis I have to:

  1. Take the only way to crash DCS I know, via a script
  2. Invent a way back to it being something to do with a ship being dead.
  3. Deliberately disregard there are no scripting calls logged before the crash
  4. Assume a script is the cause
  5. Assume MOOSE is the script

That methodology is exactly the same technique they used in the middle ages to identify and burn witches. No lies, it really is. :D

To get this solved you'll have to burn a lot of witches first, see which ones drown and which ones float etc. Takes a lot of time. The only difference between this and when I work on antivirus blue screens at work is

  1. I'd have the symbols set and could give probable cause using WinDbg.
  2. The customer is paying and expecting me to find the issue.

The only steps I can recommend are get reliable, simple, short, steps to reproduce. If those steps highlight anything close to a moose script you are running, sure, we found a DCS bug and maybe, just maybe there is a workaround (there is NO FIX with a DCS crash). Any other ideas?

zlinman commented 4 years ago

hmm...

Ironwulf2000 commented 4 years ago

I can categorically say, that if they ships are protected - I wrote triggers such that if their health drops below 50%, it will set them immortal (so players cant kill them), set them invisible (so AI wont attack them), and Set their ROE to hold (so they wont shoot) and turn their AI off (just to be sure)- effectively making them dead - the mission will run for 24 hours solid. So, it definitely has something to do with the ships being sunk.

Sinking the ship does not then guarantee an immediate crash. Sometimes it can be up to 20-30 minutes after the sinking, that it will crash. What it would appear to be - and I can only say this anecdotally - is when an AI unit - under control of MOOSE I believe - attempts to RTB. It would appear it searches for an airfield and that's what craps it out. I am not convinced its the dispatchers, as I know of servers that simply use the SPAWN() class and that's it, and they have had this issue. Why RTB? It would seem if the AI are shot down before they need to RTB for (reason), it never crashes as well.

I cannot explain why this is an issue for MOOSE only and not a non-MOOSE mission, except to say that perhaps MOOSE has a timer that looks at bases periodically for a unit in flight. I know for example GCI and others look at how far one is from an airbase when pursuing an enemy, for example.

funkyfranky commented 4 years ago

@Ironwulf2000 We would really like to help, but all you describe still sounds like a DCS bug. If you have a rather simple mission where we can reproduce the issue, we can work with that.

Otherwise, please report DCS crashes on the ED forum and linke the bug report here so we can support you. Have you done that as I suggested above?

Ironwulf2000 commented 4 years ago

Well, I'd love to report it, just need to know which DCS scripting function is being called, that causes it to crash.... because you know as well as I, that is the first thing they will ask. It seems clear to me it is some sort of unhandled exception in DCS that is actually crashing it. I am just trying to work out specifically is being called, and I need help with that.

thebgpikester commented 4 years ago

" just need to know which DCS scripting function is being called"

That's what we are asking for. We don't have a reproduceable example to say that the code process is >this<, so in effect, its not a needle in a haystack, because we don't have the haystack. Don't put any reliance in someone having the code in their head and magically guessing one of the lines in moose, because it's the size of the bible, and that bible is being read in an infinite amount of combinations by a set of private instructions, and what is more, we never wrote that bible. So we are just throwing out guesses without having any data to base that on. Capture and reproduce the crash in the simplest form. If you can't provide any way of guaranteeing the crash happens, then we only know what we know now... someone else has a crash, but all the rest of us do not. I'm not holding back anything, I'd set about solving this issue for myself in the exact same way. Reproduce, simplify and repeat until you have the answer to "which DCS scripting function is being called". Nothing else will happen until we are outside of guesswork territory.

Ironwulf2000 commented 4 years ago

Well what I have asked for, from the start, is a way to turn on some debugs in a specific area (and I was hoping given the information I have given someone would have an idea which 'area' to start in), so I can get more information. I do not know how to do that. So its all good to say capture it, but you already have everything that is 'captured' with the default level of logging.

We have the crash happen on at least 3 MP servers, one of which the admin didnt know why it was crashing, until I examined the logs - so more than likely there are others that have this happen, and just think it is a DCSism. Once I saw they were getting the same crash errors as me, and they either removed the ships or changed them to ones without helo decks, the crashing stopped. Didnt change the fact there's something wrong that crashes DCS, but you also should not have to do that.

So I can probably make you a mission that repeats the problem. I can't capture anything more than what I already have, without changing something.

thebgpikester commented 4 years ago

Well the TRACE functions https://flightcontrol-master.github.io/MOOSE_DOCS_DEVELOP/Documentation/Core.Base.html are mostly likely to explode a production servers logs, I don't know how to apply them to specifc areas, I dont think that can be done and if I thought it was useful I would have recommended it. But I'd never recommend starting Tracelevel logging on a production server without first simplifying the mission scripts. I've never had any luck with TRACE, I remove them from the demo missions immediately. But this is what you asked for, and there it is. There isn't anything else. They do not show what was running at the time in the way you might hope they do. For example, the code being run, so in the specific example of an imaginary RTB command being run on a Group somewhere, you wont see that, but if you want to run it on your servers, please go ahead, but also recognise it's not my advice.

Detailed logging comes after reproduction I am afraid. We take a fast reproducing issue, take away all the code that isn't the culprit and you are left with say one class, that does it, at one time, in a set circumstance. Once you have that, you can then be looking at a tenth of the possible code that was running and begin to zero in. At this point we have our own missions locally, short few lines of script and things are already clear.

I'm sorry that process seems unacceptable to you, you won't be the first person, I've done this type of troubleshooting for years now, on customers sites in front of them in many countries at the top tier team in my company and it's an industry standard method of troubleshooting complex issues. Capture the issue, remove all the compelxity into its most obviously simple form. It's exactly what ED want - one line of code that reproduces, no MOOSE, nothing else and it's the process I would use. You are welcome to our advice, but following it is entirely up to you. Let me know if there is something I can help with.

zlinman commented 4 years ago

Can you provide server addresses and/or a .miz file please, Ironwulf2000 ?

Ironwulf2000 commented 4 years ago

I think you read too much into what I said. Its not about what I find acceptable or not. I just asked for some help increasing the logging and it was either brushed aside or generally didnt answer the question. Now you have, and that's fine.

I could spend hours carving up the mission trying to excise the precise thing that is going wrong, only to find out there was something i could turn on that would tell me what was executing at the time.

I am in the process of cutting out particular parts of the mission that seem relevant into a new mission so I can test with that, as even before your response it had become strongly evident that there wasnt such a thing.

Might take a week or so, but I will get it.

thebgpikester commented 4 years ago

I just asked for some help increasing the logging and it was either brushed aside or generally didnt answer the question. OK I've grossly misunderstood the request in this ticket, i thought you wanted help tracking down a possible bug in MOOSE that could cause a crash. Instead all you wanted was to understand the method for additional logging. I'm quite surprised, we could have helped a lot more.