C7-Game / Prototype

An early-stage, open-source 4X strategy game
https://c7-game.github.io/
MIT License
32 stars 9 forks source link

Methodology for Debugging and Tracing #209

Open pcen opened 2 years ago

pcen commented 2 years ago

While developing I am encountering runtime exceptions such as:

*** Processing turn 15 ***
Moving barbarian at [15, 57] (Plains on Plains) to [15, 59] (Plains on Plains)
Moving barbarian at [50, 64] (Grassland on Grassland) to [49, 65] (Forest on Grassland)
Moving barbarian at [16, 22] (Grassland on Grassland) to [14, 22] (Plains on Plains)
Moving barbarian at [63, 35] (Forest on Grassland) to [63, 37] (Hills on Grassland)
Moving barbarian at [13, 45] (Grassland on Grassland) to [14, 46] (Forest on Grassland)
Moving barbarian at [70, 54] (Forest on Plains) to [71, 53] (Forest on Plains)
Moving barbarian at [13, 45] (Grassland on Grassland) to [15, 45] (Forest on Grassland)
Moving barbarian at [16, 30] (Forest on Grassland) to [16, 32] (Grassland on Grassland)
Moving barbarian at [16, 50] (Forest on Grassland) to [17, 51] (Grassland on Grassland)
Moving barbarian at [17, 51] (Grassland on Grassland) to [15, 51] (Grassland on Grassland)
-> Begin Athens turn
Set random exploration AI for Chariot with 2 movement points and 3 hit points, guid = 95b35a93-d959-4009-bb25-aee601cb9d7c
-> Begin Philadelphia turn
Set random exploration AI for Chariot with 2 movement points and 3 hit points, guid = df5afcca-10c2-4284-9ded-bb781231f64b
-> Begin Amsterdam turn
Set random exploration AI for Chariot with 2 movement points and 3 hit points, guid = 3419d188-6427-430d-8643-8413f17ffe63
libc++abi: terminating with uncaught exception of type std::__1::system_error: recursive_mutex lock failed: Invalid argument

=================================================================
    Native Crash Reporting
=================================================================
Got a abrt while executing native code. This usually indicates
a fatal error in the mono runtime or one of the native libraries
used by your application.
=================================================================

=================================================================
    Native stacktrace:
=================================================================
    0x103109bb5 - /Applications/Godot.app/Contents/MacOS//Godot :
    0x1030fd77e - /Applications/Godot.app/Contents/MacOS//Godot :
    0x1031091af - /Applications/Godot.app/Contents/MacOS//Godot :
    0x7fff209c7d7d - /usr/lib/system/libsystem_platform.dylib : _sigtramp
    0x7ff400000400 - Unknown
    0x7fff208d7406 - /usr/lib/system/libsystem_c.dylib : abort
    0x7fff20945ef2 - /usr/lib/libc++abi.dylib : abort_message
    0x7fff209375e5 - /usr/lib/libc++abi.dylib : _ZL28demangling_terminate_handlerv
    0x7fff20830595 - /usr/lib/libobjc.A.dylib : _ZL15_objc_terminatev
    0x7fff20945307 - /usr/lib/libc++abi.dylib : _ZSt11__terminatePFvvE
    0x7fff20947beb - /usr/lib/libc++abi.dylib : __cxa_get_exception_ptr
    0x7fff20947bb2 - /usr/lib/libc++abi.dylib : _ZN10__cxxabiv1L22exception_cleanup_funcE19_Unwind_Reason_CodeP17_Unwind_Exception
    0x7fff209227bb - /usr/lib/libc++.1.dylib : _ZNSt3__120__throw_system_errorEiPKc
    0x7fff2091952b - /usr/lib/libc++.1.dylib : _ZNSt3__115recursive_mutex4lockEv
    0x1056c0b55 - /Applications/Godot.app/Contents/MacOS//Godot : _ZNK5Curve19get_bake_resolutionEv
    0x1056c46bb - /Applications/Godot.app/Contents/MacOS//Godot : _ZNK5Curve19get_bake_resolutionEv
    0x1056f8c50 - /Applications/Godot.app/Contents/MacOS//Godot : _ZNK5Curve19get_bake_resolutionEv
    0x10543303b - /Applications/Godot.app/Contents/MacOS//Godot : _ZNK10CanvasItem15get_canvas_itemEv
    0x10544d2e4 - /Applications/Godot.app/Contents/MacOS//Godot : _ZNK10CanvasItem10_is_on_topEv
    0x103abe2c8 - /Applications/Godot.app/Contents/MacOS//Godot : _ZN7ClassDB7creatorI11_GodotSharpEEP6Objectv
    0x12c7b2fa2 - Unknown
    0x12c7b2c23 - Unknown
    0x12b2ee58b - Unknown
    0x12a6e910f - Unknown
    0x11aadc3a0 - Unknown
    0x103114210 - /Applications/Godot.app/Contents/MacOS//Godot :
    0x10324b93d - /Applications/Godot.app/Contents/MacOS//Godot :
    0x10324e82f - /Applications/Godot.app/Contents/MacOS//Godot :
    0x103b375c0 - /Applications/Godot.app/Contents/MacOS//Godot : _ZN7ClassDB7creatorI11_GodotSharpEEP6Objectv
    0x103b3493c - /Applications/Godot.app/Contents/MacOS//Godot : _ZN7ClassDB7creatorI11_GodotSharpEEP6Objectv
    0x103a8b565 - /Applications/Godot.app/Contents/MacOS//Godot : _ZN7ClassDB7creatorI17MobileVRInterfaceEEP6Objectv
    0x105430687 - /Applications/Godot.app/Contents/MacOS//Godot : _ZN19AudioStreamPlayer2D11_mix_audiosEPv
    0x10606fd08 - /Applications/Godot.app/Contents/MacOS//Godot : _ZNK6Object22can_translate_messagesEv
    0x10604f4d1 - /Applications/Godot.app/Contents/MacOS//Godot : _ZNK6Object15get_instance_idEv
    0x1060422eb - /Applications/Godot.app/Contents/MacOS//Godot : _ZN5Image19copy_internals_fromERK3RefIS_E
    0x1060425e2 - /Applications/Godot.app/Contents/MacOS//Godot : _ZN5Image19copy_internals_fromERK3RefIS_E
    0x104f0cb2a - /Applications/Godot.app/Contents/MacOS//Godot : _ZNK4Node14is_inside_treeEv
    0x103379f6c - /Applications/Godot.app/Contents/MacOS//Godot : _ZN9DirAccess15_create_builtinI12DirAccessOSXEEPS_v
    0x103344ba9 - /Applications/Godot.app/Contents/MacOS//Godot : _ZN16DefaultAllocator5allocEm
    0x10334b2eb - /Applications/Godot.app/Contents/MacOS//Godot : _ZN9DirAccess15_create_builtinI12DirAccessOSXEEPS_v
    0x7fff2099df3d - /usr/lib/system/libdyld.dylib : start
    0x1 - Unknown

=================================================================
    Telemetry Dumper:
=================================================================
Pkilling 0x123145422159872x from 0x4585463296x
Pkilling 0x123145424269312x from 0x4585463296x
Pkilling 0x123145427988480x from 0x4585463296x
Could not exec mono-hang-watchdog, expected on path '/Applications/Godot.app/Contents/MacOS/../Resources/GodotSharp/Mono/etc/../bin/mono-hang-watchdog' (errno 2)
Entering thread summarizer pause from 0x4585463296x
Finished thread summarizer pause from 0x4585463296x.
Failed to create breadcrumb file (null)/crash_hash_0x2e13da62d
debugger-agent: Unable to listen on 7

Waiting for dumping threads to resume

=================================================================
    External Debugger Dump:
=================================================================

=================================================================
    Basic Fault Address Reporting
=================================================================
Memory around native instruction pointer (0x7fff2095392e):0x7fff2095391e  ff ff c3 90 90 90 b8 48 01 00 02 49 89 ca 0f 05  .......H...I....
0x7fff2095392e  73 08 48 89 c7 e9 a1 a1 ff ff c3 90 90 90 b8 53  s.H............S
0x7fff2095393e  00 00 02 49 89 ca 0f 05 73 08 48 89 c7 e9 89 a1  ...I....s.H.....
0x7fff2095394e  ff ff c3 90 90 90 b8 83 01 00 02 49 89 ca 0f 05  ...........I....

=================================================================
    Managed Stacktrace:
=================================================================
      at <unknown> <0xffffffff>
      at Godot.NativeCalls:godot_icall_5_272 <0x00151>
      at Godot.CanvasItem:DrawString <0x00312>
      at CityLayer:drawObject <0x048da>
      at LooseView:_Draw <0x00e6e>
      at System.Object:runtime_invoke_void__this__ <0x0017f>
=================================================================
[1]    29803 abort      Godot
(base)

We should work towards documenting setting up C# debugging and better logging to debug these issues. We may also want to investigate setting up test scenes for testing specific scenarios from a well defined game state.

QuintillusCFC commented 2 years ago

I'm guessing from the logs that this causes a crash?

I have occasionally seen crashes, but never got this detailed of logs. I thought maybe it was just my system, or debugging setup, and had never established a pattern.

... and I literally just got the "Godot Engine has stopped working" pop-up while typing this message. So it's not in direct reaction to the player doing anything, at least in my case. 45 turns in, had been alt-tabbed away for a few minutes.

Unfortunately, the error message you have doesn't point to a smoking gun. A Mono library, native system runtime, or mutex lock? Not a lot to go on...

We should nonetheless collect these messages and try to see if there's a pattern, and maybe set up some detailed logging if need be to try to figure out if there's anything consistent in our code that triggers it.

QuintillusCFC commented 2 years ago

So I selected, "debug with Visual Studio 2019", and it told me the error was:

Unhandled exception at 0x0000000002610BC9 in Godot 3.4.2.exe: 0xC0000005: Access violation reading location 0xFFFFFFFFFFFFFFFF.

The binary (Godot 3.4.2.exe) was built without debug symbols.

Takeaways:

  1. I should bump the version of Godot Rider is using. Maybe there's a fix in 3.4.3 or 3.4.4.
  2. It all sounds kind of Godot-y but it could be related to how we're using Godot.
pcen commented 2 years ago

Yeah the debug message I got crashed Godot, and like you said it doesn't give a lot to go on... (std::1::system_error: recursive_mutex lock failed: Invalid argument thrown could be thrown in a lot of places in Godot), and I'm not sure what I'm looking at with these stack traces yet. Managed trace looks like somewhere in the C# bindings there's a `System.Object:runtime_invoke_voidthis__`, and the native trace is what just looks a trace of some render code.

Similar to the one you just got, the main issue is that when I get a crash after running for 5 or 10 minutes, it's difficult (almost impossible?) to reproduce it if it isn't an exception thrown from C# code, and happens in regular code paths (in my case, just moving units around on the map which usual seems fine)

I agree it would be useful to try to document crashes until we have better logging / debugging

pcen commented 2 years ago

Got another crash at end of turn, before AI opponents start

Selected unit: Catapult with 0 movement points and 3 hit points, guid = 106cd5d6-f70b-4254-b73a-778955c5f0f3 at [26, 52] (Plains on Plains)
Could not find button rename
Started a timer for blinking

=================================================================
    Native Crash Reporting
=================================================================
Got a segv while executing native code. This usually indicates
a fatal error in the mono runtime or one of the native libraries
used by your application.
=================================================================

=================================================================
    Native stacktrace:
=================================================================
    0x10d88fbb5 - /Applications/Godot.app/Contents/MacOS//Godot :
    0x10d88377e - /Applications/Godot.app/Contents/MacOS//Godot :
    0x10d896e72 - /Applications/Godot.app/Contents/MacOS//Godot :
    0x7fff209c7d7d - /usr/lib/system/libsystem_platform.dylib : _sigtramp
    0xffff - Unknown
    0x10fe497ff - /Applications/Godot.app/Contents/MacOS//Godot : _ZNK5Curve19get_bake_resolutionEv
    0x10fe5531d - /Applications/Godot.app/Contents/MacOS//Godot : _ZNK5Curve19get_bake_resolutionEv
    0x10e23388b - /Applications/Godot.app/Contents/MacOS//Godot : _ZN7ClassDB7creatorI11_GodotSharpEEP6Objectv
    0x12e9c070f - Unknown
    0x134dbaccf - Unknown
    0x12524b3a0 - Unknown
    0x10d89a210 - /Applications/Godot.app/Contents/MacOS//Godot :
    0x10d9d193d - /Applications/Godot.app/Contents/MacOS//Godot :
    0x10d9d482f - /Applications/Godot.app/Contents/MacOS//Godot :
    0x10e2bd5c0 - /Applications/Godot.app/Contents/MacOS//Godot : _ZN7ClassDB7creatorI11_GodotSharpEEP6Objectv
    0x10e2ba93c - /Applications/Godot.app/Contents/MacOS//Godot : _ZN7ClassDB7creatorI11_GodotSharpEEP6Objectv
    0x10e211565 - /Applications/Godot.app/Contents/MacOS//Godot : _ZN7ClassDB7creatorI17MobileVRInterfaceEEP6Objectv
    0x10fbb6687 - /Applications/Godot.app/Contents/MacOS//Godot : _ZN19AudioStreamPlayer2D11_mix_audiosEPv
    0x1107f5d08 - /Applications/Godot.app/Contents/MacOS//Godot : _ZNK6Object22can_translate_messagesEv
    0x1107d54d1 - /Applications/Godot.app/Contents/MacOS//Godot : _ZNK6Object15get_instance_idEv
    0x1107c82eb - /Applications/Godot.app/Contents/MacOS//Godot : _ZN5Image19copy_internals_fromERK3RefIS_E
    0x1107c85e2 - /Applications/Godot.app/Contents/MacOS//Godot : _ZN5Image19copy_internals_fromERK3RefIS_E
    0x10f692b2a - /Applications/Godot.app/Contents/MacOS//Godot : _ZNK4Node14is_inside_treeEv
    0x10dafff6c - /Applications/Godot.app/Contents/MacOS//Godot : _ZN9DirAccess15_create_builtinI12DirAccessOSXEEPS_v
    0x10dacaba9 - /Applications/Godot.app/Contents/MacOS//Godot : _ZN16DefaultAllocator5allocEm
    0x10dad12eb - /Applications/Godot.app/Contents/MacOS//Godot : _ZN9DirAccess15_create_builtinI12DirAccessOSXEEPS_v
    0x7fff2099df3d - /usr/lib/system/libdyld.dylib : start
    0x1 - Unknown

=================================================================
    Telemetry Dumper:
=================================================================
Pkilling 0x123145474011136x from 0x4695334400x
Pkilling 0x123145476120576x from 0x4695334400x
Pkilling 0x123145479839744x from 0x4695334400x
Could not exec mono-hang-watchdog, expected on path '/Applications/Godot.app/Contents/MacOS/../Resources/GodotSharp/Mono/etc/../bin/mono-hang-watchdog' (errno 2)
Entering thread summarizer pause from 0x4695334400x
Finished thread summarizer pause from 0x4695334400x.
Failed to create breadcrumb file (null)/crash_hash_0x2db3da541
debugger-agent: Unable to listen on 7

Waiting for dumping threads to resume

=================================================================
    External Debugger Dump:
=================================================================

=================================================================
    Basic Fault Address Reporting
=================================================================
Memory around native instruction pointer (0x10fe4378c):0x10fe4377c  0f 1f 40 00 48 83 c6 10 48 8b 36 48 39 ce 74 35  ..@.H...H.6H9.t5
0x10fe4378c  39 56 30 77 ef 73 1d 48 83 c6 08 eb eb 0f 1f 80  9V0w.s.H........
0x10fe4379c  00 00 00 00 48 83 c0 10 48 8b 00 48 39 c8 0f 84  ....H...H..H9...
0x10fe437ac  dd 01 00 00 39 50 30 77 eb 0f 83 ea 01 00 00 48  ....9P0w.......H

=================================================================
    Managed Stacktrace:
=================================================================
      at <unknown> <0xffffffff>
      at Godot.NativeCalls:godot_icall_1_24 <0x000ce>
      at Godot.DynamicFont:SetFontData <0x00102>
      at Godot.DynamicFont:set_FontData <0x0008a>
      at CityLayer:drawObject <0x04e72>
      at LooseView:_Draw <0x00e6e>
      at System.Object:runtime_invoke_void__this__ <0x0017f>
=================================================================
[1]    41082 abort      Godot
QuintillusCFC commented 2 years ago

Got another one, it had been idle for over an hour this time:

Unhandled exception at 0x0000000002610BC9 in Godot 3.4.2.exe: 0xC0000005: Access violation reading location 0xFFFFFFFFFFFFFFFF.

I'm guessing you are getting more data because of running it from the command line?

That Managed Stacktrace might give a hint... CityLayer:drawObject, dealing with a font. Hmmm.... the CityLayer does manipulate fonts to get a font of the appropriate size for that area. I hadn't noticed that part of the log last night. And looking at that code, it's horribly inefficient, it reloads the font from the bundled resource every time it draws. The "at Godot.DynamicFont:set_FontData" suggests that it is blowing up in the middle line of this section of the code (MapView.cs, lines 865-867 currently):


            DynamicFont smallFont = new DynamicFont();
            smallFont.FontData = ResourceLoader.Load("res://Fonts/NotoSans-Regular.ttf") as DynamicFontData;
            smallFont.Size = 11;

(or the equivalent for midSizedFont farther down)

It runs this thousands of times due to the inefficiency, and it would be the same every time. But it would run it more frequently as more cities are build and visible on the map, possibly increasing the chance of a crash if there's a bug in a mutex lock when it's loading fonts? My crash after an hour was with one city visible and four overall on the map, so it would have been running relatively infrequently.

So that's a theory at least! There's literally no reason to load the fonts over and over again, and that whole block of code is Wild West New to Godot Aztec style. It could use a general refactor, but moving the DynamicFonts out to be loaded once, just like cityTexture already is, would be an obvious first step.

pcen commented 2 years ago

Yes I think running from the command line gives a full trace.

That's interesting because in my initial crash managed trace:

=================================================================
    Managed Stacktrace:
=================================================================
      at <unknown> <0xffffffff>
      at Godot.NativeCalls:godot_icall_5_272 <0x00151>
      at Godot.CanvasItem:DrawString <0x00312>
      at CityLayer:drawObject <0x048da>
      at LooseView:_Draw <0x00e6e>
      at System.Object:runtime_invoke_void__this__ <0x0017f>
=================================================================

It crashes at the DrawString call which uses that font as well, so it could also be related to the font