QodotPlugin / qodot-plugin

(LEGACY) Quake .map support for Godot 3.x
MIT License

libmap crash under Arch and Manjaro #86

Closed: Shfty closed this issue 4 years ago

Shfty commented 4 years ago

From @varkatope:

Starting program: /home/varkatope/Desktop/qodot_debug_test/qodot.x86_64 qodot.x86_64
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
[New Thread 0x7ffff7fcd700 (LWP 9956)]
Godot Engine v3.2.stable.official - https://godotengine.org
[Detaching after fork from child process 9957]
[Detaching after fork from child process 9974]
[New Thread 0x7fffe9709700 (LWP 9991)]
[New Thread 0x7fffe8f08700 (LWP 9992)]
[New Thread 0x7fffe3fff700 (LWP 9993)]
[New Thread 0x7fffe37fe700 (LWP 9994)]
[New Thread 0x7fffe2ffd700 (LWP 9995)]
[New Thread 0x7fffe27fc700 (LWP 9996)]
[New Thread 0x7fffe1ffb700 (LWP 9997)]
[New Thread 0x7fffe17fa700 (LWP 9998)]
[New Thread 0x7fffe0ff9700 (LWP 9999)]
[New Thread 0x7fffbffff700 (LWP 10000)]
[New Thread 0x7fffbf7fe700 (LWP 10001)]
[New Thread 0x7fffbeffd700 (LWP 10002)]
[New Thread 0x7fffbe7fc700 (LWP 10003)]
[New Thread 0x7fffbdffb700 (LWP 10004)]
[New Thread 0x7fffbd7fa700 (LWP 10005)]
[New Thread 0x7fffbcff9700 (LWP 10006)]
[New Thread 0x7ffff6623700 (LWP 10007)]
OpenGL ES 3.0 Renderer: AMD RAVEN (DRM 3.36.0, 5.5.2-1-MANJARO, LLVM 9.0.1)
[New Thread 0x7fffe8066700 (LWP 10008)]
[New Thread 0x7fffe0577700 (LWP 10009)]

Building /home/varkatope/GameDevelopment/qodot-example/maps/metal-arch.map

build_map
remove_children
Done in 0.000311 sec

load_map

Thread 1 "qodot.x86_64" received signal SIGSEGV, Segmentation fault.
0x00007ffff4023951 in map_parser_load (map_file=0x411cb30 "/home/varkatope/GameDevelopment/qodot-example/maps/metal-arch.map") at libqodot/libmap/src/c/map_parser.c:115
115     comment = false;

libmap appears to segfault a couple of lines after map_parser_load calls map_data_reset, reset_current_face, reset_current_brush and reset_current_entity, once libqodot has invoked a load.

Given that we have another user (@MissLav) who's tested the new native core under Manjaro, I'm not sure what's going on here. I've stepped through the aforementioned functions with a debugger, and all of the dynamic memory logic that could cause a segfault is being skipped, as expected on a first run.
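
To illustrate what "skipped on a first run" means, here is a minimal sketch. The names demo_state, demo_push and demo_reset are hypothetical and are not taken from libmap; the sketch only shows the general pattern in which a reset function touches the heap only when an earlier load left something allocated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for parser state; not libmap's real data layout. */
typedef struct {
    int *items;    /* heap buffer, NULL until the first allocation */
    size_t count;  /* number of valid entries in items */
} demo_state;

static demo_state g_state = { NULL, 0 };

/* Append one value, growing the buffer by one slot. Returns false on OOM. */
bool demo_push(int value) {
    int *grown = realloc(g_state.items,
                         (g_state.count + 1) * sizeof *g_state.items);
    if (grown == NULL) {
        return false;  /* the old buffer is still valid and still owned */
    }
    grown[g_state.count] = value;
    g_state.items = grown;
    g_state.count += 1;
    return true;
}

/* Reset to the empty state. Returns true only if heap memory was released. */
bool demo_reset(void) {
    bool released = false;

    /* Invariant: the buffer is NULL exactly when there are no entries. */
    assert((g_state.items == NULL) == (g_state.count == 0));

    if (g_state.items != NULL) {  /* skipped entirely on a first run */
        free(g_state.items);
        released = true;
    }
    g_state.items = NULL;
    g_state.count = 0;
    return released;
}
```

In a fresh process, demo_reset() returns false without ever calling free, which corresponds to the first-run path described above. It returns true only after demo_push has allocated something.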

Shfty commented 4 years ago

@varkatope I'm afraid the best advice I can offer at this point is to step through map_parser_load yourself with a debugger and try to figure out what's going on.

Unless there's some information I lack here (e.g. the crashes happening after several builds rather than during any build), there's not much I can do to figure out what the issue is without access to an OS environment where it can be reproduced.

varkatope commented 4 years ago

@Shfty Ah, that's a bummer, but understandable. Hard to troubleshoot a problem when it's not in front of you. I wonder if it could reproduce on a virtual machine. I'll see if I can take a look at map_parser_load later and come up with anything useful. I'm at least learning a thing or two about debuggers, which is something. In the meantime, here's the full trace including an extra bit about libc at the end when continuing through the segfault once. The installed version of glibc is 2.30.

(gdb) run qodot.x86_64
Starting program: /home/varkatope/Desktop/qodot_debug_test/qodot.x86_64 qodot.x86_64
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
[New Thread 0x7ffff7fcd700 (LWP 1555)]
Godot Engine v3.2.stable.official - https://godotengine.org
[Detaching after fork from child process 1556]
[Detaching after fork from child process 1573]
[New Thread 0x7fffe9709700 (LWP 1590)]
[New Thread 0x7fffe8f08700 (LWP 1591)]
[New Thread 0x7fffdbfff700 (LWP 1592)]
[New Thread 0x7fffdb7fe700 (LWP 1593)]
[New Thread 0x7fffd2ffd700 (LWP 1594)]
[New Thread 0x7fffdaffd700 (LWP 1595)]
[New Thread 0x7fffda7fc700 (LWP 1596)]
[New Thread 0x7fffd9ffb700 (LWP 1597)]
[New Thread 0x7fffd97fa700 (LWP 1598)]
[New Thread 0x7fffd8ff9700 (LWP 1599)]
[New Thread 0x7fffd3fff700 (LWP 1600)]
[New Thread 0x7fffd37fe700 (LWP 1601)]
[New Thread 0x7fffd27fc700 (LWP 1602)]
[New Thread 0x7fffd1ffb700 (LWP 1603)]
[New Thread 0x7fffd17fa700 (LWP 1604)]
[New Thread 0x7fffd0ff9700 (LWP 1605)]
[New Thread 0x7ffff6623700 (LWP 1606)]
OpenGL ES 3.0 Renderer: AMD RAVEN (DRM 3.36.0, 5.5.2-1-MANJARO, LLVM 9.0.1)
[New Thread 0x7fffe8066700 (LWP 1607)]
[New Thread 0x7fffd8577700 (LWP 1608)]

Building /home/varkatope/GameDevelopment/qodot-example/maps/metal-arch.map

build_map
remove_children
Done in 0.000411 sec

load_map

Thread 1 "qodot.x86_64" received signal SIGSEGV, Segmentation fault.
0x00007ffff4023951 in map_parser_load (
    map_file=0x411d430 "/home/varkatope/GameDevelopment/qodot-example/maps/metal-arch.map")
    at libqodot/libmap/src/c/map_parser.c:115
115     comment = false;
(gdb) c
Continuing.
handle_crash: Program crashed with signal 11
Dumping the backtrace. Please include this when reporting the bug on https://github.com/godotengine/godot/issues
[Detaching after vfork from child process 1617]
[1] /usr/lib/libc.so.6(+0x3bfb0) [0x7ffff7df6fb0] (??:0)
[Detaching after vfork from child process 1619]
[2] /home/varkatope/Desktop/qodot_debug_test/libmap.so(map_parser_load+0x58) [0x7ffff4023951] (??:0)
[Detaching after vfork from child process 1621]
[3] /home/varkatope/Desktop/qodot_debug_test/libqodot.so(qodot_load_map+0x5e) [0x7ffff65de2a1] (??:0)
[Detaching after vfork from child process 1623]
[4] /home/varkatope/Desktop/qodot_debug_test/qodot.x86_64() [0x1b63257] (<artificial>:?)
[Detaching after vfork from child process 1625]
[5] /home/varkatope/Desktop/qodot_debug_test/qodot.x86_64() [0xa1e49f] (??:?)
[Detaching after vfork from child process 1627]
[6] /home/varkatope/Desktop/qodot_debug_test/qodot.x86_64() [0x91e53d] (??:?)
[Detaching after vfork from child process 1629]
[7] /home/varkatope/Desktop/qodot_debug_test/qodot.x86_64() [0x1b8c339] (<artificial>:?)
[Detaching after vfork from child process 1631]
[8] /home/varkatope/Desktop/qodot_debug_test/qodot.x86_64() [0x1b9b048] (<artificial>:?)
[Detaching after vfork from child process 1633]
[9] /home/varkatope/Desktop/qodot_debug_test/qodot.x86_64() [0xa1e49f] (??:?)
[Detaching after vfork from child process 1635]
[10] /home/varkatope/Desktop/qodot_debug_test/qodot.x86_64() [0xa1e235] (??:?)
[Detaching after vfork from child process 1637]
[11] /home/varkatope/Desktop/qodot_debug_test/qodot.x86_64() [0xae2c68] (<artificial>:?)
[Detaching after vfork from child process 1639]
[12] /home/varkatope/Desktop/qodot_debug_test/qodot.x86_64() [0xb0d7ce] (??:?)
[Detaching after vfork from child process 1641]
[13] /home/varkatope/Desktop/qodot_debug_test/qodot.x86_64() [0xa1e560] (??:?)
[Detaching after vfork from child process 1643]
[14] /home/varkatope/Desktop/qodot_debug_test/qodot.x86_64() [0x91e53d] (??:?)
[Detaching after vfork from child process 1645]
[15] /home/varkatope/Desktop/qodot_debug_test/qodot.x86_64() [0x1b8c339] (<artificial>:?)
[Detaching after vfork from child process 1647]
[16] /home/varkatope/Desktop/qodot_debug_test/qodot.x86_64() [0x1b9b048] (<artificial>:?)
[Detaching after vfork from child process 1649]
[17] /home/varkatope/Desktop/qodot_debug_test/qodot.x86_64() [0xa1e49f] (??:?)
[Detaching after vfork from child process 1651]
[18] /home/varkatope/Desktop/qodot_debug_test/qodot.x86_64() [0x91e53d] (??:?)
[Detaching after vfork from child process 1653]
[19] /home/varkatope/Desktop/qodot_debug_test/qodot.x86_64() [0x1b8c339] (<artificial>:?)
[Detaching after vfork from child process 1655]
[20] /home/varkatope/Desktop/qodot_debug_test/qodot.x86_64() [0x1b9a190] (<artificial>:?)
[Detaching after vfork from child process 1657]
[21] /home/varkatope/Desktop/qodot_debug_test/qodot.x86_64() [0x1b9ab21] (<artificial>:?)
[Detaching after vfork from child process 1659]
[22] /home/varkatope/Desktop/qodot_debug_test/qodot.x86_64() [0x1ae9f4f] (<artificial>:?)
[Detaching after vfork from child process 1661]
[23] /home/varkatope/Desktop/qodot_debug_test/qodot.x86_64() [0xa1e560] (??:?)
[Detaching after vfork from child process 1663]
[24] /home/varkatope/Desktop/qodot_debug_test/qodot.x86_64() [0xa3f3c9] (??:?)
[Detaching after vfork from child process 1665]
[25] /home/varkatope/Desktop/qodot_debug_test/qodot.x86_64() [0xa40340] (??:?)
[Detaching after vfork from child process 1667]
[26] /home/varkatope/Desktop/qodot_debug_test/qodot.x86_64() [0x16e43bc] (??:?)
[Detaching after vfork from child process 1669]
[27] /home/varkatope/Desktop/qodot_debug_test/qodot.x86_64() [0x681bfc] (??:?)
[Detaching after vfork from child process 1671]
[28] /usr/lib/libc.so.6(__libc_start_main+0xf3) [0x7ffff7de2153] (??:0)
[Detaching after vfork from child process 1673]
[29] /home/varkatope/Desktop/qodot_debug_test/qodot.x86_64() [0x68ed1e] (??:?)
-- END OF BACKTRACE --

Thread 1 "qodot.x86_64" received signal SIGABRT, Aborted.
0x00007ffff7df6f25 in raise () from /usr/lib/libc.so.6

And by the way, in case it's in any way helpful, this happens with any type of build, continues to happen with 1.6.2, and my laptop is a Huawei Matebook D 14" (which isn't even listed on their site anymore) with AMD Ryzen 5 2500U / Vega 8 integrated graphics, 8 GB DDR4 RAM, using latest Mesa open source drivers.

Shfty commented 4 years ago

@varkatope It may be reproducible under a VM. I'll spin one up with the latest Manjaro and see what happens.

Also, Sir Space Anchor on the Discord is having the same issue on the following hardware under Arch:

So hardware:
CPU: AMD Ryzen 7 1700
RAM: 16 GB
GPU: RX Vega 56
STORAGE: NVME.2 SSD 250 GB Samsung Evo

Plug-in is 1.6.2(?) Literally the latest build.
Godot 3.2

The through-line here seems to be Ryzen/Vega setups, though granted a sample size of 2 isn't much to go on. Need to do some research.

Shfty commented 4 years ago

Initial research indicates that the Ryzen 7 1700 line was affected by a hardware issue that caused random segfaults under load on Linux systems. I've asked Sir Space Anchor to run the following test to rule it out as a possible cause of the crash: https://www.reddit.com/r/Amd/comments/8b7jnb/when_the_segfault_comes_for_you/dx4veni/

I've yet to see any mention of the Ryzen 5 2500U with regard to that issue, but it may be useful for you to run the test as well @varkatope

Shfty commented 4 years ago

Scratch that: I've managed to reproduce it in a Manjaro VM. I'll get a debug workflow going and see what's going on.

Shfty commented 4 years ago

Fixed in the latest commit, released with 1.6.3.

varkatope commented 4 years ago

@Shfty It works! Thanks for persevering. Just a small thing, but the way the 1.6.3 zip is created doesn't allow you to cleanly import it using the Asset Lib/Import function. There's one top-level folder too many, basically.

Also, out of curiosity, I did run that script for a while, but no segfaults.

Cheers!

Shfty commented 4 years ago

Thank goodness for that! Glad to hear it's working.

And good point: I'll update the release zip.