dhewm / dhewm3

dhewm 3 main repository
https://dhewm3.org/
GNU General Public License v3.0
1.81k stars 351 forks source link

Crash when loading RoE last level #138

Closed AMDmi3 closed 8 years ago

AMDmi3 commented 9 years ago

The game crashes during loading last level of RoE (hell).

----- Map Initialization -----
Map: game/hell
----- Game Map Init -----
collision data:
    60 models
 14627 vertices (342 KB)
 23764 edges (835 KB)
  8932 polygons (664 KB)
  2280 brushes (326 KB)
 29710 nodes (1392 KB)
 40900 polygon refs (639 KB)
 22241 brush refs (347 KB)
  7539 internal edges
   113 sharp edges
     0 contained polygons removed
     0 polygons merged
  4548 KB total memory used
142 msec to load collision data.
map bounds are (24309.0, 16762.0, 6904.0)
max clip sector is (759.7, 1047.6, 863.0)
  128 bytes passage memory used to build PVS
    0 msec to calculate PVS
   24 areas
   32 portals
    3 areas visible on average
   96 bytes PVS data
[Load AAS]
loading maps/game/hell.aas48
done.
[Load AAS]
loading maps/game/hell.aas96
done.
[Load AAS]
loading maps/game/hell.aas_guardian
[Load AAS]
loading maps/game/hell.aas_mancubus
done.
[Load AAS]
loading maps/game/hell.aas_sabaoth
[Load AAS]
loading maps/game/hell.aas_cyberdemon
Entering doom_main()
Exiting doom_main()
Spawning entities
loaded collision model models/items/belt_ammo/d3xp_belt_lo.lwo
loaded collision model models/items/medkit/d3xp_medkit_small.lwo
loaded collision model models/items/armor/d3xp_armor_lo.lwo
loaded collision model models/items/grenade_ammo/d3xp_grenade_lo.lwo
loaded collision model models/items/bfg_ammo/d3xp_bfg_ammo.lwo
loaded collision model models/items/cell_ammo/d3xp_cells_large.lwo
loaded collision model models/items/rocket_ammo/d3xp_rocket_small_lo.lwo
loaded collision model models/items/medkit/d3xp_medkit_large.lwo
loaded collision model models/items/shells_ammo/d3xp_shells_large_lo.lwo
loaded collision model models/items/rocket_ammo/d3xp_rocket_large_lo.lwo
loaded collision model models/gibs/lup_leg_pork.lwo
loaded collision model models/gibs/head_pork.lwo
loaded collision model models/gibs/pelvis_pork.lwo
loaded collision model models/gibs/left_waist_pork.lwo
loaded collision model models/mapobjects/hell/mancroom/vagpillar_half.lwo
loaded collision model models/mapobjects/hell/mancroom/vagpillartop.lwo
loaded collision model models/mapobjects/teleporter/telecolon.lwo
loaded collision model models/mapobjects/hellpieces/tapeworm_C_128.lwo
loaded collision model models/mapobjects/hell/torches/torso.lwo
loaded collision model models/mapobjects/hell/khell/kpillar.lwo
loaded collision model models/mapobjects/hell/mancroom/bars.lwo
loaded collision model models/mapobjects/hell/mancroom/bars2.lwo
loaded collision model models/mapobjects/hell/mancroom/bars3.lwo
loaded collision model models/items/powerups/d3xp_backpack.lwo
loaded collision model models/mapobjects/hell/skulls/pile1.lwo
loaded collision model models/patj/hellspike2.lwo
WARNING: file sound/monster_demon_tick.sndshd, line 61: unknown token 'sound/ed/tick/'
loaded collision model models/mapobjects/hell/mancroom/spike4.lwo
[New Thread 80891ac00 (LWP 101977)]
[New Thread 805416800 (LWP 101111 backgroundDownlo)]
[New Thread 805415400 (LWP 100996 SDLTimer)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 805415000 (LWP 101576)]
0x0000000000765327 in idHeap::MediumFree (this=0x805516000, ptr=0x820d7bd48) at /usr/work/ssd/portstrees/batchports-mem/games/dhewm3/work/dhewm3-1.4.0/neo/idlib/Heap.cpp:801
801     isInFreeList = p->largestFree >= MEDIUM_SMALLEST_SIZE;
(gdb) bt full
#0  0x0000000000765327 in idHeap::MediumFree (this=0x805516000, ptr=0x820d7bd48) at /usr/work/ssd/portstrees/batchports-mem/games/dhewm3/work/dhewm3-1.4.0/neo/idlib/Heap.cpp:801
        e = 0x820d7bd08
        p = 0x20d723c0
        isInFreeList = true
        prev = 0x820d7a270
        next = 0x820d7b390
#1  0x0000000000765216 in idHeap::Free (this=0x805516000, p=0x820d7bd48) at /usr/work/ssd/portstrees/batchports-mem/games/dhewm3/work/dhewm3-1.4.0/neo/idlib/Heap.cpp:302
No locals.
#2  0x000000000076625f in Mem_Free (ptr=0x820d7bd48) at /usr/work/ssd/portstrees/batchports-mem/games/dhewm3/work/dhewm3-1.4.0/neo/idlib/Heap.cpp:1101
No locals.
#3  0x0000000000467fbe in MA_Free (ma=0x86234d080) at /usr/work/ssd/portstrees/batchports-mem/games/dhewm3/work/dhewm3-1.4.0/neo/renderer/Model_ma.cpp:1071
        i = 1
        obj = 0x820d7bd48
        mesh = 0x820d7be50
        material = 0x820d97db8
        trans = 0x820d97ec0
        fileNode = 0x820cd2b68
        matNode = 0x833fd5780
#4  0x000000000043e7c5 in idRenderModelStatic::LoadMA(char const*) () at /usr/work/ssd/portstrees/batchports-mem/games/dhewm3/work/dhewm3-1.4.0/neo/idlib/Token.h:120
No symbol table info available.
#5  0x000000000043d900 in idRenderModelStatic::InitFromFile (this=0x8181f0220, fileName=0x81e566f70 "models/david/hell_h7.ma") at /usr/work/ssd/portstrees/batchports-mem/games/dhewm3/work/dhewm3-1.4.0/neo/renderer/Model.cpp:300
        loaded = false
        extension = {len = 2, data = 0x7fffffffc9a4 "ma", alloced = 20, baseBuffer = "ma\000\000*_\363\001\b\000\000\000\210\000\000\000\000\000\000"}
#6  0x000000000044c947 in idRenderModelManagerLocal::GetModel (this=0xa2e358 <localModelManager>, modelName=0x81e566f70 "models/david/hell_h7.ma", createIfNotFound=true) at /usr/work/ssd/portstrees/batchports-mem/games/dhewm3/work/dhewm3-1.4.0/neo/renderer/ModelManager.cpp:299
        canonical = {len = 23, data = 0x828c96390 "models/david/hell_h7.ma", alloced = 32, baseBuffer = "\000\000\000\000x=\340", '\000' <repeats 12 times>}
        extension = {len = 2, data = 0x7fffffffcb4c "ma", alloced = 20, baseBuffer = "ma\000\000\315\222;\027\002\000\000\000\274k\032[\242\367\257U"}
        key = 535
        model = 0x8181f0220
#7  0x000000000044d102 in idRenderModelManagerLocal::FindModel (this=0xa2e358 <localModelManager>, modelName=0x81e566f70 "models/david/hell_h7.ma") at /usr/work/ssd/portstrees/batchports-mem/games/dhewm3/work/dhewm3-1.4.0/neo/renderer/ModelManager.cpp:389
No locals.
#8  0x00000008170bc99a in idGameLocal::CacheDictionaryMedia(idDict const*) () from /usr/local/lib/dhewm3/d3xp.so
No symbol table info available.
#9  0x00000008170b93a4 in idGameLocal::SpawnMapEntities() () from /usr/local/lib/dhewm3/d3xp.so
No symbol table info available.
#10 0x00000008170b8270 in idGameLocal::MapPopulate() () from /usr/local/lib/dhewm3/d3xp.so
No symbol table info available.
#11 0x00000008170ba214 in idGameLocal::InitFromNewMap(char const*, idRenderWorld*, idSoundWorld*, bool, bool, int) () from /usr/local/lib/dhewm3/d3xp.so
No symbol table info available.
#12 0x000000000055d1bc in idSessionLocal::ExecuteMapChange(bool) ()
No symbol table info available.
#13 0x000000000055c4f2 in idSessionLocal::MoveToNewMap(char const*) ()
No symbol table info available.
#14 0x000000000055f44b in idSessionLocal::RunGameTic() ()
No symbol table info available.
#15 0x0000000000562a4d in idSessionLocal::Frame() ()
No symbol table info available.
#16 0x00000000004fc965 in idCommonLocal::Frame() ()
No symbol table info available.
#17 0x000000000069bce6 in main (argc=1, argv=0x7fffffffe978) at /usr/work/ssd/portstrees/batchports-mem/games/dhewm3/work/dhewm3-1.4.0/neo/sys/linux/main.cpp:300
No locals.
DanielGibson commented 8 years ago

I can't really reproduce this.

Does it happen if you start the level via console with map game/hell?

AMDmi3 commented 8 years ago

I'll check that a bit later.

While here, let me add that I could avoid the crash by removing MA_Free call from idRenderModelStatic::LoadMA. Of course that leads to memory leak, but it may also confirm that there could be a case of use-after-free or double free here. Also note that I use FreeBSD, so it may reveal problems which don't show themselves on Linux.

Yamagi commented 8 years ago

Hello, after a long a debugging session I've traced this down to the Model_MA parser. The long story is: Like most games Doom 3 has it's own memory management system, build upon the libc malloc(). This system works by allocating 'pages' trough malloc(), the pages are organized in a kind of linked list. Each list entry consists of a header with some metadata like the free space left, the previous entry, the next entry and so on. When the game parses .ma files, new memory is allocated through the system. For reasons not analyzed at this point the Model_MA parser has a buffer overflow or something like that when parsing models/david/hell_h7.ma. That file is used only at the last level of RoE, maps/hell.map. On FreeBSD the Model_MA parser overwrites the header of the next entry in the 'pages' list. When the games tries to free that page, the e->page field of the overwritten header contains garbage. It's dereferenced as a pointer, the memory address is invalid and a segfault is triggered. On Linux the problem is only visible with the Clang / LLVM address sanitizer or other tools. Model_MA writes into unallocated memory, the game doesn't crash by pure luck.

Regards, Yamagi

P.S.: A work around is to hack the file away by adding something like this to Model_ma.cpp line 1010

if (strcmp(fileName, "models/david/hell_h7.ma") == 0)
{
    idLib::common->Printf("Bla!!!!!\n");
    return NULL;
}
DanielGibson commented 8 years ago

Thanks a lot for debugging, Yamagi!

Another workaround that seems to work in this case is to make dhewm3 only use malloc()/free(), not its own heap. This can be done by adding #define USE_LIBC_MALLOC 1 at the top of neo/idlib/Heap.cpp. Not sure why that works, probably standard malloc() adds more padding, so nothing critical is overwritten.

Of course, the bug should be fixed properly (I hope to investigate more thoroughly next week, possibly this weekend with Yamagis help, if I can't figure it out before). With clang's AddressSanitizer I can at least reproduce it (and get the crash/ASAN error when the heap corruption happens and not in MediumFree(), when it's too late to find out what caused it. Unfortunately in -O0 builds all function arguments are reported as <optimized out> in gdb, even more than in optimized builds. I guess I should test a newer clang version, I'm still using 3.6).

motorsep commented 8 years ago

So, are you saying it's loading Maya models directly? (I know it can, to convert them into MD5) I don't recall seeing a single Maya model in Doom 3 o.O

Doom 3 is full of surprises :)

DanielGibson commented 8 years ago

I think this should be fixed in the latest git code. Could you test it? :-)

@motorsep yep, RoE has several .ma files, while the main game has none.

AMDmi3 commented 8 years ago

Fix confirmed, thank you very much! I was finally able to finish RoE.

This has also fixed another crash when fighting Maledict, if Hell loading problem is circumvented by removing MA_Free as mentioned. I assume it was caused by the same heap corruption.

Is 9950a57 enough to fix it? I want to backport it to FreeBSD port of dhewm3 (1.4.0) I maintain.

Also it would be nice to have 1.4.1 sooner :)

DanielGibson commented 8 years ago

yeah, that commit should be all that's needed to fix it (I only now noticed I committed a comment there I didn't want to commit.. whatever.)

I guess I could do a 1.4.1 release candidate in the next days, I don't have plans for any more changes for 1.4.1 anyway. There's #137, but I'm not sure that's easily fixable (and probably the behavior has been the same since doom3 was released).

Oh, and at some point I want to improve mod support, but that can wait for 1.4.2.

AMDmi3 commented 8 years ago

Just tested: 9950a57 alone fixes level loading, but Maledict fight crash is unrelated

Assertion failed: (b[1][0] - b[0][0] > a.b[1][0] - a.b[0][0] && b[1][1] - b[0][1] > a.b[1][1] - a.b[0][1] && b[1][2] - b[0][2] > a.b[1][2] - a.b[0][2]), function operator-, file /usr/work/ssd/portstrees/batchports-mem/games/dhewm3/work/dhewm3-1.4.0/neo/idlib/bv/Bounds.h, line 175.

and it was fixed by b03fc92. I'm backporting both commits for now. Thank you again!