CleverRaven / Cataclysm-DDA

Cataclysm - Dark Days Ahead. A turn-based survival game set in a post-apocalyptic world.
http://cataclysmdda.org
Other
10.59k stars 4.17k forks source link

Corrupted Save Files (was: Crashing when saving ) #19642

Closed Lazasar closed 7 years ago

Lazasar commented 7 years ago

Sometime around build 6009, maybe a build or two earlier, I am getting crashes when saving, and it corrupts o.0.0

codemime commented 7 years ago

Additional information is needed: corrupted/uncorrupted save file, preferably also a stack trace.

Regularitee commented 7 years ago

Don't have the save on hand since I started a brand new game after corruption borked by previous one, but I can at least confirm I, too, have been having save fail/corruptions since updating to the most recent build from a build 1-2 days prior.

Lazasar commented 7 years ago

Yeah. Now I am seeing corrupted saves even if the game doesn't crash while saving

codemime commented 7 years ago

What's your OS? Could you simply upload them (corrupted and uncorrupted ones)?

Lazasar commented 7 years ago

Windows 10. Here ya go. Both are corrupted, but one (Huagan) actually gives an error message before it crashes save.zip

bouchacha commented 7 years ago

Same here, it first started when I checked the map and crashed on 0.C-20128-gbcb7a7b (tiles) [Win 10] Since updating I get inconsistent stability; not sure what will make the game crash but usually when saving. save.zip

zakhad commented 7 years ago

I'm also getting corrupted saves, one of my saves did actually load but the map data failed to save and I had no mapped tiles everything was unknown.

mnacamura commented 7 years ago

Not sure if it's the same problem but loading an old save crashes on macOS Sierra. I tried to detect the cause and found that the latest build with omitting commit adef32a can load the save without crash.

Proxiehunter commented 7 years ago

As I posted here: http://smf.cataclysmdda.com/index.php?topic=13828.0 since update 6009 I have been unable to load old saves without the game crashing and while I can start a new game any attempt to save causes the game to crash and the save to corrupt as described here. Rolling back to update 6008 fixes the issue so it is definitely something in update 6009 that is causing this.

Xpyder commented 7 years ago

I also just had it crash on an autosave. Build 6017 from 12/5/2016 Windows Graphical Windows 7 Clean Unzip Brand new world Brand new character

I can upload a zip of this, including my settings if someone wants to test

image

EDIT: Interesting, I also get this error on that same world now when creating a new character, but didn't initially. It looks like the crash corrupted the world file in some way.

Xpyder commented 7 years ago

Some further testing, I reset the world and I saved once successfully, but stepping outside the shelter and trying to save caused a crash.

Will test without any world modifiers. (I usually run with more survival tools, some vehicle packs and a couple other things)

ussdefiant60 commented 7 years ago

Note: I also had this sort of thing happen on Win10 Graphical. Crash on saving, deleted my mobile base and also apparently my characters' knowledge of the map. Not sure how to track down the cause of the crash, since it doesn't seem to put crash logs anywhere.

Xpyder commented 7 years ago

It's definitely related to something in the world, doesn't happen all the time (approached swamp and town without triggering, but had it happen right after exiting shelter in empty field with no mobs around). I don't really know assembly, but I do have a debugger installed and here's what it spat out in case someone does know how to track it down.

Unhandled Exception at 0x00694055 in cataclysm-tiles.exe: 0xC0000005: Access violation reading location 0x0110e000

(0x0110e000 is 17883136 in dec) (memory values started at 00692D03)

image

Proxiehunter commented 7 years ago

And like I said it's clearly caused by something in update 6009. If I downgrade to 6008 it doesn't happen. Upgrading just one update higher to 6009 and the behavior starts.

321nick commented 7 years ago

i have similar issue, map data gone, got error when loading up the save:

image

and another error i also got, similar too the one @Xpyder posted a screenshot of.

Proxiehunter commented 7 years ago

Issue still occurring with build 6018 so I can't make use of any of those updates that got made.

Coolthulhu commented 7 years ago

Marking as priority. I'm getting this too. I'll make a debug build and get a stacktrace.

hmstanley commented 7 years ago

Yea, I had to revert to 6009, which doesn't have the crash. I'm getting when I attack ants. Every Time.

Coolthulhu commented 7 years ago

Would reproduce trivially in release build, but in debug builds it doesn't. This means it's something really serious.

Will build release+debug symbols and check there.

hmstanley commented 7 years ago

I have a save if you want it? All you need to do is attack one of the fire ants? And it dies.

Coolthulhu commented 7 years ago

Got it - it's something in auto_pickup::cRules::~cRules, invoked by npc::load. Looks like someone assumed autopickup rules will always be loaded in some specific order, which isn't true - they may be loaded at any time.

The whole bt is:

#0  0x00000005268a6a70 in ?? () from /usr/bin/msys-stdc++-6.dll
#1  0x000000010091d67d in ~cRules (__in_chrg=2, this=<optimized out>) at src/auto_pickup.h:48
#2  ~cRules (this=<optimized out>) at src/auto_pickup.h:48
#3  _Destroy<auto_pickup::cRules> (__pointer=<optimized out>) at /usr/lib/gcc/x86_64-pc-msys/5.3.0/include/c++/bits/stl_construct.h:93
#4  std::_Destroy_aux<false>::__destroy<auto_pickup::cRules*> (__first=__first@entry=0x0, __last=0x6038eb010)
    at /usr/lib/gcc/x86_64-pc-msys/5.3.0/include/c++/bits/stl_construct.h:103
#5  0x00000001009461a8 in _Destroy<auto_pickup::cRules*> (__last=<optimized out>, __first=0x0)
    at /usr/lib/gcc/x86_64-pc-msys/5.3.0/include/c++/bits/stl_construct.h:126
#6  _Destroy<auto_pickup::cRules*, auto_pickup::cRules> (__last=<optimized out>, __first=0x0)
    at /usr/lib/gcc/x86_64-pc-msys/5.3.0/include/c++/bits/stl_construct.h:151
#7  _M_erase_at_end (this=0x6038eafd8, __pos=0x0) at /usr/lib/gcc/x86_64-pc-msys/5.3.0/include/c++/bits/stl_vector.h:1438
#8  std::vector<auto_pickup::cRules, std::allocator<auto_pickup::cRules> >::clear (this=0x6038eafd8)
    at /usr/lib/gcc/x86_64-pc-msys/5.3.0/include/c++/bits/stl_vector.h:1212
#9  0x00000001008070c9 in auto_pickup::deserialize (this=0x6038e97a0, jsin=...) at src/auto_pickup.cpp:655
#10 0x000000010064626b in JsonIn::read (this=<optimized out>, j=...) at src/json.cpp:1411
#11 0x000000010076b1da in read<auto_pickup> (t=..., name="pickup_whitelist", this=0xffff9f58) at src/json.h:700
#12 npc_follower_rules::deserialize (this=<optimized out>, jsin=...) at src/savegame_json.cpp:872
#13 0x000000010064626b in JsonIn::read (this=<optimized out>, j=...) at src/json.cpp:1411
#14 0x000000010089d9e4 in JsonObject::read<npc_follower_rules> (this=this@entry=0xffffa228, name="rules", t=...) at src/json.h:700
#15 0x0000000100777301 in npc::load (this=this@entry=0x6038f7250, data=...) at src/savegame_json.cpp:1100
#16 0x0000000100777521 in npc::deserialize (this=this@entry=0x6038f7250, jsin=...) at src/savegame_json.cpp:999
#17 0x00000001007ee90d in overmap::unserialize (this=0x6fffe1e0010, fin=...) at src/savegame.cpp:631
#18 0x000000010042a3eb in operator() (__args#0=..., this=0xffffab00) at /usr/lib/gcc/x86_64-pc-msys/5.3.0/include/c++/functional:2271
#19 read_from_file(std::string const&, std::function<void (std::istream&)> const&) (path="./save//NO_NPC_FOOD/o.-1.-8", reader=...)
    at src/cata_utility.cpp:434
#20 0x000000010042a5b5 in read_from_file_optional(std::string const&, std::function<void (std::istream&)> const&) (
    path="./save//NO_NPC_FOOD/o.-1.-8", reader=...) at src/cata_utility.cpp:467
#21 0x0000000100418bed in overmap::open (this=this@entry=0x6fffe1e0010) at src/overmap.cpp:4366
#22 0x0000000100419b3c in overmap::overmap (this=0x6fffe1e0010, x=-1, y=-8) at src/overmap.cpp:1152
#23 0x00000001007929a0 in overmapbuffer::get (this=this@entry=0x100b8ca00 <overmap_buffer>, x=x@entry=-1, y=y@entry=-8)
    at src/overmapbuffer.cpp:64
#24 0x0000000100792b4a in overmapbuffer::get_existing (this=this@entry=0x100b8ca00 <overmap_buffer>, x=x@entry=-1, y=y@entry=-8)
    at src/overmapbuffer.cpp:162
#25 0x00000001007942c2 in overmapbuffer::get_overmaps_near (this=this@entry=0x100b8ca00 <overmap_buffer>, location=...,
    radius=radius@entry=4) at src/overmapbuffer.cpp:657
#26 0x0000000100794d31 in overmapbuffer::get_npcs_near (this=this@entry=0x100b8ca00 <overmap_buffer>, x=-288, y=-2646, z=0,
    radius=radius@entry=4) at src/overmapbuffer.cpp:676
#27 0x0000000100794ea7 in overmapbuffer::get_npcs_near_player (this=0x100b8ca00 <overmap_buffer>, radius=radius@entry=4)
    at src/overmapbuffer.cpp:641
#28 0x0000000100669fe5 in game::load_mission_npcs (this=this@entry=0x6001366c0) at src/game.cpp:973
#29 0x000000010069023b in game::load (this=0x6001366c0, worldname="NO_NPC_FOOD", name="#R2F0dHN1") at src/game.cpp:3681
#30 0x000000010069050c in game::load (this=<optimized out>, world="NO_NPC_FOOD") at src/game.cpp:3632
#31 0x00000001009ba1c8 in main (argc=<optimized out>, argv=<optimized out>) at src/main.cpp:479
codemime commented 7 years ago

I presume it's a consequence of reading the corrupted files (they're all truncated somewhere near that rules). However, the real cause is elsewhere.

@hmstanley Your files would be useful.

ussdefiant60 commented 7 years ago

i personally blame #19626 for this bout of crashiness. Can't really suggest a simple revert since that one apparently fixed build errors, though...

Coolthulhu commented 7 years ago

Weird: crashes in the line:

vRules[(bChar) ? CHARACTER_TAB : GLOBAL_TAB].clear();

Which looks incredibly tame. Meaning it must be some memory corruption. The closest place to expect memory corruption is NPC code in the overmap, since overmaps were changed recently and that part of the code wasn't really bulletproof before the changes.

hmstanley commented 7 years ago

Roscoe-3.zip Here you go codemine:

[ "dda", "no_npc_food", "PK_REBALANCE", "realguns", "blazemod", "Tanks" ]

hmstanley commented 7 years ago

as I noted..just drive the car to the upper town, get out and attack a fireant.. it will most likely crash.

codemime commented 7 years ago

@hmstanley Thanks. I tried it on a release build, but didn't get the crash. Windows is probably more sensitive to whatever causes it.

codemime commented 7 years ago

Weird: crashes in the line:

You were experiencing consequences of reading a corrupted save file. That's also pretty serious, but looks like another issue. Something makes the game produce corrupted save files (and crash during normal playing).

EDIT: I'm suspicious of #19579.

@Coolthulhu Could you try the files uploaded by hmstanley?

hmstanley commented 7 years ago

Yea, I was running windows SDL tiles build

Coolthulhu commented 7 years ago

You were experiencing consequences of reading a corrupted save file.

No, that's not it. Corrupted save file can't cause memory corruption on its own. Plus, the file isn't even corrupted - it actually happens in proper saves too.

mutability commented 7 years ago

If anyone has a vanilla save with simple reproduction steps, I can take a look. I can't see how the autopickup fix itself could cause this, it must be tickling a latent bug elsewhere.

(hmstanley's save doesn't seem to be vanilla, given the error spam on load; and I couldn't find any fire ants)

Xpyder commented 7 years ago

Simple reproduction steps Start a clean world Start a random character spawn 5 npcs save

This seems to reliably crash build 6017

You can probably get away with only spawning 1, but I haven't tested whether it crashes if they spawn out of sight (i.e. outside the starting shelter before you step outside)

mutability commented 7 years ago

OK I can reproduce on linux, curses, release build with xpyder's instructions.

valgrind says:

==14718== Use of uninitialised value of size 8
==14718==    at 0x6DB365: _S_ref (array:53)
==14718==    by 0x6DB365: operator[] (array:185)
==14718==    by 0x6DB365: auto_pickup::serialize(JsonOut&) const (auto_pickup.cpp:640)
==14718==    by 0x8AD806: JsonOut::write(JsonSerializer const&) (json.cpp:1729)
==14718==    by 0x74E2B6: member<auto_pickup> (json.h:534)
==14718==    by 0x74E2B6: npc_follower_rules::serialize(JsonOut&) const (savegame_json.cpp:847)
==14718==    by 0x8AD806: JsonOut::write(JsonSerializer const&) (json.cpp:1729)
==14718==    by 0x753652: member<npc_follower_rules> (json.h:534)
==14718==    by 0x753652: npc::store(JsonOut&) const (savegame_json.cpp:1167)
==14718==    by 0x7537D9: npc::serialize(JsonOut&) const (savegame_json.cpp:1119)
==14718==    by 0x8AD806: JsonOut::write(JsonSerializer const&) (json.cpp:1729)
==14718==    by 0x6F4AA7: overmap::serialize(std::ostream&) const (savegame.cpp:917)
==14718==    by 0x64F1BF: overmap::save() const (overmap.cpp:4394)
==14718==    by 0x71D73A: overmapbuffer::save() (overmapbuffer.cpp:111)
==14718==    by 0x59759F: game::save_maps() (game.cpp:3811)
==14718==    by 0x59CFF5: game::save() (game.cpp:3851)
==14718== 
==14718== Invalid read of size 1
==14718==    at 0x8AD6E9: JsonOut::write(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (json.cpp:1671)
==14718==    by 0x6DB3A3: member<std::__cxx11::basic_string<char> > (json.h:534)
==14718==    by 0x6DB3A3: auto_pickup::serialize(JsonOut&) const (auto_pickup.cpp:643)
==14718==    by 0x8AD806: JsonOut::write(JsonSerializer const&) (json.cpp:1729)
==14718==    by 0x74E2B6: member<auto_pickup> (json.h:534)
==14718==    by 0x74E2B6: npc_follower_rules::serialize(JsonOut&) const (savegame_json.cpp:847)
==14718==    by 0x8AD806: JsonOut::write(JsonSerializer const&) (json.cpp:1729)
==14718==    by 0x753652: member<npc_follower_rules> (json.h:534)
==14718==    by 0x753652: npc::store(JsonOut&) const (savegame_json.cpp:1167)
==14718==    by 0x7537D9: npc::serialize(JsonOut&) const (savegame_json.cpp:1119)
==14718==    by 0x8AD806: JsonOut::write(JsonSerializer const&) (json.cpp:1729)
==14718==    by 0x6F4AA7: overmap::serialize(std::ostream&) const (savegame.cpp:917)
==14718==    by 0x64F1BF: overmap::save() const (overmap.cpp:4394)
==14718==    by 0x71D73A: overmapbuffer::save() (overmapbuffer.cpp:111)
==14718==    by 0x59759F: game::save_maps() (game.cpp:3811)
==14718==  Address 0xb7e000 is not stack'd, malloc'd or (recently) free'd

Looking further..

Coolthulhu commented 7 years ago

In gdb, it looks as if autopickup::vRules was a single element array (it should be 2 element). Though it could be just me not reading it right.

(gdb) p vRules
$9 = {_M_elems = {std::vector of length 0, capacity 0, std::vector of length 0, capacity 0}}

vRules is used as vRules[1].clear().

mutability commented 7 years ago

vrules is a std::array<2>, you're seeing the two vectors it contains there

mutability commented 7 years ago

@Coolthulhu try adding a virtual dtor to the auto_pickup class; it might have fixed it here. (I am on a plane and can't really test this much)

edit: nope, didn't fix it actually

Coolthulhu commented 7 years ago

Why virtual?

mutability commented 7 years ago
    fprintf(stderr, "%p serialize\n", this);
    fprintf(stderr, "I have %u vectors\n", (unsigned) vRules.size());

    json.start_array();

    auto index = ( bChar ) ? CHARACTER_TAB : GLOBAL_TAB;
    fprintf(stderr, "using vector #%d\n", (int)index);
0x1110130 serialize
I have 2 vectors
using vector #208

.. er, what??

mutability commented 7 years ago

@Coolthulhu it was a hunch that the wrong 'this' pointer offset was being used somewhere. But it inherits a virtual dtor anyway so.. Red herring.

There is something very weird going on with that bChar ternary expression though, see above.

mutability commented 7 years ago

I am going to guess that this is an uninitialized 'bool' (it is not initialized in the ctor) invoking undefined behavior (maybe the compiler is generating code that uses the raw value of the bool, expecting it to always be 1 or 0)

codemime commented 7 years ago

OK, I've managed to reproduce it too (following the same instructions). Reverting #19626 prevents the crash for me.

mutability commented 7 years ago

Initializing bChar in the ctor seems to have fixed it (fingers crossed)

Let me try to get a PR together

Coolthulhu commented 7 years ago

maybe the compiler is generating code that uses the raw value of the bool, expecting it to always be 1 or 0

That would make sense - while debugging with gdb here, the bChar was 255. Using index offsets instead of branching is a very common optimization.

codemime commented 7 years ago

Initializing bChar in the ctor seems to have fixed it (fingers crossed)

Yes, looks like it's pinned. It doesn't crash anymore.

Coolthulhu commented 7 years ago

Is there a compiler switch to warn about this kind of shit? It would save us a lot of trouble in the future.

codemime commented 7 years ago

@mutability Well done!

Is there a compiler switch to warn about this kind of shit? It would save us a lot of trouble in the future.

According to the Standard, we have UB here:

47) Using a bool value in ways described by this International Standard as "undefined,"
such as by examining the value of an uninitialized automatic object, might cause it to 
behave as if it is neither true nor false.

Wuninitialized / Wmaybe-uninitialized won't help here, and Weffc++ / Qeffc++ spit out a ton of false positives (since they insist on using of initialization lists).

Xpyder commented 7 years ago

It still happens with current build 6020. How long does it take for the merge to land in http://dev.narc.ro/cataclysm/jenkins-latest/Windows/Tiles/ ?

Coolthulhu commented 7 years ago

6020 was the last one before the merge. Wait for 6021 http://ci.narc.ro/view/Cataclysm-DDA/job/Cataclysm-Matrix/6021/

Xpyder commented 7 years ago

Looks like it works, though windows curses and linux x64 tiles seems to have failed to build

Coolthulhu commented 7 years ago

Linux x64 tiles is turned off, not failed. Windows console is failing since 6000.