Closed Lazasar closed 7 years ago
Additional information is needed: corrupted/uncorrupted save file, preferably also a stack trace.
Don't have the save on hand since I started a brand new game after corruption borked by previous one, but I can at least confirm I, too, have been having save fail/corruptions since updating to the most recent build from a build 1-2 days prior.
Yeah. Now I am seeing corrupted saves even if the game doesn't crash while saving
What's your OS? Could you simply upload them (corrupted and uncorrupted ones)?
Windows 10. Here ya go. Both are corrupted, but one (Huagan) actually gives an error message before it crashes save.zip
Same here, it first started when I checked the map and crashed on 0.C-20128-gbcb7a7b (tiles) [Win 10] Since updating I get inconsistent stability; not sure what will make the game crash but usually when saving. save.zip
I'm also getting corrupted saves, one of my saves did actually load but the map data failed to save and I had no mapped tiles everything was unknown.
Not sure if it's the same problem but loading an old save crashes on macOS Sierra. I tried to detect the cause and found that the latest build with omitting commit adef32a can load the save without crash.
As I posted here: http://smf.cataclysmdda.com/index.php?topic=13828.0 since update 6009 I have been unable to load old saves without the game crashing and while I can start a new game any attempt to save causes the game to crash and the save to corrupt as described here. Rolling back to update 6008 fixes the issue so it is definitely something in update 6009 that is causing this.
I also just had it crash on an autosave. Build 6017 from 12/5/2016 Windows Graphical Windows 7 Clean Unzip Brand new world Brand new character
I can upload a zip of this, including my settings if someone wants to test
EDIT: Interesting, I also get this error on that same world now when creating a new character, but didn't initially. It looks like the crash corrupted the world file in some way.
Some further testing, I reset the world and I saved once successfully, but stepping outside the shelter and trying to save caused a crash.
Will test without any world modifiers. (I usually run with more survival tools, some vehicle packs and a couple other things)
Note: I also had this sort of thing happen on Win10 Graphical. Crash on saving, deleted my mobile base and also apparently my characters' knowledge of the map. Not sure how to track down the cause of the crash, since it doesn't seem to put crash logs anywhere.
It's definitely related to something in the world, doesn't happen all the time (approached swamp and town without triggering, but had it happen right after exiting shelter in empty field with no mobs around). I don't really know assembly, but I do have a debugger installed and here's what it spat out in case someone does know how to track it down.
Unhandled Exception at 0x00694055 in cataclysm-tiles.exe: 0xC0000005: Access violation reading location 0x0110e000
(0x0110e000 is 17883136 in dec) (memory values started at 00692D03)
And like I said it's clearly caused by something in update 6009. If I downgrade to 6008 it doesn't happen. Upgrading just one update higher to 6009 and the behavior starts.
i have similar issue, map data gone, got error when loading up the save:
and another error i also got, similar too the one @Xpyder posted a screenshot of.
Issue still occurring with build 6018 so I can't make use of any of those updates that got made.
Marking as priority. I'm getting this too. I'll make a debug build and get a stacktrace.
Yea, I had to revert to 6009, which doesn't have the crash. I'm getting when I attack ants. Every Time.
Would reproduce trivially in release build, but in debug builds it doesn't. This means it's something really serious.
Will build release+debug symbols and check there.
I have a save if you want it? All you need to do is attack one of the fire ants? And it dies.
Got it - it's something in auto_pickup::cRules::~cRules
, invoked by npc::load
.
Looks like someone assumed autopickup rules will always be loaded in some specific order, which isn't true - they may be loaded at any time.
The whole bt
is:
#0 0x00000005268a6a70 in ?? () from /usr/bin/msys-stdc++-6.dll
#1 0x000000010091d67d in ~cRules (__in_chrg=2, this=<optimized out>) at src/auto_pickup.h:48
#2 ~cRules (this=<optimized out>) at src/auto_pickup.h:48
#3 _Destroy<auto_pickup::cRules> (__pointer=<optimized out>) at /usr/lib/gcc/x86_64-pc-msys/5.3.0/include/c++/bits/stl_construct.h:93
#4 std::_Destroy_aux<false>::__destroy<auto_pickup::cRules*> (__first=__first@entry=0x0, __last=0x6038eb010)
at /usr/lib/gcc/x86_64-pc-msys/5.3.0/include/c++/bits/stl_construct.h:103
#5 0x00000001009461a8 in _Destroy<auto_pickup::cRules*> (__last=<optimized out>, __first=0x0)
at /usr/lib/gcc/x86_64-pc-msys/5.3.0/include/c++/bits/stl_construct.h:126
#6 _Destroy<auto_pickup::cRules*, auto_pickup::cRules> (__last=<optimized out>, __first=0x0)
at /usr/lib/gcc/x86_64-pc-msys/5.3.0/include/c++/bits/stl_construct.h:151
#7 _M_erase_at_end (this=0x6038eafd8, __pos=0x0) at /usr/lib/gcc/x86_64-pc-msys/5.3.0/include/c++/bits/stl_vector.h:1438
#8 std::vector<auto_pickup::cRules, std::allocator<auto_pickup::cRules> >::clear (this=0x6038eafd8)
at /usr/lib/gcc/x86_64-pc-msys/5.3.0/include/c++/bits/stl_vector.h:1212
#9 0x00000001008070c9 in auto_pickup::deserialize (this=0x6038e97a0, jsin=...) at src/auto_pickup.cpp:655
#10 0x000000010064626b in JsonIn::read (this=<optimized out>, j=...) at src/json.cpp:1411
#11 0x000000010076b1da in read<auto_pickup> (t=..., name="pickup_whitelist", this=0xffff9f58) at src/json.h:700
#12 npc_follower_rules::deserialize (this=<optimized out>, jsin=...) at src/savegame_json.cpp:872
#13 0x000000010064626b in JsonIn::read (this=<optimized out>, j=...) at src/json.cpp:1411
#14 0x000000010089d9e4 in JsonObject::read<npc_follower_rules> (this=this@entry=0xffffa228, name="rules", t=...) at src/json.h:700
#15 0x0000000100777301 in npc::load (this=this@entry=0x6038f7250, data=...) at src/savegame_json.cpp:1100
#16 0x0000000100777521 in npc::deserialize (this=this@entry=0x6038f7250, jsin=...) at src/savegame_json.cpp:999
#17 0x00000001007ee90d in overmap::unserialize (this=0x6fffe1e0010, fin=...) at src/savegame.cpp:631
#18 0x000000010042a3eb in operator() (__args#0=..., this=0xffffab00) at /usr/lib/gcc/x86_64-pc-msys/5.3.0/include/c++/functional:2271
#19 read_from_file(std::string const&, std::function<void (std::istream&)> const&) (path="./save//NO_NPC_FOOD/o.-1.-8", reader=...)
at src/cata_utility.cpp:434
#20 0x000000010042a5b5 in read_from_file_optional(std::string const&, std::function<void (std::istream&)> const&) (
path="./save//NO_NPC_FOOD/o.-1.-8", reader=...) at src/cata_utility.cpp:467
#21 0x0000000100418bed in overmap::open (this=this@entry=0x6fffe1e0010) at src/overmap.cpp:4366
#22 0x0000000100419b3c in overmap::overmap (this=0x6fffe1e0010, x=-1, y=-8) at src/overmap.cpp:1152
#23 0x00000001007929a0 in overmapbuffer::get (this=this@entry=0x100b8ca00 <overmap_buffer>, x=x@entry=-1, y=y@entry=-8)
at src/overmapbuffer.cpp:64
#24 0x0000000100792b4a in overmapbuffer::get_existing (this=this@entry=0x100b8ca00 <overmap_buffer>, x=x@entry=-1, y=y@entry=-8)
at src/overmapbuffer.cpp:162
#25 0x00000001007942c2 in overmapbuffer::get_overmaps_near (this=this@entry=0x100b8ca00 <overmap_buffer>, location=...,
radius=radius@entry=4) at src/overmapbuffer.cpp:657
#26 0x0000000100794d31 in overmapbuffer::get_npcs_near (this=this@entry=0x100b8ca00 <overmap_buffer>, x=-288, y=-2646, z=0,
radius=radius@entry=4) at src/overmapbuffer.cpp:676
#27 0x0000000100794ea7 in overmapbuffer::get_npcs_near_player (this=0x100b8ca00 <overmap_buffer>, radius=radius@entry=4)
at src/overmapbuffer.cpp:641
#28 0x0000000100669fe5 in game::load_mission_npcs (this=this@entry=0x6001366c0) at src/game.cpp:973
#29 0x000000010069023b in game::load (this=0x6001366c0, worldname="NO_NPC_FOOD", name="#R2F0dHN1") at src/game.cpp:3681
#30 0x000000010069050c in game::load (this=<optimized out>, world="NO_NPC_FOOD") at src/game.cpp:3632
#31 0x00000001009ba1c8 in main (argc=<optimized out>, argv=<optimized out>) at src/main.cpp:479
I presume it's a consequence of reading the corrupted files (they're all truncated somewhere near that rules). However, the real cause is elsewhere.
@hmstanley Your files would be useful.
i personally blame #19626 for this bout of crashiness. Can't really suggest a simple revert since that one apparently fixed build errors, though...
Weird: crashes in the line:
vRules[(bChar) ? CHARACTER_TAB : GLOBAL_TAB].clear();
Which looks incredibly tame. Meaning it must be some memory corruption. The closest place to expect memory corruption is NPC code in the overmap, since overmaps were changed recently and that part of the code wasn't really bulletproof before the changes.
Roscoe-3.zip Here you go codemine:
[ "dda", "no_npc_food", "PK_REBALANCE", "realguns", "blazemod", "Tanks" ]
as I noted..just drive the car to the upper town, get out and attack a fireant.. it will most likely crash.
@hmstanley Thanks. I tried it on a release build, but didn't get the crash. Windows is probably more sensitive to whatever causes it.
Weird: crashes in the line:
You were experiencing consequences of reading a corrupted save file. That's also pretty serious, but looks like another issue. Something makes the game produce corrupted save files (and crash during normal playing).
EDIT: I'm suspicious of #19579.
@Coolthulhu Could you try the files uploaded by hmstanley?
Yea, I was running windows SDL tiles build
You were experiencing consequences of reading a corrupted save file.
No, that's not it. Corrupted save file can't cause memory corruption on its own. Plus, the file isn't even corrupted - it actually happens in proper saves too.
If anyone has a vanilla save with simple reproduction steps, I can take a look. I can't see how the autopickup fix itself could cause this, it must be tickling a latent bug elsewhere.
(hmstanley's save doesn't seem to be vanilla, given the error spam on load; and I couldn't find any fire ants)
Simple reproduction steps Start a clean world Start a random character spawn 5 npcs save
This seems to reliably crash build 6017
You can probably get away with only spawning 1, but I haven't tested whether it crashes if they spawn out of sight (i.e. outside the starting shelter before you step outside)
OK I can reproduce on linux, curses, release build with xpyder's instructions.
valgrind says:
==14718== Use of uninitialised value of size 8
==14718== at 0x6DB365: _S_ref (array:53)
==14718== by 0x6DB365: operator[] (array:185)
==14718== by 0x6DB365: auto_pickup::serialize(JsonOut&) const (auto_pickup.cpp:640)
==14718== by 0x8AD806: JsonOut::write(JsonSerializer const&) (json.cpp:1729)
==14718== by 0x74E2B6: member<auto_pickup> (json.h:534)
==14718== by 0x74E2B6: npc_follower_rules::serialize(JsonOut&) const (savegame_json.cpp:847)
==14718== by 0x8AD806: JsonOut::write(JsonSerializer const&) (json.cpp:1729)
==14718== by 0x753652: member<npc_follower_rules> (json.h:534)
==14718== by 0x753652: npc::store(JsonOut&) const (savegame_json.cpp:1167)
==14718== by 0x7537D9: npc::serialize(JsonOut&) const (savegame_json.cpp:1119)
==14718== by 0x8AD806: JsonOut::write(JsonSerializer const&) (json.cpp:1729)
==14718== by 0x6F4AA7: overmap::serialize(std::ostream&) const (savegame.cpp:917)
==14718== by 0x64F1BF: overmap::save() const (overmap.cpp:4394)
==14718== by 0x71D73A: overmapbuffer::save() (overmapbuffer.cpp:111)
==14718== by 0x59759F: game::save_maps() (game.cpp:3811)
==14718== by 0x59CFF5: game::save() (game.cpp:3851)
==14718==
==14718== Invalid read of size 1
==14718== at 0x8AD6E9: JsonOut::write(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (json.cpp:1671)
==14718== by 0x6DB3A3: member<std::__cxx11::basic_string<char> > (json.h:534)
==14718== by 0x6DB3A3: auto_pickup::serialize(JsonOut&) const (auto_pickup.cpp:643)
==14718== by 0x8AD806: JsonOut::write(JsonSerializer const&) (json.cpp:1729)
==14718== by 0x74E2B6: member<auto_pickup> (json.h:534)
==14718== by 0x74E2B6: npc_follower_rules::serialize(JsonOut&) const (savegame_json.cpp:847)
==14718== by 0x8AD806: JsonOut::write(JsonSerializer const&) (json.cpp:1729)
==14718== by 0x753652: member<npc_follower_rules> (json.h:534)
==14718== by 0x753652: npc::store(JsonOut&) const (savegame_json.cpp:1167)
==14718== by 0x7537D9: npc::serialize(JsonOut&) const (savegame_json.cpp:1119)
==14718== by 0x8AD806: JsonOut::write(JsonSerializer const&) (json.cpp:1729)
==14718== by 0x6F4AA7: overmap::serialize(std::ostream&) const (savegame.cpp:917)
==14718== by 0x64F1BF: overmap::save() const (overmap.cpp:4394)
==14718== by 0x71D73A: overmapbuffer::save() (overmapbuffer.cpp:111)
==14718== by 0x59759F: game::save_maps() (game.cpp:3811)
==14718== Address 0xb7e000 is not stack'd, malloc'd or (recently) free'd
Looking further..
In gdb, it looks as if autopickup::vRules
was a single element array (it should be 2 element). Though it could be just me not reading it right.
(gdb) p vRules
$9 = {_M_elems = {std::vector of length 0, capacity 0, std::vector of length 0, capacity 0}}
vRules
is used as vRules[1].clear()
.
vrules is a std::array<2>, you're seeing the two vectors it contains there
@Coolthulhu try adding a virtual dtor to the auto_pickup class; it might have fixed it here. (I am on a plane and can't really test this much)
edit: nope, didn't fix it actually
Why virtual?
fprintf(stderr, "%p serialize\n", this);
fprintf(stderr, "I have %u vectors\n", (unsigned) vRules.size());
json.start_array();
auto index = ( bChar ) ? CHARACTER_TAB : GLOBAL_TAB;
fprintf(stderr, "using vector #%d\n", (int)index);
0x1110130 serialize
I have 2 vectors
using vector #208
.. er, what??
@Coolthulhu it was a hunch that the wrong 'this' pointer offset was being used somewhere. But it inherits a virtual dtor anyway so.. Red herring.
There is something very weird going on with that bChar ternary expression though, see above.
I am going to guess that this is an uninitialized 'bool' (it is not initialized in the ctor) invoking undefined behavior (maybe the compiler is generating code that uses the raw value of the bool, expecting it to always be 1 or 0)
OK, I've managed to reproduce it too (following the same instructions). Reverting #19626 prevents the crash for me.
Initializing bChar in the ctor seems to have fixed it (fingers crossed)
Let me try to get a PR together
maybe the compiler is generating code that uses the raw value of the bool, expecting it to always be 1 or 0
That would make sense - while debugging with gdb
here, the bChar
was 255.
Using index offsets instead of branching is a very common optimization.
Initializing bChar in the ctor seems to have fixed it (fingers crossed)
Yes, looks like it's pinned. It doesn't crash anymore.
Is there a compiler switch to warn about this kind of shit? It would save us a lot of trouble in the future.
@mutability Well done!
Is there a compiler switch to warn about this kind of shit? It would save us a lot of trouble in the future.
According to the Standard, we have UB here:
47) Using a bool value in ways described by this International Standard as "undefined,"
such as by examining the value of an uninitialized automatic object, might cause it to
behave as if it is neither true nor false.
Wuninitialized
/ Wmaybe-uninitialized
won't help here, and Weffc++
/ Qeffc++
spit out a ton of false positives (since they insist on using of initialization lists).
It still happens with current build 6020. How long does it take for the merge to land in http://dev.narc.ro/cataclysm/jenkins-latest/Windows/Tiles/ ?
6020 was the last one before the merge. Wait for 6021 http://ci.narc.ro/view/Cataclysm-DDA/job/Cataclysm-Matrix/6021/
Looks like it works, though windows curses and linux x64 tiles seems to have failed to build
Linux x64 tiles is turned off, not failed. Windows console is failing since 6000.
Sometime around build 6009, maybe a build or two earlier, I am getting crashes when saving, and it corrupts o.0.0