Open matthewearl opened 1 year ago
It's the same bug I reported earlier in Quakespasm repo, https://github.com/sezero/quakespasm/issues/64.
Thanks, I should have checked the Quakespasm repo too. I confirmed that https://github.com/sezero/quakespasm/pull/75 stops my repro from working on QS.
After porting your fix to ironwail, there's still a similar bug that occurs where two links without an intervening unlink occur. This time the entity is cleared in ED_Alloc:
__memset_evex_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:284
284 ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory.
#0 __memset_evex_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:284
#1 0x000055555560be53 in ED_Alloc () at pr_edict.c:217
#2 0x000055555560363a in PF_Spawn () at pr_cmds.c:1061
#3 0x0000555555614675 in PR_ExecuteProgram (fnum=472) at pr_exec.c:656
#4 0x000055555561b45c in SV_RunThink (ent=0x555558e5e720) at sv_phys.c:143
#5 0x000055555561e1ae in SV_Physics_None (ent=0x555558e5e720) at sv_phys.c:983
#6 0x000055555561e9ce in SV_Physics () at sv_phys.c:1212
#7 0x00005555555f3f16 in Host_ServerFrame () at host.c:910
#8 0x00005555555f4df8 in _Host_Frame (time=0.00099927200062666088) at host.c:1186
#9 0x00005555555f5184 in Host_Frame (time=0.00099927200062666088) at host.c:1269
#10 0x00005555556270f6 in main (argc=7, argv=0x7fffffffdcf8) at main_sdl.c:210
Here's the commit with my port of the fix: https://github.com/matthewearl/quakespasm/commit/e71d48475ecf20fca9b55ed6078125e9aca828b7
The repro steps are the same as before. Seems to work best if you let the clock run to 5 seconds before resetting. I've been running with host_timescale 10
to make it faster.
What was the error? EXC_BAD_ACCESS
? What are the value of num_edicts
and e
then? The latter should point inside allocated block assigned to edicts
as it's size is max_edicts * pr_edict_size
while num_edicts
should be always less than max_edicts
.
Also, are you sure that added for
loop does ED_Free()
at least once when it crashes?
I think the problem is to do with how ironwail handles freeing objects. It appears that in normal operation num_edicts
doesn't ever decrease. Calling ED_Free
just adds the slot onto a free list to be used first when a new entity is allocated.
When the game loads the new loop kicks in, calls ED_Free
on entity n
, but then on the next line num_edicts
is decreased to be equal to n
. Subsequently ED_Alloc
is called, and first gets entity n
off the free list. Then ED_Alloc
gets called a further time, but the free list is empty, so edicts[num_edicts++] == edicts[n]
is returned. And so slot n
ends up being linked twice.
So I think the right fix is to not set num_edicts
in Host_Loadgame_f
if it would cause num_edicts
to decrease.
Again, what was the error in ED_Alloc()
? No matter how broken edict linking is, it shouldn't crash on that memset()
inside ED_Alloc()
.
The stack trace from before wasn't a crash, it's just the stack trace at the point where the prev/next pointers are cleared for the entity that gets linked twice without an intervening unlink. The game proceeds past this point but ultimately ends up in an infinite loop iterating one of the lists, as before.
Does the issue show itself with current quakespasm git master?
I can confirm that I can't reproduce the issue with https://github.com/sezero/quakespasm/commit/bd331e2192d27b803993150efeaf05d26051113c (ie. current quakespasm master)
That's great, thanks for testing!
So the additional problem you're describing at https://github.com/andrei-drexler/ironwail/issues/242#issuecomment-1678643170 is probably ironwail-specific.
Yes, I'm pretty sure it is ironwail specific. I naively ported the fix without realizing that num_edicts
shouldn't be decreased, but I think that's fine to do in Quakespasm.
Bug repro
map e1m6
restart
, the game will pause on the first frame with the console dropped.save t
to save a game at this point.x
toload t
.x
when clock times to 2 seconds.Repeat step 6 until game freezes. I normally hit the issue within about a minute or so.
The stack trace on freeze looks something like this:
Debugging
This diff prints out when entities are linked / unlinked from the area tree. When the bug is reproduced, some entity slots have two links without an intervening unlink, for example:
I think this comes about from a temp entity being allocated and linked in
SV_SpawnServer
(during one of its twoSV_Physics
steps), being wiped but not unlinked by the memset inHost_Loadgame_f
, and then the slot being reused as the loaded game progresses, without an unlink since the area next/prev pointers were wiped. One of the lists then becomes corrupt and the game ends up in an infinite loop when trying to iterate it.If my theory is right this could be seen with save games not made immediately after a restart, but I haven't had luck reproducing it this way.