Closed andreymal closed 1 week ago
divVerent suggested applying a patch that adds additional checks and aborts the process immediately after something starts to go wrong, so there is another backtrace.
https://github.com/user-attachments/assets/3c82dc9a-8028-46df-b800-913a21a71fba
Removing the loop at https://github.com/DarkPlacesEngine/DarkPlaces/blob/master/gl_rmain.c#L9586 will quite surely fix it: this loop reduces numdecals without also reducing freedecal when freedecal ends up above numdecals.
However, I don't see WHY this would happen anyway - the alpha of a decal should only ever be 0 or 1, and the only place setting it to 0 seems to be the memset above.
Can you try replacing the loop
    while (numdecals > 0 && !decal[numdecals-1].color4f[0][3])
        numdecals--;
by
    while (numdecals > 0 && !decal[numdecals-1].color4f[0][3])
    {
        numdecals--;
        if (decalsystem->freedecal > numdecals)
            abort();
    }
and if this hits, tell me:
(gdb) print numdecals
(gdb) print decalsystem->freedecal
(gdb) print decal[numdecals]
(gdb) print decal[numdecals].color4f[0][3]
Thanks!
gdb says "optimized out" even with -O0
:(
To all of them?
Otherwise, I could offer to rewrite this code without the freedecal int, which IMHO doesn't even save performance anyway. Algorithmically simpler should be more likely to be correct.
Try this.
Replaces the current mark-and-cleanup approach of deleting decals by just immediately moving the last decal into the slot being deleted.
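For readers unfamiliar with the technique, here is a minimal sketch of "move the last element into the freed slot" on a plain array. Everything in it (the `toy_decal_t` type, its fields, and both function names) is hypothetical and heavily simplified; it is not taken from the patch and only illustrates the idea.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified decal; the engine's real struct has many more fields. */
typedef struct {
    float alpha;
    int   id;
} toy_decal_t;

/* Remove decals[index] by overwriting it with the last live decal.
 * Returns the new count. Order is not preserved, but the array stays
 * densely packed, so no separate free-slot index is needed. */
static size_t toy_remove_decal(toy_decal_t *decals, size_t count, size_t index)
{
    assert(index < count);
    decals[index] = decals[count - 1];
    return count - 1;
}

/* Drop every decal whose alpha has reached zero. */
static size_t toy_expire_decals(toy_decal_t *decals, size_t count)
{
    size_t i = 0;
    while (i < count) {
        if (decals[i].alpha <= 0.0f)
            /* Do not advance i: slot i now holds a different decal
             * that still has to be checked. */
            count = toy_remove_decal(decals, count, i);
        else
            i++;
    }
    return count;
}
```

The count is the only piece of bookkeeping left, which is why this variant has fewer ways to get out of sync than a mark-and-cleanup scheme with a separate free index.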
I always knew printf is the best debugger
(this ↓ is without the new patch yet)
decalsystem->freedecal > numdecals, aborting
ent->entitynumber = 4104
decalsystem->numdecals = 65
decalsystem->freedecal = 65
numdecals = 64
decal[64] = {
texcoord2f[0][0] = 0.585098,
texcoord2f[0][1] = 0.465171,
texcoord2f[1][0] = 0.534220,
texcoord2f[1][1] = 1.000000,
texcoord2f[2][0] = 0.585098,
texcoord2f[2][1] = 0.465171,
vertex3f[0][0] = 0.534220,
vertex3f[0][1] = 1.000000,
vertex3f[0][2] = 0.000000,
vertex3f[1][0] = 0.000000,
vertex3f[1][1] = 0.000000,
vertex3f[1][2] = 1.000000,
vertex3f[2][0] = 0.585098,
vertex3f[2][1] = 0.465171,
vertex3f[2][2] = 0.534220,
color4f[0][0] = 1.000000,
color4f[0][1] = 0.000000,
color4f[0][2] = 0.000000,
color4f[0][3] = 0.000000,
color4f[1][0] = 1.000000,
color4f[1][1] = 0.000000,
color4f[1][2] = 0.000000,
color4f[1][3] = 0.000000,
color4f[2][0] = 1.000000,
color4f[2][1] = 0.606560,
color4f[2][2] = 0.482234,
color4f[2][3] = 0.553815,
plane[0] = 1.000000,
plane[1] = 0.606560,
plane[2] = 0.482234,
plane[3] = 0.553815,
lived = 1.000000,
triangleindex = 1040697689,
surfaceindex = 1037559990,
decalsequence = 1039660353,
}
simpler-decals.diff seems to work well, no crashes for ~6 hours (will test longer in the next few days)
Linux screaming "BUG: unable to handle page fault for address: ffff9772752c8410" several times (I hope this is unrelated to Xonotic) prevented me from testing for longer than 20 hours straight, but simpler-decals.diff doesn't crash (also played a few matches on feris and didn't notice anything unusual)
OK, I'll make that change then.
I don't quite like it, as I still don't understand why this happens, but at least the engine runs now I guess.
(I did find bugs in the old logic, yes, but none that look like they could make it crash...)
Also, these numbers:
triangleindex = 1040697689,
surfaceindex = 1037559990,
decalsequence = 1039660353,
look oddly high, although that might be expected on a map by this particular author.
decalsystem->freedecal = 65
numdecals = 64
That actually does violate an invariant: the next decal may be allocated at 65 while the code thinks there are only 64, so this may overflow the bounds of the decal buffer without triggering the reallocation logic in R_DecalSystem_SpawnTriangle. 64 is a possible and likely value of decalsystem->maxdecals, in which case the engine would have skipped the reallocation and written into something else's memory.
Not sure if that is what did happen, but it sounds like a possibility.
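To make the invariant concrete, here is a minimal sketch that assumes the three counters are meant to satisfy `freedecal <= numdecals <= maxdecals`. The struct and function names are hypothetical simplifications, not the engine's definitions, and the clamp in `toy_trim` is only one possible way to keep the counters consistent; the patch discussed in this thread removes freedecal instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced to the three counters discussed in the thread. */
typedef struct {
    int maxdecals;  /* capacity of the decal buffer */
    int numdecals;  /* one past the highest slot in use */
    int freedecal;  /* first slot that may be free */
} toy_decalsystem_t;

/* Assumed invariant: 0 <= freedecal <= numdecals <= maxdecals. */
static bool toy_invariant_holds(const toy_decalsystem_t *s)
{
    return 0 <= s->freedecal
        && s->freedecal <= s->numdecals
        && s->numdecals <= s->maxdecals;
}

/* Trim trailing expired decals (alpha == 0), then clamp freedecal
 * so it can never point past numdecals. */
static void toy_trim(toy_decalsystem_t *s, const float *alpha)
{
    while (s->numdecals > 0 && alpha[s->numdecals - 1] == 0.0f)
        s->numdecals--;
    if (s->freedecal > s->numdecals)
        s->freedecal = s->numdecals;
    assert(toy_invariant_holds(s));
}
```

With the values from the dump (numdecals = 64, freedecal = 65, and maxdecals assumed to be 64), `toy_invariant_holds` returns false.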
The numbers above could actually be color-like floating point numbers:
[rpolzer@srv04 darkplaces (git)-[divVerent/simpler-decals]-]$ perl -e 'printf "%.17g\n", unpack "f", pack "V", 1040697689'
0.13260401785373688
[rpolzer@srv04 darkplaces (git)-[divVerent/simpler-decals]-]$ perl -e 'printf "%.17g\n", unpack "f", pack "V", 1037559990'
0.10542432963848114
[rpolzer@srv04 darkplaces (git)-[divVerent/simpler-decals]-]$ perl -e 'printf "%.17g\n", unpack "f", pack "V", 1039660353'
0.12107325345277786
which speaks for this theory.
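The same reinterpretation can be sketched in C. This assumes `float` is a 32-bit IEEE 754 single-precision value with the same byte order as `uint32_t` on the target platform; `bits_to_float` is a hypothetical helper name, not an engine function.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bit pattern of a 32-bit integer as a float.
 * memcpy avoids the undefined behaviour of a pointer cast. */
static float bits_to_float(uint32_t bits)
{
    float f;
    assert(sizeof f == sizeof bits);
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Calling `bits_to_float(1040697689u)` and printing the result with `printf("%.17g\n", ...)` should give a value matching the first Perl one-liner, and likewise for the other two integers.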
Please try the final version of the patch on that PR. It's the same except for the Sys_Error in case a decal has zero alpha, which is gone (it does indeed seem unnecessary: zero alpha was just the old code's way to mark a decal as expired, and even if it were to happen, that decal will expire soon anyway).
Also, these numbers:
triangleindex = 1040697689, surfaceindex = 1037559990, decalsequence = 1039660353,
look oddly high, although that might be expected on a map by this particular author.
I also caught one crash on Stormkeep, those numbers were similarly high
Thanks for reporting!
And - sadly - this issue would still have happened in Rust. But we'd have gotten a clearer runtime error and been able to fix it quicker.
Sometimes, very rarely, something strange happens in R_DrawModelDecals_FadeEntity and numdecals becomes negative, resulting in an out-of-bounds access.
Unfortunately, I don't know the specific conditions under which this crash can be reproduced. It appears to be random.
I first encountered this crash on frankishtower15 in Xonotic instagib, so here's how I try to reproduce this crash:
maps/frankishtower15.bsp
cl_maxfps 0
(I don't know if it affects the crash, but I have 300-400 fps)
Backtrace from https://github.com/DarkPlacesEngine/DarkPlaces/commit/caa1458b53e6a960e9c642dad6baebbdb500a722:
Some variables:
https://github.com/user-attachments/assets/0dd17528-0015-4b72-b1e8-0227235379c4