Shpoike / Quakespasm

Extra bloaty junk to modernise stuff a bit.
http://triptohell.info/moodles/qss/
GNU General Public License v2.0
184 stars 41 forks source link

Player can get stuck against door in E3M2 #50

Open Macil opened 2 years ago

Macil commented 2 years ago

In E3M2, there's a spot where the player can get stuck just by running forward on a platform trying to get by a door/wall. This bug happens very easily by accident on a normal playthrough, as some players will naturally try to get around the wall by walking here.

This doesn't happen in WinQuake, Quakespasm, or the 2021 rerelease. It does happen in QSS and vkQuake.

vkquake0000 vkquake0001 vkquake0002

The player gets stuck between the blue pillar in the first picture and the wall on the right while standing on the raised floor. (The gap on the left doesn't seem to be enterable; is the map asymmetrical, or is the bug asymmetrical?) It seems like the presence of a raised floor causing the player to step up is somehow allowing the player through a thin gap that they normally can't go through, and then they can't exit that spot. (Maybe QSS has the collision changed in a way that makes it process checking for stepping-up before checking if the player fits through a gap?)

o4zloiroman commented 2 years ago

Had the same issue.

Macil commented 1 year ago

Investigating this a little further:

The player steps up through the gap in SV_TryUnstick() when i = 4. This case isn't hit often outside of this, so to debug things I created a global variable that's set to true when i is set to 4 and unset the global variable after, and then in various functions called by SV_TryUnstick() I check this variable in an if-block, log some things, and I put breakpoints on those logs while using a debugger.

Inside SV_TryUnstick -> SV_PushEntity -> SV_Move -> SV_ClipMoveToEntity, after SV_RecursiveHullCheck (hull, start_l, end_l, &trace, hitcontents);, in QSS trace.fraction = 1, and in QuakeSpasm trace.fraction = 0.

Stepping through both QSS's Q1BSP_RecursiveHullTrace and QuakeSpasm's original SV_RecursiveHullCheck, I find they're in sync until they reach a point where t1 = -0.03125 (-DIST_EPSILON) and t2 = 1.96875 (2-DIST_EPSILON) (world.c#L731). Execution ends up going to the return at world.c#L743, with rht = 1 (rht_empty) and trace.fraction = 1.

It seems like QSS's Q1BSP_RecursiveHullTrace is consistent with FTEQW, so it mystified me why FTEQW didn't also have the bug until I noticed something: FTEQW never calls SV_TryUnstick. It's commented out at FTEQW's sv_phys.c#L2047.

Is the difference in behavior because QSS / FTEQW's SV_RecursiveHullCheck is actually more accurate and is merely revealing an issue in the map and/or SV_TryUnstick, or is the difference because QSS / FTEQW's SV_RecursiveHullCheck is flawed? A point against the function's correctness is the fact that the map has 4 identical 32x32 gaps just like the problem spot all around the pillar (I checked the coordinates in TrenchBroom), but only one of them can be entered. QuakeSpasm's original SV_RecursiveHullCheck handles these exactly symmetrical cases the same way.

It's interesting that the split between QuakeSpasm and QSS happened right where t1 had the value of exactly -DIST_EPSILON, and that QuakeSpasm's original SV_RecursiveHullCheck has logic involving DIST_EPSILON like world.c#L665 earlier than QSS's Q1BSP_RecursiveHullTrace does. (In QuakeSpasm, when t1 = -DIST_EPSILON, logic involving DIST_EPSILON is used before recursing again. In QSS, it recurses further before doing any logic involving DIST_EPSILON.)

I'm pretty stuck at this point in investigating it as I don't have much understanding of SV_RecursiveHullCheck.

Shpoike commented 1 year ago

the original SV_RecursiveHullCheck has an issue in that it accumulates imprecision (t1+t2 were determined from the previous level's epsiloned t1+t2). the deeper the tree gets, the less precise the collision gets. Quite often it walks down the wrong side of the tree and you end up getting blocked by empty space, often on the floor, resulting in the player needing to jump or so to get past the broken area. you might not notice it on flat ground, but lumpy bumpy ground can often be awkward to scramble over due to this loss of precision. the replacement logic keeps its midpoints closer to the actual impact point., meaning the next level's midpoint is computed more accurately too. its better, but its not perfect - it is still using floats, and floats suck, so it'll never be perfect. you can use doubles to get noticeably more precision (especially for divisors), but they're still imprecise. another issue that doesn't help matters is imprecision in the plane's normal meaning it's not quite normalised - perfect precision is impossible and anything referencing an epsilon is an admission of guilt in that respect. :) which brings up an interesting question - why's it using regular 'DotProduct' in one places, and the QS-esque 'DoublePrecisionDotProduct' macro in two others? the (t1-t2) divisor can be quite sensitive to precision loss, it might help.

what the replacement isn't so good at is replicating the vanilla behaviour when reporting startsolid. the endpos that gets reported is where it next impacts an solid surface rather than some weird arbitrary hard-to-replicate position, and this breaks eg the lavamen in rogue (for the most part they show up on the floor below). so its not perfect, this is why I restored the original when pr_checkextension is 0 (using that cvar like that kinda sucks, but cvar abuse is a different issue).

'FTEQW never calls SV_TryUnstick' reading it, it seems pretty hacky... is that trying to work around the various precision problems I mentioned earlier? SV_CheckStuck is the function that's meant to move you out of walls (when the precision goes to poop), and IS called in both engines. if you're already inside a wall then that SV_TryUnstick will blindly try to SV_PushEntity through into a new area until it can get past allsolid(without checking just startsolid). and because you're NOW not in a solid, SV_CheckStuck can't do its thing to pull you back to somewhere sane.

to be clear, precision is a pain in EVERY engine that uses floats/doubles for collisions. most cheat slightly by eg nudging impact points to 1/32th away from the actual plane (at least q2+q3 explicitly do this, with their 'fraction' and 'truefraction' values), with 'onseam' collisions reporting a point outside the brush instead of getting too close, which also avoids pointentities getting into the infinitely small cracks between brushes. unfortunately quake's bsp trees ARE the collision instead of merely an optimisation, so that sort of trick would just leave it in some other solid leaf. ultimately imprecision WILL happen, and trying to hack your way past it is just going to leave you stuck in some gap between brushes that you then can't escape from. stoopid float imprecision.

so if the fix is to just not call SV_TryUnstick when using the new version of the function then I'd be inclined to call it bug fixed.

damn essays.

Macil commented 1 year ago

Thanks for the explanation! The differences to QuakeSpasm's code feel less mysterious now.

I understand what you mean about being blocked by empty space. I remember running into mysterious ghost collisions like that in the last few weeks (usually upon reaching the top of a slope where it transitions to a flat surface), I think only in the 2021 rerelease but maybe in vkQuake too. I wish I remembered if and where any might have been in QSS/vkQuake so I could test any changes done here on those cases.

if you're already inside a wall then that SV_TryUnstick will blindly try to SV_PushEntity through into a new area until it can get past allsolid(without checking just startsolid). and because you're NOW not in a solid, SV_CheckStuck can't do its thing to pull you back to somewhere sane.

I think SV_TryUnstick isn't about getting a trapped immobile player outside of a wall like SV_CheckStuck, but about helping them smoothly get up ledges/slopes when they've suddenly lost all momentum because there's a minor or ghost obstacle barely blocking them that they could possibly sidestep past. It only attempts some locations within a few units of the player's coordinate and fully undoes its work if it can't find a clear spot. When it fires during this issue, the player isn't currently stuck inside a wall, but just blocked from passing ahead through a gap by the pillar door and a wall to their left and right. I think you might be right though that it could just exist to work around some other collision imprecision issue that might not be relevant now. If it was some intentional game design feature for general sidestep-around-a-wall logic, it probably wouldn't exist as a special-case for while you're going up a ledge like this.

I'm tempted to take a shot at going over Q1BSP_RecursiveHullTrace more deeply in the next few days and seeing if some minor precision improvements sprinkled in fix this specific issue and potentially others, but disabling SV_TryUnstick seems like a good option otherwise especially if FTEQW has been getting away without it. (I also imagined changing SV_TryUnstick to pass some flags or call other functions so that it uses the original SV_RecursiveHullCheck internally, but if no one is really sure what case SV_TryUnstick is meant to address and no obvious clean improvement can be made to Q1BSP_RecursiveHullTrace to make it work well with SV_TryUnstick here, then it's probably overkill to try to special-case things for SV_TryUnstick instead of just axing it.)

Macil commented 1 year ago

I realized that another avenue of debugging this: instead of comparing the execution of Q1BSP_RecursiveHullTrace to that of the original SV_RecursiveHullCheck function, figure out why Q1BSP_RecursiveHullTrace acts asymmetrically and only allows one of the four gaps around the pillar door to be entered. If we could figure out why it doesn't work symmetrically, then assuming it's not a nebulous precision issue then there could be an easy clean fix to this issue that makes it symmetrically block all of the gaps.

I added some logging to Q1BSP_RecursiveHullTrace showing variables in it over time during calls to SV_TryUnstick, and indented the logs further whenever it recurses so that the separate calls to it can be tracked easily. (I've attached the diff with the logging code.)

Then I've extracted the logs of when I pass into the gap on the right of the pillar vs when I get blocked trying to go into the gap on the left of the pillar: right-pass.txt, left-blocked.txt. Then I looked at both of the logs for where the symmetry of the situations breaks down.

In each of the logs, there are two calls to SV_RecursiveHullCheck: one call in SV_TryUnstick -> SV_PushEntity -> SV_Move -> SV_ClipMoveToEntity, which returns 1 (rht_empty) in both cases, and then one call in SV_TryUnstick -> SV_PushEntity -> SV_Move -> SV_ClipToLinks -> (SV_ClipToLinks recurses several times) -> SV_ClipMoveToEntity, which diverges. I think this second call to SV_RecursiveHullCheck is when the movement of the player is checked against the pillar door.

The logs diverge around line 135. In right-pass.txt:

  reenter:
  num: 2307
  t1: 0.000000
  t2: 1.968750
  reenter:
  num: -1
  returning: 1 (rht_empty)
 returning: 1
 === i = 4, afterpush
ent->v.origin = -334.031250 -110.031250 250.031250
unstuck! i = 4
 == Ending SV_TryUnstick == 

In left-blocked.txt:

  reenter:
  num: 2308
  t1: 0.000000
  t2: -1.968750
  Q1BSP_RecursiveHullTrace midpoint
  t1: 0.031250
  t2: -1.968750
  Q1BSP_RecursiveHullTrace:
   p1f: 0.015625, p2f: 0.015625
   p1: -432.000000 -112.000000 250.031250
   p2: -432.000000 -112.000000 250.031250
   reenter:
   num: -2
   returning: 0 (rht_solid)
  returning: 0
 returning: 2 (rht_impact)
 === i = 5, afterpush
ent->v.origin = -431.968750 -112.031250 250.031250
 == Ending SV_TryUnstick == 

The two if-blocks at world.c#L707 look responsible for this. The code appears to be meant to recurse into a child node if t1 and t2 are both on the same side of zero, but if one of the numbers is zero, it only considers the other number on the same side of zero if it's positive. So when we try to go around the pillar to the right and get (t1=0, t2=1.968750), the code recurses into a child node, but when we try to go around the pillar to the left and get (t1=0, t2=-1.968750), the code doesn't recurse into a child node and does something else despite it being an exactly symmetrical situation.

(Interestingly, in the original SV_RecursiveHullCheck function, there's old inactive alternative code that does treat these situations symmetrically, which I think supports my intuition that these could be made symmetrical. (It also does something with DIST_EPSILON, but I think that's just a consequence of their more DIST_EPSILON-centric implementation and isn't something I'm recommending here.) I think they may have disabled this code and switched to a simpler implementation where they accidentally killed the symmetry because the way their code approximates values with DIST_EPSILON made it much less likely for 0 to be exactly encountered, but with QS's more accurate checks, this edge case actually does come up now.)

If I change the two if-blocks at world.c#L707 to both use >=/<= to permit zero, then I'm able to get into all four gaps around the pillar door. If I change the two if-blocks to both use >/< to deny zero, then I'm correctly blocked from all four gaps around the pillar door.

Using >=/<= for both looks like it might have been what the code intended to do, though using that (while SV_TryUnstick is still present) just gets us the wrong behavior more consistently (all four symmetrical gaps being enterable now). Maybe that change is the logic of consistently letting the player get through exactly sized gaps, and using >/< is the logic of not letting a player through an exactly sized gap? If it's just that simple, then changing world.c#L707 to use > instead of >= looks like a clean solution to this issue.

(I considered the alternative of using >=/<= to permit zero in both if-blocks, and then working around the player getting in the gaps by disabling SV_TryUnstick, but it has the downside of negating whatever benefit SV_TryUnstick was intended for if it still has one, and I think that would still allow the player through the gap if they hit it perfectly diagonally like what SV_TryUnstick happens to simulate in this case.)

Macil commented 1 year ago

Turns out my suggestion of changing the >= to > is pretty flawed: with this change, I noticed in Arcane Dimensions the smoke sprites that are supposed to spawn where your shotgun projectiles land now usually spawn right on the player instead. I think I'm giving up on anything clever now, and I agree that just axing SV_TryUnstick while pr_checkextension is active is a decent solution, especially since FTEQW has been getting away without it.

(Maybe the > in the second if-block should become >= to make things symmetrical? Dunno, maybe it just shouldn't be touched if there's no concretely known issues that would address unless that idea resonates with someone who knows the code much more deeply than me.)


Earlier today, I joined a coop match (in the 2021 rerelease) with two seemingly inexperienced players. Coincidentally, we just arrived to E3M2, and when we got to this pillar door, both of them immediately tried to enter both of the gaps on the left and right of the pillar. If they had been in QSS instead, they both would have gotten stuck. Just mentioning because I thought it was funnily coincidental but also it signals a little how common this issue might be for people.