bepu / bepuphysics2

Pure C# 3D real time physics simulation library, now with a higher version number.
Apache License 2.0
2.25k stars 261 forks source link

Assertion raised in AddUnsafely #312

Closed tweggen closed 3 months ago

tweggen commented 4 months ago

Hi, I ran into one problem again and again Debug.Assert((Flags[index >> shift] & (1ul << (index & mask))) == 0, "Cannot add if it's already present!"); triggered, many times like 10% of the time a dynamic object ran into a kinematic.

I reviewed a bit code, and while I certainly am unable to really trace most of the parts of the engine, I believe I might have found a problem in the code in Solver_Solve.cs. When changing this in a way what I believe might be a fix, my problem is not reproducable any more, even in the former error cases. It boils down to this change: If I understand correctly, the author sizes a new bitfield mergedConstrainedBodyHandles as the minimum of the existing batchReferencedHandles[0] and the highest possible claimed id. Then we copy the batchReferencedHandles[0] into the new buffer. What I would expect now is that we clear out the remaining part of mergedConstrainedBodyHandles. However, then we clear batchReferencedHandles[0], in a range that probably doesn't exist. That exactly is what later on triggered my Assertion: Close to the end of this method, we iterate through ConstrainedKinematicHandles, setting bits in mergedConstrainedBodyHandlers using AddUnsafely. In my context, AddUnsafely found areas in the bit field, that have all bits set, which end up to be the ones that I suggest to be cleared now. So this is my diff, I believe it fixes the root cause of a problem I stumbled upon. This diff was made to the head v2.4.0 branch that my game is using, the same code, however is in the main branch. I'm happy to provide more info. I know I might be completely wrong as I only scratched the surface of BepuPhysics2 impressive inner workings. Anyway, thank you for your work on Bepu! What an amazing engine!

--- a/BepuPhysics/Solver_Solve.cs
+++ b/BepuPhysics/Solver_Solve.cs
@@ -1133,7 +1133,7 @@ namespace BepuPhysics
                 pool.Take((bodies.HandlePool.HighestPossiblyClaimedId + 64) / 64, out mergedConstrainedBodyHandles.Flags);
                 var copyLength = Math.Min(mergedConstrainedBodyHandles.Flags.Length, batchReferencedHandles[0].Flags.Length);
                 batchReferencedHandles[0].Flags.CopyTo(0, mergedConstrainedBodyHandles.Flags, 0, copyLength);
-                batchReferencedHandles[0].Flags.Clear(copyLength, batchReferencedHandles[0].Flags.Length - copyLength);
+                mergedConstrainedBodyHandles.Flags.Clear(copyLength, mergedConstrainedBodyHandles.Flags.Length - copyLength);

                 //Yup, we're just leaving the first slot unallocated to avoid having to offset indices all over the place. Slight wonk, but not a big deal.
                 bodiesFirstObservedInBatches[0] = default;
RossNordby commented 4 months ago

Thanks for the kind words and report!

I can't seem to reproduce the assertion on the latest 2.5 beta, and I remain a bit confused about why that Clear is helping. In principle, the trailing region of the mergedConstrainedBodyHandles.Flags should be irrelevant (and indeed, if I fill it with trash data or set all bits, no assertion occurs under the latest version).

This change may be working around a different bug I vaguely remember fixing in one of the earlier 2.5 betas, but I don't remember the details.

Are you able to reproduce the failure in the latest versions?

tweggen commented 4 months ago

I'm afraid so. After a change to the master branch I'm running into the same issue. I'm developing this game engine / game: https://github.com/tweggen/Karawan When the player (dynamic body) hits a car (kinematic body), the kinematic's body handle is for the most part the only member in ConstrainedKinematicHandles, and quite frequently referencing the trailing region of mergedConstrainedBodyHandles.Flags .

tweggen commented 4 months ago

In these cases, the index is still hitting garbage which happens to be a set bit in the trailing area. e.g.:

image

image

tweggen commented 4 months ago

Although I'm technically on a Bepu fork right now, it does not contain any modification and I could directly replace it with the original.

RossNordby commented 4 months ago

Could you reproduce the failure in the demos for me to look at directly? Something wonky and contextual seems to be happening, and I'd like to make sure I actually understand the full scope of the problem so I can include it in future test suites.

tweggen commented 4 months ago

OK. I try first by creating a kind of physics protocol from my engine, so that I can dump all physics related things to an (xml?) log. Then I'll add a demo module to load and replay this. Hoping that it triggers the issue, or that I find an issue on my side while creating this.

tweggen commented 4 months ago

I have now successfully recorded all meaningful interaction with bepu, created a bepu demo that plays it back,exactly duplicating handle allocation but could not reproduce the assertion fail in the context of the demo, even in the 2.4 version. I'll look for mistakes or differences in the playback.

tweggen commented 4 months ago

Ross, finally I can reproduce it from within the demo environment on both 2.4 and current master (2.5 I guess). You can find the setup in my (otherwise unchanged) fork of bepuphysics2 in this branch:

https://github.com/tweggen/bepuphysics2/tree/twg_uincleared_merged_constrained_body_handles

There you can find two tags: [312_reproduce_on_2.4] (https://github.com/tweggen/bepuphysics2/releases/tag/312_reproduce_on_2.4) and 312_reproduce_on_2.5

What I did is: Added a demo called "ReplayJoyceDemo" (joyce being the name of my engine). The demo is number 25 in 2.4 GL Demo framework, number 28 in 2.5 GL Demo Framework.

Added a huge json file containing basically all physics action. I added this file to the Demos/Content directory , however I'm not loading this with the way intended, I'm just doing a FileRead with a relative path, which obviously is bad.

Added a subdir/namespace "actions" in Demos.Demos, containing the framework I used to capture the physics actions in my engine and also to replay it here in the Bepu context from the json file.

Small patches to the engine to make it more reproducable: in Solver_Solve.cs: I am just setting the entire buffer of mergedConstrainedBodyHandles to 1s, so that I do not rely on values of uninitialized memory.

In IndexSet.cs there is a small hook I added to throw an exception with the bodyhandle as opposed to just assert.failing. This made it easier for me to associate the error condition with data structures in my engine.

And, there was a small patch in the 2.4 Version in Window.cs, removing two Debug.Assert (just a impatient workaround) because my notebook failed to open the window, possibly due to a scaled display (please ignore this).

All in all, when you start my demo part, you should see the camera heading the scene from above, camera being locked to a dynamic object. And you see "cars", i.e. Collidables.Sphere "driving" until the dynamic mody finally "crashes" into a kinematic object car. At that point, right after the last "Simulation.Timestep" call from my log, the Assertion in AddUnsafely triggers.

My scene contains no static, just a lot of kinematics and two dynamics (which happened to be the player object and - in my game - a camera).

Note, and that may be a misconception/mistake on my side: I needed to have the NarrowPhaseCallbacks.AllowContactGeneration (for non-compounds) to always return true for the error condition to trigger.

Please let me know if you would prefer another way of having the patch submitted.

Sorry again for troubling you.

RossNordby commented 3 months ago

Thanks! I always appreciate efforts to reproduce problems like this; it makes my job a lot easier. I'll take a look as soon as I can—I'll be traveling a bit, so it might be a couple of days.

RossNordby commented 3 months ago

Perhaps with the benefit of a real desk and fresh eyes: sorry, I should have been able to see this sooner. Your "workaround" is a perfectly valid fix! The problem is that the existing clear (on batchReferencedHandles[0]'s trailing region) was supposed to be a clear on the mergedConstrainedBodyHandles's trailing region the whole time. batchReferencedHandles[0] shouldn't be mutated at all. Just so happens that batch 0 typically covers enough of the simulation that it's very rare that a constrained kinematic will land outside the valid range... unless you have a lot of kinematics.

I'll get a fix pushed tomorrow.

tweggen commented 3 months ago

Fixed, thank you!