Bevy parent transform is not used

rafalh commented 2 years ago

Currently bevy_rapier3d uses Transform component to read entity absolute transform. But this component is a transform relative to a parent in Bevy world. On the other hand GlobalTransform is an absolute transform for entity. Because of this if there is an entity with non-identity transform and as a children of it is another entity with a collider, collisions will not be detected in the real object location. For example when running following code ball collider will be placed at 0,0,0 instead of 0,10,0:

commands.spawn_bundle((
        Transform::from_xyz(0.0, 10.0, 0.0),
        GlobalTransform::default(),
    ))
        .with_children(|parent| {
            parent
                .spawn_bundle((
                    Transform::default(),
                    GlobalTransform::default(),
                ))
                .insert(RigidBody::Fixed)
                .insert(Collider::ball(1.0));
        });

sebcrozet commented 2 years ago

I agree that using the GlobalTransform is the correct approach. However, I’m not sure how to avoid introducing a 1-frame delay. Afaik, Bevy’s transform propagation system runs in the PostUpdate stage. So if we want to avoid a 1-frame delay when the user changes the transform, it means that all the bevy_rapier systems should run after PostUpdate, which doesn’t sound very idiomatic? This also means that the transforms modified by bevy_rapier itself will only be propagated to GlobalTransform at the next PostUpdate (because the physics stepping system will be run after the transform propagation of the current frame).

SUPERCILEX commented 2 years ago

Maybe I'm confused, but isn't that a non-issue? Currently, the relevant stages are physics -> update -> transform propagation. Users are expected to make changes in the update step which means physics won't see any changes until the next frame anyway. Using global transforms doesn't seem to change this in any way.

sebcrozet commented 2 years ago

Users are expected to make changes in the update step which means physics won't see any changes until the next frame anyway.

@SUPERCILEX Thank you for your input! I wasn’t aware of this convention.

What about the initialization of new colliders or rigid-bodies? Is is reasonable to require the user to do that after the physics stage too? Because, otherwise, if a collider/rigid-body component is inserted (with a tranform/global transform) before the physics step, they will have the wrong (un-propagated) transform during one physics step.

SUPERCILEX commented 2 years ago

If a collider/rigid-body component is inserted before the physics step, they will have the wrong (un-propagated) transform during one physics step.

That's true, but I think it can be solved though documentation: people can then set up their systems to run in the correct order relative to the physics stage. This is the correct long-term choice because stages won't be a thing soon meaning people will always have to explicitly decide when their systems should run relative to others.

A startup system should probably be added similar to this that runs after those transform propagations so that startup entities are positioned correctly initially.

Is is reasonable to require the user to do that after the physics stage too?

I think the physics stage should be moved to after transform propagation so that everything is in sync by the end of the frame. This is consistent with how transform propagation works and means people need to run systems that depend on physics before spawning those entities (and the spawning also happens before physics). Either way you have to carefully consider when systems are running. Thinking about my own game, this would be the ideal ordering:

Run game systems including those that depend on physics
Despawn + spawn entities
Sync transforms
Sync physics

This results in the following:

There is a one-frame delay to load the game (but this can easily be fixed by adding startup systems).
Systems that depend on physics and transforms see changes on the next frame. This is good. Currently, if a user clicks on a collider that has just been spawned, they'll be able to interact with something that hasn't even been drawn to the screen yet (because you can do spawn -> physics -> system that depends on physics leading to the rendering pipeline not having seen your spawned entity yet).

sebcrozet commented 2 years ago

@SUPERCILEX That makes sense.

I think the physics stage should be moved to after transform propagation so that everything is in sync by the end of the frame.

If we move the physics stage after transform propagation at the frame n, then the transform changes applied by physics at the frame n won’t be visible by the Update stage of frame n + 1 because the physics stage will modify transforms that will only be propagated at the post-update of frame n + 2. So these transform changes will only be be visible at the Update stage of frame n + 2. Though could be solved by having the physics plugin propagate all the transforms it modifies, though this would be slightly expensive if the user puts a deep hierarchy at descendants of a rigid-body.

SUPERCILEX commented 2 years ago

Are you thinking people will want to use the global transforms? The physics updates would still be visible to frame n + 1 via local transforms, but yeah the global ones will be out of sync.

sebcrozet commented 2 years ago

Yeah, I’m thinking about the global transforms.

maniwani commented 2 years ago

How do other physics engines handle the transform hierarchy? Do they model the hierarchical relationships as joints?

(Sorry for all the edits, updated to follow the suggestion on my next comment).

If Rapier copies from Bevy's GlobalTransform (see next comment) and the rigidbodies are in global-space, then I believe that means we'd also have to do another, global->local hierarchy propagation to update the Transform values to match.

It'd be awesome if we only have to propagate the transform hierarchy once, after all the changes have been made, since Bevy is currently trying to optimize those systems (bevyengine/bevy#4697), but if everything has to be in global-space for physics, then we're looking at something like:

fixed update
- user logic for sim-related movement
- "physics"
- propagate local transforms to global rigidbody positions
- step physics
- propagate global rigidbody positions back to local transforms (I believe this is a trivially parallel op)
update
- user logic for other stuff
post update
- (interpolate and) propagate local transforms to global transforms

The main takeaway is that you would be adding two new propagation systems that filter on entities with rigidbodies rather than move relative to the existing one that deals with all entities.

Any user systems related to movement should happen before the physics step, then those actions will all be accounted for in the same frame and everything will be synced. I think that's true whether or not you put these systems inside a fixed timestep.

maniwani commented 2 years ago

I'd actually suggest completely avoiding GlobalTransform here since there are many outstanding issues on what GlobalTransform should be internally (TRS, Affine3A matrix, etc.).

I think bevy_rapier should provide its own system that propagates the local Transform hierarchy to a global RigidbodyPosition component, along with another system after the physics step that calculates the new Transform values from the updated global RigidbodyPosition values.

SUPERCILEX commented 2 years ago

propagate local transforms to global rigidbody positions

So assuming this is easy (I thought you'd need global transforms), I agree with your ordering except for the fixed time step part. Why would we want physics to run on fixed intervals? It should take in a res time and apply position changes accordingly.

maniwani commented 2 years ago

Why would we want physics to run on fixed intervals?

So that part is optional and I was only using it as an example here, but in general it's better for game physics—and I mean simulation-related math in general, not just the rigidbody physics—to be fed a constant Δt for consistent and repeatable results, independent of framerate or even independent of GPU (headless app).

Because it still has finite-precision, floating-point math isn't associative, so for example, integrating velocity by 100ms once will give slightly different results than integrating velocity by 10ms ten times. (IIRC variable timesteps can lead to the kinds of physics glitches speedrunners like to use to launch themselves across a map.)

And then when you have a fixed timestep, to keep the visuals looking smooth, you interpolate entities between two physics steps while you build up to the next one.

SUPERCILEX commented 2 years ago

And then when you have a fixed timestep, to keep the visuals looking smooth, you interpolate entities between two physics steps while you build up to the next one.

Gotya, that was going to be my next question. Found this to clarify: https://gafferongames.com/post/fix_your_timestep/. I think using a fixed timestep should be a separate discussion from fixing the parent/child relationships since it'll require a lot of design.

Shfty commented 2 years ago

I've attempted a solution to this using custom system stages to transform parented physics bodies into global space before rapier's stages, then transform the ticked results back into local-space after it finishes writing back.

Here's a dump of my schedule graph for context:

The relevant custom stages are, ApplyGlobalTransforms and RestoreLocalTransforms - the other additions center around running rapier's schedule on a fixed tick and interpolating the results up to arbitrary framerates in the render world, so can be ignored for the purposes of this topic.

The apply / restore systems do a manual parent-child traversal of the transform hierarchy to establish a world-to-local transform that can be applied - or inverted and then applied for the restore step - when a RigidBody is encountered.

Initially, the results hold up well - a hierarchy of kinematic bodies can be moved in local space by gameplay systems and the simulation works. However, reading back the results like this introduces subtle precision errors that spike under certain circumstances, causing translation and rotation drift in transforms that should otherwise be unmodified in local space.

I'm uncertain as to the source of said error, having verified that it isn't being caused by any out-of-order system execution or hierarchy traversal. My best guess is some combination of natural floating-point inaccuracy being compounded by transform nesting, and certain rapier math types compiling down to glam SIMD structs that may prioritize speed over accuracy - I have little knowledge on that front, so perhaps someone with experience can weigh in.

Practically speaking, it becomes necessary to attempt error-correction such as resetting near-identity quaternions, or overwriting the affected transforms with proper local-space values using dedicated systems. At that point you may as well be working directly in world space and manually applying parent-child relationships based on your own specialized kinematic hierarchy, which is what I plan on doing for now.

Based on this, I believe rapier would have to implement its own parent-child transform representation - effectively working natively in local space - in order to produce stable output that can be fed back into bevy's transform hierarchy without error accumulation.

Shatur commented 2 years ago

Wondering how https://github.com/jcornaz/heron mitigating this issue. It doesn't have this problem.

sebcrozet commented 2 years ago

@Shatur Looks like heron simply runs a second transform propagation: https://github.com/jcornaz/heron/blob/main/rapier/src/lib.rs#L107 I think we should do something similar, at least as a stopgap solution until we get more feedback on the limitations of this approach.

I am not a big fan of the suggestions where Rapier implements its own transform hierarchy since that was one of the main pain point in previous versions of bevy_rapier.

Shatur commented 2 years ago

Looks like heron simply runs a second transform propagation:

@sebcrozet Hm... It's a nice solution actually. At least it will solve the issue. So we should run it before this system? https://github.com/dimforge/bevy_rapier/blob/3f04e8d6b9c4bacf54ce09a6cd358979329e4980/src/plugin/plugin.rs#L82

sebcrozet commented 2 years ago

@Shatur If we keep running our stages before CoreStage::Update, or if we change them to run after CoreStage::PostUpdate, then it should happen after systems::writeback_rigid_bodies.

If on the other hand we run our stages after CoreStage::Update but before CoreStage::PostUpdate, then our additional transform propagation should happen before init_async_colliders.

So... maybe we should just call it once before init_async_colliders and once after writeback_rigid_bodies, and let Bevy’s change detection avoid duplicate work?

Shatur commented 2 years ago

So... maybe we should just call it once before init_async_colliders and once after writeback_rigid_bodies, and let Bevy’s change detection avoid duplicate work?

@sebcrozet Hm... Is there any downside to run rapier stages after CoreStage::Update but before CoreStage::PostUpdate? To run transform_propagate_system additionally only once.

maniwani commented 2 years ago

After CoreStage::Update and before CoreStage::PostUpdate seems best to me.

I still think bevy_rapier should avoid messing with GlobalTransform and should instead have its propagation system write to RigidBodyPosition. Also the system after writeback_rigid_bodies wouldn't be the same propagation, it needs to update the Transform values (can be done in any order).

You should be able to use change detection to reduce work in both systems.

sebcrozet commented 2 years ago

@Shatur I don’t think there is a downside, no.

@maniwani

How do other physics engines handle the transform hierarchy? Do they model the hierarchical relationships as joints?

There is little-to-no transform hierarchy in physics engines. The only cases where you have a hierarchy is where:

You say that a collider has a fixed position relative to its parent rigid-body (so it’s a "hierarchy" with only one level).
You have a multibody relying on reduced coordinates to model joints. In this case you can have an arbitrary deep tree for the hierarchy, but relative positions are not encoded as transforms (they are encoded as joint-type-dependent generalized coordinates).

I still think bevy_rapier should avoid messing with GlobalTransform and should instead have its propagation system write to RigidBodyPosition.

Maybe, though that would be a more complicated change than just adding a call to bevy’s transform propagation. As you mentioned earlier, there is a bit of ongoing movement regarding the definition of GlobalTransform and optimization of transform propagation. So I would rather pick the simplest solution (calling transform_propagate_system) right now, and adapt later once decisions are taken on the Bevy side, rather than implementing a custom transformation logic (with custom components) that could end up obsolete/redundant depending on the GlobalTransform decisions.

Also the system after writeback_rigid_bodies wouldn't be the same propagation, it needs to update the Transform values (can be done in any order).

Yeah, the writeback_rigid_bodies simply updates the Transform. To do this, it needs to read the parent’s GlobalTransform to figure out the correct value for Transform. I don’t think we can really avoid reading GlobalTransform (which we need to be up-to-date at this point) here because we need to be sure that whatever we write in the Transform agrees with what will be rendered after the subsequent pre-render transform propagation.

maniwani commented 2 years ago

So I would rather pick the simplest solution (calling transform_propagate_system) right now, and adapt later once decisions are taken on the Bevy side, rather than implementing a custom transformation logic (with custom components) ...

I recommend against it because it IMO creates undesirable coupling with a render component. The definition thing is prominent but likewise headless apps may stop registering the global transforms component, and some users or plugins may be relying on (or want to rely on) global transforms being a frame stale until the moment of propagation in PostUpdate.

Earlier, what I meant was that you could duplicate Bevy's transform propagation logic and just replace the parts involving GlobalTransform with the component you have storing the rigidbody position.

Yeah, the writeback_rigid_bodies simply updates the Transform. To do this, it needs to read the parent’s GlobalTransform to figure out the correct value for Transform. I don’t think we can really avoid reading GlobalTransform (which we need to be up-to-date at this point) here because we need to be sure that whatever we write in the Transform agrees with what will be rendered after the subsequent pre-render transform propagation.

That's is only a concern if you run after CoreStage::PostUpdate, right? Otherwise, your system can use the new rigidbody positions (of each entity and its parent) directly to update the Transform values, and everything will be synced properly because the pre-render transform propagation happens after.

Shatur commented 2 years ago

@Shatur I don’t think there is a downside, no.

@sebcrozet then I would probably would go with this. It involves a bit more changes then just using two additional transform_propagate_system, but more fells right (and will require only one additional transform_propagate_system run).

I recommend against it because it IMO creates undesirable coupling with a render component.

But bevy_transform is a separate crate: https://github.com/bevyengine/bevy/tree/main/crates/bevy_transform So I don't see any reasons why it can't be used in headless apps. Also I personally still use bevy_render (I just disabling rendering device and window) even in headless app because it coupled with things like meshes which I use to generate collisions. So I would do a simple change now to fix this major issue.

maniwani commented 2 years ago

It's okay to use GlobalTransform. I'm just saying that Bevy could do anything to it since it's not there to be a source of truth (Transform is, so it's unlikely to change) and that adding another instance of the propagate system disrupts the documented behavior of GlobalTransform (edit: I should have said observable).

One line to add another instance of the propagate system is the simplest thing, sure, but it wouldn't take much more to copy its logic and tweak it to reference a bevy_rapier component instead. I think doing that could help avoid future headache, but using GlobalTransform would be fine for now.

sebcrozet commented 2 years ago

To me it looks like the docs says that the GlobalTransform is updated in the transform_propagate_system system, but it doesn’t states that the GlobalTransform must not be updated/modified elsewhere. If it mustn’t be modified otherwise, the docs should be more explicit about it. Something like "GlobalTransform is updated exclusively by [...]".

So, here is my though: we need a solution as soon as possible, and we know (thanks to heron) that calling transform_propagate_system is simple does work well with the current way Bevy works. So let’s make this modification. Best-case scenario, GlobalTransform never changes, and we benefit from any future transform_propagate_system parallelism performance improvements if there are any for "free".

We also know thanks to @maniwani that this may not be a perfect long-term solution (e.g. we are screwed if all its fields are merged into a single affine matrix), so we should keep our minds open to alternatives. I’m not sure I like adding another GlobalPhysicsTransform component used instead of GlobalTransform because that would be yet another transform component that the user needs to be added to the whole hierarchy. So ideally we should see if we could manage to avoid a new component (by keeping the current transform in local memory while recurring, and by making sure we don’t need to read that transform twice) but that would take time to design.

Shatur commented 2 years ago

So, here is my though: we need a solution as soon as possible, and we know (thanks to heron) that calling transform_propagate_system is simple does work well with the current way Bevy works. So let’s make this modification.

Agree with you! Are you planning to make this change yourself? Because I currently at work right now :)

sebcrozet commented 2 years ago

Right now I need to finish updating the JS bindings for Rapier (it’s been waiting to leave alpha for quite some time now), so I won’t get to that change today nor this week-end.

Shatur commented 2 years ago

Got it, then I will try send a PR ASAP myself.

Shatur commented 2 years ago

@sebcrozet done: https://github.com/dimforge/bevy_rapier/pull/177

Shatur commented 2 years ago

@sebcrozet thank you! Could you draft a new release?

sebcrozet commented 2 years ago

@Shatur I’ll probably make a release of Rapier first next week. I’ll make a new release of bevy_rapier after.

rafalh commented 2 years ago

Thanks for fast resolution! I really appreciate it!

dimforge / bevy_rapier

Bevy parent transform is not used #172