bepu / bepuphysics2

Pure C# 3D real time physics simulation library, now with a higher version number.
Apache License 2.0

High precision poses #13

Open RossNordby opened 6 years ago

RossNordby commented 6 years ago

While the vast majority of the engine works in relative space where single precision floating point numbers are good enough, there are two places where it can be problematic:

  1. Body/static world space poses, and
  2. Broad phase bounding boxes.

While it wouldn't be a trivial change, it is relatively localized. For poses, we could use conditional compilation to swap singles out for doubles or even fixed point representations without much issue.

For medium size worlds, we could avoid changing the broad phase bounding box representation by simply being conservative and expanding the bounding box to the next FP32 value. For extreme cases (e.g. planet scale and up, relative to human sized objects), you would need to change the broad phase's representation. Changing the broad phase would be quite a bit more painful and would come with a measurable performance penalty.
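
To illustrate the conservative expansion idea, here is a minimal sketch (not the engine's actual code) that rounds a higher precision interval outward to the nearest representable FP32 values, so the float bounds always contain the original ones:

using System;

public static class ConservativeBounds
{
    // Rounds a double precision interval outward to single precision so that
    // the resulting float interval is guaranteed to contain the original one.
    public static void Expand(double min, double max, out float floatMin, out float floatMax)
    {
        floatMin = (float)min;
        if (floatMin > min)
            floatMin = MathF.BitDecrement(floatMin); // next float toward negative infinity

        floatMax = (float)max;
        if (floatMax < max)
            floatMax = MathF.BitIncrement(floatMax); // next float toward positive infinity
    }
}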

This feature will likely have to wait a while given the cost/benefit ratio.

mcmonkey4eva commented 6 years ago

Is there a particular reason to default to 32-bit floats for things? Bear in mind that the modern PC calculates in x86 floating point, meaning double-precision (64-bit) floats are at minimum as fast as 32-bit floats in all calculations.

RossNordby commented 6 years ago

While that's true for older scalar codegen, v2 uses SIMD widely. Bumping up to 64-bit as a default would cut ALU throughput in the most expensive stages by around a factor of 2. That wouldn't be a flat 50% performance cut, but it wouldn't be great.
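
For a rough sense of the lane count cost (plain System.Numerics, nothing BEPU-specific), compare the SIMD widths:

using System;
using System.Numerics;

class LaneCount
{
    static void Main()
    {
        // On AVX2 hardware this typically prints 8 and 4: the double vector
        // processes half as many elements per operation as the float vector.
        Console.WriteLine($"float lanes:  {Vector<float>.Count}");
        Console.WriteLine($"double lanes: {Vector<double>.Count}");
    }
}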

More concerning would be memory bandwidth. If, for example, the solver defaulted to storing everything in full 64 bit precision, bandwidth would become such a bottleneck that you'd likely only be able to scale to about 2-3 cores on a system with dual channel memory, even considering the ALU performance cut.

High precision poses in isolation are far less nasty than just using FP64 across the board, of course. And if I just had to change the pose data layout and integrator, I'd probably be more aggressive about adding it. It's the broad phase involvement that makes things more questionable- tight memory layout and access patterns are critical to performance in the broad phase.

vpenades commented 6 years ago

In the meantime, maybe it could be good if you could add a #if DEBUG-only bounds check when creating objects at 3D coordinates that could be considered unsafe.

For example, if I create a solid cube at space coordinates 1000000000, 20000000, 3000000 or something like that, I'd get a debug-only warning saying "you're creating a scene too large for the engine to handle with precision" or something like that.
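
Something along the lines of this hypothetical helper (the threshold, names, and API are made up for illustration):

using System;
using System.Diagnostics;
using System.Numerics;

public static class DebugChecks
{
    // Arbitrary example threshold; as discussed below, there is no single correct value.
    const float SuspiciousMagnitude = 1e7f;

    [Conditional("DEBUG")]
    public static void WarnIfFarFromOrigin(in Vector3 position)
    {
        if (MathF.Abs(position.X) > SuspiciousMagnitude ||
            MathF.Abs(position.Y) > SuspiciousMagnitude ||
            MathF.Abs(position.Z) > SuspiciousMagnitude)
        {
            Debug.WriteLine("You're creating a scene too large for the engine to handle with precision.");
        }
    }
}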

You probably know better than anyone else which bounds could be considered safe...

RossNordby commented 6 years ago

(Un)fortunately, there is no fixed threshold. It's completely dependent on the application's tolerance for error, and the scales that it chooses to work at.

For example, it would be totally fine for a simulation to have objects with sizes in the range 1000-100000, positioned at locations from -10000000 to 10000000... so long as the user doesn't care about accuracy below about 1 unit.
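
As a concrete illustration of that example (just standard float spacing, nothing engine-specific), the gap between adjacent single precision values near 10,000,000 is exactly 1 unit:

using System;

class FloatSpacing
{
    static void Main()
    {
        float position = 10_000_000f;
        float next = MathF.BitIncrement(position);
        // Prints 1: near 1e7, single precision cannot represent
        // position changes smaller than one unit.
        Console.WriteLine(next - position);
    }
}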

In other words, the limits are relative. The only absolutes are those defined by the single precision format, and so far users don't often try to make simulations on the scale of 3.4e38 :) (That said, I will probably go back and include some validation logic like v1 had for NaNs and infinities to capture these extreme cases. They almost exclusively occur in the case of divisions by zero or undefined memory access.)

Things would change a little bit if I created a fixed point format. Fixed point comes with relatively low absolute limits compared to a floating point representation of the same size, so having some asserts to capture invalid state would be nice.

vpenades commented 6 years ago

I've always been a fan of fixed point for games development; it gives some sort of confidence that you're safe until the very last bit.

With floating point, on the other hand, as you said, you never know exactly where the dangerous limit begins, so to play it safe the range is probably even smaller than the numbers you proposed.

Assuming 1 unit is 1 cm, that would give a range of ±100 km, so let's forget about space exploration 😢

mcmonkey4eva commented 6 years ago

If scaled digits aren't available, or aren't sufficient for a simulation (an excessively massive open space area that still needs micro-precision), the backup plan of localized physics regions appearing and disappearing according to what actually needs to be simulated (generally an area around the player) could suffice. It can be done perfectly in a singleplayer environment no problem, and less perfectly in a multiplayer environment: multiple live-simulation zones suddenly intersecting could be problematic. That can be fixed by auto-combining zones, up until the point that 100 players chain the edges of their zones together and suddenly there's a zone that's way too large and the extreme edges are broken.

I was thinking about having static zones with border management in the past, but norbo indicated that it was excessive and that I could achieve a more effective solution for that setup by switching BEPUv1 to doubles. I have no idea how I would have properly achieved a multiplayer environment with static border zones that need safe, functional crossing... but a dynamic setup for a spacey game should be good, barring the risk of multiple players conspiring to induce a glitch.

RenzoCoppola commented 6 years ago

I don't know how expensive casting from double to float is, but it's just the broad phase that requires doubles. The narrow phase could even use halves if those were available on modern CPUs. About fixed point... I've built a big system with fixed point, and if it isn't compiled ahead of time (.NET Native/IL2CPP/etc.), it runs horribly slowly on .NET (4-16 times slower; compiled, just a bit slower).

RenzoCoppola commented 6 years ago

But the broad phase could be done with fixed point, and it doesn't require a lot of complicated math (tweaking/debugging FP division/multiplication is a real pain). A 32-bit integer could be a lot more efficient at storing positional data: (2^32) * 1 mm ≈ 4,295 kilometers.
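
A toy version of that idea, with hypothetical names, storing positions as signed 32-bit millimeter counts:

using System.Numerics;

// Hypothetical fixed point position: 1 unit = 1 mm, so each axis spans
// roughly ±2,147 km (2^31 mm) with uniform precision everywhere.
public readonly struct FixedPositionMm
{
    public readonly int X, Y, Z;

    public FixedPositionMm(int x, int y, int z) { X = x; Y = y; Z = z; }

    // Converts to float meters relative to a nearby reference point (e.g. for narrow
    // phase work). Assumes the two points are close enough that the int difference
    // doesn't overflow and fits comfortably in single precision.
    public Vector3 ToLocalMeters(in FixedPositionMm origin)
    {
        return new Vector3(
            (X - origin.X) * 0.001f,
            (Y - origin.Y) * 0.001f,
            (Z - origin.Z) * 0.001f);
    }
}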

vpenades commented 6 years ago

Using 64-bit long integers at 1/10 of a millimeter would still comfortably fit the entire solar system.

@RenzoCoppola Indeed, a fixed point pose would only need to reach the broad phase, since from that point onwards everything could use relative, 32-bit floating point poses.

vpenades commented 6 years ago

@RossNordby Would it be possible for you to move fixed point precision poses up in the roadmap?

RossNordby commented 6 years ago

As far as scheduling this goes, here are my current thoughts in no particular order:

Given all that, I'm not sure when I'll get to it. Probably not before Q3 2018, but probably not as late as the roadmap currently implies (beyond Q2 2019).

In the interest of gathering information, here are some questions for anyone interested in this feature:

RenzoCoppola commented 6 years ago

I agree with the priorities. I don't think 10x plus is worth it either, but there are small cases for almost anything. Do you think it would be easy to expand the code to change the broad phase and "couplings"? I think this is the thing that changes the game for every case. What if all the objects are around the same size... a hash function would be a better broad phase... just another super particular case.

RossNordby commented 6 years ago

Swapping the broad phase for something completely different is possible, though a little annoying since I deliberately avoided adding relevant abstractions. I suspect the tree structure (particularly after the next quality pass) will be the best option across the board, especially given the requirement to accelerate things like ray queries.

Fortunately, it is likely that a hypothetical 64 bit tree implemented after the quality pass would end up faster than the current 32 bit implementation (which is still pretty fast). The main concern is development time and minimizing the amount of mess created by conditional compilation.

vpenades commented 6 years ago

Answering your questions:

To me the biggest issue is that typically, the physics engine drives the motion dynamics of the objects, which means that the physics engine pose already dictates the limits.

Having an extra layer using fixed point for poses means the top layer would need to completely re-implement the motion dynamics code of the physics engine and translate between the fixed point pose and the engine pose, applying some offset in the process; if there's a collision response, it would need to get the pose back from the engine into the fixed point pose, and so on.

Think of a game like Elite Dangerous, EVE online and stuff like that, which would, indeed, require a 64bit fixed point value.

I don't think there's a problem with creating islands for dense areas where you add/remove poses to be taken into account for collisions/responses; the problem is translating the pose from one unit system to another and back.

I don't know how configurable the broadphase is... maybe the solution would be to have two or three specialised broadphases, that cover most of the use cases.

Alternatively, how difficult is to completely skip the broad/narrow phase, and perform collision detection directly?

I mean, is it possible to have a method that takes a range of poses/shapes and gives back what collided with what?

RossNordby commented 6 years ago

Alternatively, how difficult is to completely skip the broad/narrow phase, and perform collision detection directly?

The StreamingBatcher (expect a rename) does exactly this, but you will have to prune pairs before testing somehow; narrow phase collision detection is too expensive to brute force with n^2 tests. Using the broad phase or directly using the underlying tree would be wise.

Also, allow me to narrow down my questions from before. How useful would you personally find each of the precision modes and their associated ranges? Which would you likely use for your own projects?

All of the following assume a 0.001 unit target precision.

  1. 32 bit float: 16384 units
  2. 32 bit fixed point pose, but 32 bit floating point broad phase rounded up to conservative ~0.01: 81920 units
  3. 32 bit fixed point: 2097152 units
  4. 64 bit float: 8.79e12 units (if a unit is a meter, ~8 light hours, roughly maximum orbit distance of pluto)
  5. 64 bit fixed point: 9e15 units (if a unit is a meter, a little under a light year)
  6. Even more extreme
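
For anyone sanity checking those numbers, they follow from treating the 0.001 target as roughly 2^-10 and asking where the representation's spacing exceeds it. A rough back-of-the-envelope version:

using System;

class PrecisionRanges
{
    static void Main()
    {
        double targetPrecision = Math.Pow(2, -10); // ~0.001 units

        // Floats: spacing stays at or below the target while |x| < target * 2^24 (2^53 for doubles).
        Console.WriteLine($"32 bit float: {targetPrecision * Math.Pow(2, 24)} units");
        Console.WriteLine($"64 bit float: {targetPrecision * Math.Pow(2, 53):e2} units");

        // Fixed point: spacing is uniform, so the range is simply target * 2^(bits - 1).
        Console.WriteLine($"32 bit fixed: {targetPrecision * Math.Pow(2, 31)} units");
        Console.WriteLine($"64 bit fixed: {targetPrecision * Math.Pow(2, 63):e2} units");
    }
}
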
vpenades commented 6 years ago

Which would you likely use for your own projects?

To me it would be these two:

  1. 32 bit fixed point: 2097152 units
  2. 64 bit fixed point: 9e15 units (if a unit is a meter, a little under a light year)

In particular, option 5 would be ideal.

The rationale is that I would like to be sure the simulation is deterministic regardless of the location where the simulation is happening. Given a simulation scenario, with floating point units you probably wouldn't get the same results if the simulation happens 10000 units away from the origin.

In terms of precision, it's true that floats or even doubles are more than enough to accurately pinpoint a point in a fairly large space... but when dealing with collisions, bounces, etc., the precision error is cumulative, so fairly small precision differences can result in radically different end states depending on where in space you performed the simulation.
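
A tiny standalone demonstration of that translation sensitivity (plain floating point arithmetic, nothing engine-specific):

using System;

class TranslationSensitivity
{
    static void Main()
    {
        float offset = 10000f;
        float a = 0.1f;
        float b = 0.3f;
        // The same local computation performed near the origin and far from it.
        float nearOrigin = a + b;
        float farFromOrigin = (offset + a) + b - offset;
        // Prints False: the intermediate results round differently,
        // and over many simulation ticks such differences compound.
        Console.WriteLine(nearOrigin == farFromOrigin);
    }
}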

Alan-FGR commented 5 years ago

The problem with fixed point is that a physics engine normally has to represent both very large and very small values, so a general purpose fixed point format will be a limiting factor; it might be fine for positions, but not for everything. That said, physics engines (most notably 2D ones) have been ported to hardware that doesn't support floats at all, though I'm not sure whether they had to implement some kind of software float for certain situations. Physics isn't just positions: there's collision detection, which deals with possibly very small penetrations, and those are used to calculate impulses, which in turn affect velocities and so on. All of that said, it sure would be cool to not have to care about any of that or work around floating point limitations (like origin shifting), and to be able to handle large worlds naively, just dealing with global absolute coordinates. But then again, even if that's possible, it's at the very least not memory/space efficient.

vpenades commented 5 years ago

Maybe a solution would be to have a dual fixed-floating pose representation, something like this:

struct BigPose
{
    public Vector3 Position; // single precision fractional/local part
    public Integer3 Offset;  // coarse integer part (Integer3 standing in for some 3-component integer type)
}

This structure would work like this:

In 'default' mode, the offset would represent the integer component, and the position would represent the fractional part.

But, when operating with two poses, one of the poses would be "shifted" so both offsets would match. Then you could do maths only with the position part. At the end of the operations, all offsets would be shifted back.
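
A sketch of what that shift might look like (hypothetical types; assumes Integer3 exposes int X, Y, Z fields and that Offset counts whole units):

// Rebases 'pose' so that it shares 'reference's offset; afterwards both positions
// can be compared or subtracted directly in float math. Assumes the two offsets are
// close enough that their difference still fits comfortably in single precision.
static BigPose RebaseTo(in BigPose pose, in BigPose reference)
{
    var result = pose;
    result.Position += new Vector3(
        pose.Offset.X - reference.Offset.X,
        pose.Offset.Y - reference.Offset.Y,
        pose.Offset.Z - reference.Offset.Z);
    result.Offset = reference.Offset;
    return result;
}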

The engine could even detect islands of objects and adjust all of the objects in an island to use the same offset, so the broad and narrow phases could use only the position part (all calculations would be done as if the offset were the origin).

Thoughts?

Alan-FGR commented 5 years ago

Well, that sounds a lot like origin shifting, except it's a global offset. I think in the end it really comes down to whether you need to simulate a large area (with fine precision, of course). Of the physics engines I know, only a single one (Newton) allows that; all the other ones, including the ones used in most AAA games (Bullet, PhysX, Havok, and obsolete ones like ODE), only support single precision, but that doesn't prevent you from having large and potentially infinite worlds if you don't need to simulate a large area at once. PhysX and Havok provide tools to manage that; iirc Havok even has a doubles API, so in practice you can work with doubles even though internally it's origin-shifted floats. These solutions work for most games because they only simulate a small area around the player. Note that floats are reasonably precise (for an FPS-scale game at 1 unit = 1 m) up to a few kilometers from the origin, but most open world games (Bethesda's, for example) simulate only a much smaller area, a few hundred meters at most. Double precision certainly would be nice to have, but it's a bit of an edge case: it's only really necessary for space games in which you also need fine precision (say, FPS gameplay inside ships). For other popular use cases, single precision is better in absolutely all technical aspects.

mcmonkey4eva commented 5 years ago

In my own use case, any sub-islands type of solution would work: a multiplayer open world setup. Any player only needs to simulate the area around them, but the server needs to simulate everything everywhere. So if player 1 is 5 kilometers west of the origin and player 2 is 10 kilometers east, that would normally be a tough situation for single precision, but double precision can handle it (I'm currently using BEPUv1 modded to double precision), and separating the various areas into islands would suffice as well (though there's some worry about the odd case of 50 or 100 or whatever players standing as far apart as they can while still sharing a simulation island, causing the farthest edges of the mega-island to have trouble... clearly a weird edge case, but I like thinking about every possible case).

Alan-FGR commented 5 years ago

Yes, at 10 km from the origin you start getting problems with single precision. For server-based physics, unless you can design the game world with such limitations in mind (say, split it into geographically disconnected areas of about ~100 km² and simulate those in independent physics worlds), double precision really is the most reasonable solution, especially because you have control of the hardware. Local simulation could be a problem, though: in my tests, unless you bump up the simulation quality (like iterations), objects will get out of sync rather quickly if you're comparing two areas with different origin shifts. I can only imagine local simulation using floats and the server using doubles being much worse than that.

Alan-FGR commented 5 years ago

Also, another thing to consider is that although rendering artifacts tend to manifest farther from the origin, if you're using doubles internally you'll have to shift the origin of the objects you're rendering too, since only modern and workstation GPUs work with doubles natively, and even then they're probably not fast, and not only in terms of data size... not to mention that doubles are unnecessary if you just render everything relative to the camera anyway.

vpenades commented 5 years ago

My concern is also determinism. With current floating point approaches, it is odd to me that a given simulation would not produce the same results if the initial setup is displaced several units from the origin in one direction.

In fact, it would be an interesting test case: to see at which distance from the origin a given simulation begins giving a completely different result.

Theoretically, a fixed point solution would not have such problems.

mcmonkey4eva commented 5 years ago

Mathematical synchronicity is unreasonable to ever expect (I'm terrified about whatever vpenades is trying to accomplish); server/client sync everything that needs syncing via the network. Also, having two different physics engines is silly, so of course both the client and server in my case are using the double precision setup. Local area sub-simulations are fine; I could even do them outside of the engine by generating multiple Spaces and regenerating into a new one if they get close (though that would be pretty slow, so an actual solution for doing that would be done in the physics engine source).

Also, the rendering in my project is of course origin-shifted at runtime. It centers the rendering-world origin at the camera viewpoint and renders everything relative to that with floating point precision (I apply the offset at the deepest point of the CPU-side rendering call tree, just before passing locations to the GPU).
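
The usual camera-relative trick looks roughly like this (a sketch with made-up names, not tied to any particular engine):

using System.Numerics;

public static class CameraRelative
{
    // World positions are kept in doubles; only the small camera-relative
    // difference is handed to the GPU as floats.
    public static Vector3 ToRenderSpace(
        double worldX, double worldY, double worldZ,
        double cameraX, double cameraY, double cameraZ)
    {
        return new Vector3(
            (float)(worldX - cameraX),
            (float)(worldY - cameraY),
            (float)(worldZ - cameraZ));
    }
}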

In my own setup everything is tested and functional, including at massive distances. The question (for me) is what are the options in BEPUv2 for this? Expanding to doubles would fully solve the issue (as it already solved the issue for me in BEPUv1).

@vpenades I don't think there's any set distance of offset in which a simulation loses accuracy - it may as well be treated as pure randomness. Floats do weird things at the lowest ends of their precision, and in some cases those microscopic inconsistencies can bubble up, especially if you give it some time (and simulation tick cycles).

Alan-FGR commented 5 years ago

@mcmonkey4eva The overall relative precision is indeed roughly the same: you lose fractional precision at the same rate the exponent grows, so the ratio of the smallest increment to the magnitude of your position stays roughly the same (for practical purposes). That doesn't mean there isn't a 'sweet range', for the fixed scale you're using, in which the physics simulation will be more acceptable.

@vpenades A simulation will never yield the same results if the objects aren't at the same positions. Say, for example, you have a stack of boxes at 0,0,0 and another similar stack at 100,100,100: the simulation will inevitably yield different results. The only way to prevent that is fixed precision, as you pointed out. Now, determinism in the sense of reproducibility is certainly achievable: as long as you tick the engine with the same values and insert objects in the same order, most engines will yield the exact same simulation on the same hardware. When you're using floats, though, the hardware becomes a problem, because ieee754 (the de facto standard) only standardizes the way floats are stored, not the results of every operation using those floats, which could differ depending on the hardware even if it is ieee754 compliant. In practice that's not observed, but it could be in the future, so it's simply not something you can rely on. There's also nothing saying that all hardware has to be ieee754 compliant, though breaking that would certainly break a lot of code depending on the implementation... especially because popular languages use floats under the hood, so I can only imagine the consequences of a float implementation that doesn't store, say, discrete integers in the same range.

Bartolomeus-649 commented 4 years ago

There should be a feature where the simulation engine could fork out and hand off simulation of objects that are close together to a child simulation engine. Each child should "own" its own space. The overall coordinate system should be based on BigInteger, which allows for arbitrarily large numbers. The space owned by an individual simulation engine should fit in whatever the optimal data type is for the simulation calculations.

This way you could scale out the simulation forever, and even distribute it to run on several machines. And since we are talking about physics simulation here, objects have a tendency to only impact other objects that are close by. But of course there will be "border" cases that need to be handled and managed.

RossNordby commented 4 years ago

Splitting simulations is indeed the current recommendation to deal with enormous worlds. Individual simulations operating in FP32 while having a meta-origin stored in arbitrary precision does work.
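
A minimal sketch of that pattern (illustrative names; the meta-origin here just uses doubles, though an arbitrary precision type such as a BigInteger per axis would follow the same shape):

using System.Numerics;

// Each independent simulation works in local single precision coordinates;
// its origin within the larger world is tracked separately at higher precision.
public class SimulationRegion
{
    public double OriginX, OriginY, OriginZ;

    // Combines the high precision meta-origin with a body's local FP32 position.
    public (double X, double Y, double Z) ToWorld(in Vector3 localPosition)
    {
        return (OriginX + localPosition.X,
                OriginY + localPosition.Y,
                OriginZ + localPosition.Z);
    }
}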

I have no plans to include auto-distribution into the core library at this time, though. There is usually a huge amount of application-level logic that needs to tie into that process, especially when talking about serverside distribution. It would end up looking a lot more like general infrastructure than physics.

That said, it's not impossible that I end up open sourcing that kind of infrastructure as a separate project later. 'Bout 0% chance this year, though :)

damian-666 commented 4 years ago

I think I agree it could be done at the application level, but ODE did have a sort of way to collide two spaces. Btw, is there any chance a 2D engine could be built out of this, maybe with a preprocessor define? Setting the z stuff to zero isn't great: it's 30 percent wasted space, tons of sparse matrices, and extra complexity.

Another feature I think is important, and that relates to this comment, is collision of separate joint graphs, i.e. creatures: an AABB around the entire creature or system of joints that interact not through a field but through physical connections. This speeds things up greatly, especially if you test only the AABB around the whole system and use the center, nexus, main body, trunk, or a root object (not the leaves) as a handle.

RossNordby commented 4 years ago

Is there any chance a 2D engine could be built out of this, maybe with a preprocessor define?

It's technically possible, but realistically, no. It's definitely not in my own plans.

Another feature I think is important, and that relates to this comment, is collision of separate joint graphs, i.e. creatures: an AABB around the entire creature or system of joints that interact not through a field but through physical connections.

The broad phase already performs SAH-guided grouping that has the same effect (without needing to take into account constraint graphs) and is very, very fast.

michaelsakharov commented 2 years ago

Oh, didn't expect GitHub to link to this :\ Sorry about that. That aside, any update on this?

RossNordby commented 2 years ago

2.4's revamp included shifting some data around for the explicit purpose of making a bump to double precision positions minimal cost. Implementing this would likely take less than a week to get something functional, maybe a bit more to squeeze every drop of performance out of the fatter BVH.

I do intend to implement it at some point. It's one of the main features I'm looking at for the upcoming 2.5 and I anticipate it being useful for one of my internal projects in the not crazy distant future. I'd guess it won't be implemented any earlier than a month from now (unless someone gives me a bunch of money :P), but sooner than a year.

vpenades commented 2 years ago

Maybe it would be possible to use the new math operators on interfaces, so we could have some sort of generic poses?

I think that would be a better solution than conditional compilation.

RossNordby commented 2 years ago

There is indeed some code that could be reused rather than specialized with generic math-esque approaches, but that's actually a fairly tiny chunk of the complexity around higher precisions. Unfortunately, static abstracts/generic math don't address the prickliest API issues.

It's very useful to have direct access to the data behind bodies or in the broad phase. Having this access requires that you know its type. Imagine something like Simulation.Bodies[bodyHandle].Pose.Position: how does it know what precision you want, or what precision is in the buffer? All of the options are varying levels of gross without nailing down the type information at some external scope.

One option I'm considering is having a Simulation and Simulation64 (or whatever name). Basically identical usage, except Simulation64 would have the type information to provide effortless access without weird generic infestation. Likely just source generated.

This still doesn't solve the issue of things like the OneBodyLinearServo. Might end up just having to create a OneBodyLinearServo64, and attempting to use the wrong one results in a runtime error or performance loss from up/downcasting.

Conditional compilation makes all of this trivial since there's only one thing it can be, at the cost of making it much more annoying to set up both a 32 bit and 64 bit simulation in the same project.
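
For reference, the conditional compilation approach amounts to something like the sketch below; HIGH_PRECISION_POSES and PositionSketch are made-up names, not BEPU's actual layout:

#if HIGH_PRECISION_POSES
using Scalar = System.Double;
#else
using Scalar = System.Single;
#endif

public struct PositionSketch
{
    // The whole library sees a single Scalar type chosen at compile time,
    // so a given build only ever has one pose representation.
    public Scalar X, Y, Z;
}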

vpenades commented 2 years ago

Personally, I would not mind having different flavours of BepuPhysics, one flavour per pose type, since it's unlikely that a project needs more than one simulation at a time.

Regarding the generic poses, I was thinking about the current work being done in dotnet to introduce double-sized vectors.

Funny thing is, they initially considered simply adding the double-sized vectors as plain types, but at some point they decided to move to generic types, and somehow they would support intrinsics for these generic types.

I don't know what the state of that work is, but I would suggest taking a look at it; relevant entry here.