RandyGaul / cute_headers

Collection of cross-platform one-file C/C++ libraries with no dependencies, primarily used for games

Supporting scale transforms #334

Closed KTRosenberg closed 1 year ago

KTRosenberg commented 1 year ago

I’m currently trying to see if I could migrate to using your collision library and am looking for advice on how to modify it (if possible).

In short, my objects use transformation matrices; specifically, I use non-uniform scaling in addition to the position and rotation that you support. I think the solution is a matter of changing the transform/vector multiplication function, but I'm not entirely sure.

Would you be able to advise on how to make the minimum changes? Being able to use a regular matrix would be great as a starting point, even if that is more costly in terms of memory.

RandyGaul commented 1 year ago

What's your use case here? If I can understand more about what you're trying to do, it may come to light that non-uniform scaling isn't actually a useful thing to do. Nonetheless, here are some thoughts about it.

This wouldn't be too difficult to do. I would upgrade the transform to a 3x2 matrix, composed of a 2x2 matrix and a position vector. The 2x2 matrix can hold sin + cos along with scale values for the x and y axes. You can use the det2 function to perform the 2x2 matrix inverse needed to correctly transform the normal vectors on the surface of each shape.

Memory cost isn't really a concern, and neither is perf for that matter. In 2D it's simple to recalculate normals upon each transformation. You'll have a loop of sqrt calls, but that's still not a big deal in terms of perf.

However, when transitioning to 3D this isn't really a great option, as the number of faces grows quite a lot and the perf hit becomes more significant to recalculate face planes, which isn't nearly as trivial as the 2D case, especially when considering numeric robustness.

An API problem also pops up once you introduce non-uniform scaling: it becomes tempting to change the size of shapes at run-time, which leads to all kinds of complicated broadphase and narrowphase problems.

In the end, you may realize that for 2D it's simpler to pre-transform your vertices once using non-uniform scaling, and then submit the transformed vertices for collision detection, from then on assuming uniform scaling.

KTRosenberg commented 1 year ago

Thanks for the reply. I really appreciate it.

For a little context, I'm using a lot of legacy code that I've built up for a while and I'm under time constraints that probably will make it hard to switch everything over. I currently use a variant of an OBB for every single object, but with no collision resolution. Under my time constraints, I was looking for something I could more or less use out of the box while I slowly replace things over time. Your library looks really great and accessible, but as I mentioned, I was having trouble figuring out exactly how / where to change little pieces to deal with the fact that my system has some different requirements.

My first idea, however, would be to pre-simplify the paths using a common algorithm (Ramer–Douglas–Peucker?).

Currently, each object in my world has a list of points defining its boundaries, a center world translation (vec2), and a local "pose" matrix. (It's kind of unusual and in retrospect I might've done something differently.) The points are in local space relative to the center world translation, but the distances are represented in world space. The local matrix represents some local scaling and rotation independent of a parent (because I have a scene graph sort of arrangement). It also has some translation relative to the object, separate from the parent, too.

For example, two colliders might have boundaries with coordinates (-2,-2), (-2,2), (2,2), (2,-2) with different centers, and different local pose matrices.

Doing collision detection essentially requires first offsetting each collider by its center and then transforming each point. That's what I currently do, and I suspect it's a little "dumb and slow" based on the small number of objects I seem to be able to support with this scheme, and on profiling.

In any case, are you suggesting that I continue with this approach and just pre-transform all of the points into world space per-frame and also compute the normals per-frame (using https://github.com/RandyGaul/cute_headers/blob/b58b930cec60051c910b4c886d4e9a0e892a9eb0/cute_c2.h#L1242 ) based on the transformed points? (If the object rotates or scales, my understanding is that I'd need new normals.) Then I'd call c2PolytoPolyManifold?

Would you clarify this?

Another approach would probably be to transform only one of the shapes into the space of the other, but then I'd need transforms working. Or maybe there is some way I can just replace some of the existing calls to c2x and the transpose variants with a `4x4Matrix vertex` call? I am not entirely sure how to do this without breakage. I think this is where I'd need to make some modifications: https://github.com/RandyGaul/cute_headers/blob/b58b930cec60051c910b4c886d4e9a0e892a9eb0/cute_c2.h#L2155 and https://github.com/RandyGaul/cute_headers/blob/b58b930cec60051c910b4c886d4e9a0e892a9eb0/cute_c2.h#L2187 but those are probably not the only spots.

In any case, pre-transforming just seems a lot easier right now, unless I am missing something.

I've also looked at alternatives like Box2D, but I'm unhappy with how complex those seem to be. I think being able to modify your library a bit would be far less overkill.

For now, if I can get solution 1 working / clarified, maybe that's a better idea overall. I trust you more than myself to mess with the internals of your library. (Maybe there could be an alternative version of the functions that accepts a more complex transform.)

Thanks again for your time.

RandyGaul commented 1 year ago

Gotcha, seems you're probably stuck with the two options we discussed:

  1. Change c2x to a 3x2 matrix (or 4x4 if you like). Change all the associated mul functions. Call c2Norms whenever a polygon is transformed, or invert the scale/rotation part of your transform and apply that to the norms.
  2. Pre-transform all your verts each frame and feed things into my library after.

Personally I think 2 is the way to go; I don't think you'll have a perf problem with it. Just make sure to recalculate normals.

KTRosenberg commented 1 year ago

I see. So for 2, confirming the algorithm that I might use in a frame loop (let's say I use your library from the beginning):

```
call c2MakePoly on some points to make a convex hull (I assume they can be in any order?)
optionally simplify my polygon to fit within around 8-16 vertices (as you recommend in the comments)
every frame {
    for each shape {
        pre-transform the vertices
        c2Norms on the polygon (some memcpying will happen here, but oh well)
    }
    broadphase: (e.g. AABB around the polygons) -> generate collision pairs
    narrowphase: for each collision pair, call c2PolytoPolyManifold
}
```

Something like this seems reasonable for now.

I wonder, however, do you have any recommendations for dealing with tunneling? My understanding is that I would need to handle continuous collision myself if I wanted that, but that's a different problem.

Otherwise, thanks for helping me think about the next steps. Maybe sometime in the future the library could be extended to support more complex transforms if the use case is strong enough.

RandyGaul commented 1 year ago

You can call c2TOI to avoid tunneling. It's not easy though; lots of gotchas can pop up. The easiest ways to avoid tunneling are to use a small timestep, limit velocities, use thick walls, etc.
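To make the small-timestep advice concrete, here's a toy 1D sketch (hypothetical helpers, nothing from cute_c2): with one big step a fast point discretely samples past a thick wall and tunnels, while substeps land a sample inside it:

```c
#include <assert.h>

// Discrete overlap test against a thick wall spanning [w0, w1].
static int inside_wall(float x, float w0, float w1)
{
    return x >= w0 && x <= w1;
}

// Advance x by vx over dt using n discrete substeps; on the first substep
// that overlaps the wall, clamp to the near face and stop.
static float advance(float x, float vx, float dt, int n, float w0, float w1)
{
    float h = dt / (float)n;
    for (int i = 0; i < n; ++i) {
        x += vx * h;
        if (inside_wall(x, w0, w1)) return w0;  // hit detected
    }
    return x;  // no sample ever overlapped: possibly tunneled
}
```

With one step the point jumps from 0 straight past a wall at [10, 10.5] and the overlap test never fires; with enough substeps a sample lands inside and the hit is caught. c2TOI solves this properly; substepping just shrinks the window.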

Don't worry about memcpy'ing in terms of perf. Here I'll write a few notes down about perf to help you along without getting stuck in analysis paralysis.

  1. Looping over arrays is very, very fast, especially long arrays when you call something like memcpy, which can do quite a few advanced things internally. CPUs really like traversing memory linearly.
  2. Don't worry about duplicating data. RAM is cheap and machines today have a lot of it.
  3. Calling sqrt or cos is not expensive by today's standards.
  4. What is slow is accessing a pointer you haven't accessed recently. This is because of how CPU caching works. CPUs can avoid cache costs when looping over arrays because they can predict and cache things before you need them. But accessing random pointers you haven't touched in a while can be 100 to 1000x slower than performing some extra arithmetic. Still, don't worry about it too much.
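As a toy illustration of points 1 and 4 (hypothetical helpers, nothing from the library): both functions below do the same work, but the linear one has the predictable access pattern CPUs prefetch well, while the scattered one chases data-dependent indices:

```c
#include <assert.h>

// Linear traversal: addresses are sequential, so the prefetcher stays ahead.
static int sum_linear(const int* a, int n)
{
    int s = 0;
    for (int i = 0; i < n; ++i) s += a[i];
    return s;
}

// Scattered traversal: each address depends on data, so the prefetcher
// can't predict it; same arithmetic, much worse cache behavior at scale.
static int sum_scattered(const int* a, const int* order, int n)
{
    int s = 0;
    for (int i = 0; i < n; ++i) s += a[order[i]];
    return s;
}
```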

KTRosenberg commented 1 year ago

Fair enough. I've seen bits and pieces of solutions involving extending the bounding box between frames, but then rotation, scaling, and all sorts of other things might mess with the results. Using small timesteps is hopefully good enough. Maaaaaaybe not for things like bullets, but I guess there can be special cases.

Point 4 is exactly why I think my system is not able to cope with a lot. Over time I'll need to restructure things.

EDIT: But if I go the un-optimized route, that's 16 matrix transformations per object per potential collision pair (if there are 16 vertices), just to transform the points. Is this reasonable, given that normally we do this in the shader?

In any case, would you say that the pseudocode is about right for option 2?

Thanks again.

RandyGaul commented 1 year ago

Marking this closed for now, let me know if you'd like to reopen it!