Closed bitshifter closed 2 years ago
Somehow I missed this issue when I was checking for existing issues before filing #62. I think #62 can be merged with this issue, as it falls under the umbrella of the title.
Quoted from issue #62:
I've been poking at Psychopath a bit again, and I realized that in the context of a ray tracer you pretty much never need full projective transforms, you only need affine transforms. And this presents a lot of opportunities for optimization:
- An affine transform matrix only needs to store 4x3 floats, not 4x4, because the fourth row is implicitly known.
- Transforming points/vectors can be further optimized due to the statically-known implicit fourth row.
- Matrix inversion can also be further optimized.
- There are tricks for directly performing inverse transforms on points/vectors that are more efficient than fully inverting the matrix + transforming.
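A minimal sketch of the first two points, assuming a hypothetical `Affine3` type with plain arrays rather than any existing glam type. Only 12 floats are stored, and the implicit fourth row `[0, 0, 0, 1]` means points add the translation while vectors skip it:

```rust
/// Hypothetical 3D affine transform (not a glam type): a 3x3 linear part
/// plus a translation. The fourth row [0, 0, 0, 1] is implied, never stored.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Affine3 {
    /// Columns of the 3x3 linear part (rotation, scale, shear).
    pub cols: [[f32; 3]; 3],
    /// Translation column.
    pub translation: [f32; 3],
}

impl Affine3 {
    /// Transforms a direction (implicit w = 0): translation is ignored.
    pub fn transform_vector(&self, v: [f32; 3]) -> [f32; 3] {
        let [c0, c1, c2] = self.cols;
        [
            c0[0] * v[0] + c1[0] * v[1] + c2[0] * v[2],
            c0[1] * v[0] + c1[1] * v[1] + c2[1] * v[2],
            c0[2] * v[0] + c1[2] * v[1] + c2[2] * v[2],
        ]
    }

    /// Transforms a position (implicit w = 1): linear part plus translation.
    /// No fourth-row multiply and no divide by w is needed.
    pub fn transform_point(&self, p: [f32; 3]) -> [f32; 3] {
        let v = self.transform_vector(p);
        [
            v[0] + self.translation[0],
            v[1] + self.translation[1],
            v[2] + self.translation[2],
        ]
    }
}
```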
To expand on that: affine transforms cover all of the transforms that are useful for object-space/world-space transforms:
The only thing full 4x4 matrices cover beyond that are projective transforms (e.g. perspective projection), which in the context of graphics are typically only used for camera projections to screen space. So I think affine transforms are useful to have as a separate type: client code can use them whenever object transforms are the concern, and screen space isn't relevant. It both makes the intent clearer and opens a lot of doors for further optimization as noted above.
Lastly, the current implementation of `Mat4::transform_point3()`, as noted in its comment, is optimized in a way that won't work correctly for perspective projection matrices. That makes a lot of sense given that the common case is likely to be object transforms, but it's a bit unfortunate that it's not correct for the general case. Making that compromise is only necessary because `Mat4` is trying to simultaneously serve two different purposes: both affine and projective transforms. If we add a separate affine transform type, then we won't need that compromise: the affine transform type can have the optimized version, and `Mat4` can be correct for the general case.
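To illustrate the compromise, here is a rough sketch using a plain array as a stand-in for a 4x4 matrix. The function names are made up for this example and this is not glam's actual implementation. The general version divides by `w`; the affine shortcut assumes the last row is `[0, 0, 0, 1]` and skips that work, which gives a wrong answer for a perspective matrix:

```rust
/// Column-major 4x4 matrix stand-in: `m[col][row]`.
pub type Mat4 = [[f32; 4]; 4];

/// General point transform: treats `p` as (x, y, z, 1) and divides by the
/// resulting w, so it is also correct for projective matrices.
pub fn project_point3(m: &Mat4, p: [f32; 3]) -> [f32; 3] {
    let mut out = [0.0f32; 4];
    for row in 0..4 {
        out[row] = m[0][row] * p[0] + m[1][row] * p[1] + m[2][row] * p[2] + m[3][row];
    }
    [out[0] / out[3], out[1] / out[3], out[2] / out[3]]
}

/// Affine shortcut: assumes the last row is [0, 0, 0, 1], so w is always 1
/// and neither the fourth row nor the divide is computed.
pub fn transform_point3_affine(m: &Mat4, p: [f32; 3]) -> [f32; 3] {
    let mut out = [0.0f32; 3];
    for row in 0..3 {
        out[row] = m[0][row] * p[0] + m[1][row] * p[1] + m[2][row] * p[2] + m[3][row];
    }
    out
}
```

For a pure translation both functions agree; for a matrix whose last row is `[0, 0, 1, 0]` (so `w = z`) only the general version produces the projected point.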
As I mentioned in #62, this is something that I would love to implement for Glam. However, my time at the moment is limited, so I probably won't get around to it for a good while. If someone gets around to it before me, that's also great. I'll come back and comment here if/when I start working on it, so people don't have to worry about duplicating efforts.
I may have time to look at it after I've finished with some other issues.
From memory I was planning on basing an implementation on the `Transform4D` from Foundations of Game Engine Development Volume 1.
If I do start work on it I'll mention it here.
One more thing possibly worth mentioning explicitly: one of the benefits of an affine matrix over e.g. a decomposed translation/rotation/scale representation is the above-mentioned ability to represent shear. And that comes up more often than one might think.
For example, even if an application doesn't directly support shearing, if it supports non-uniform scaling and parent-child relationships, then shear can still easily occur with a non-uniform scaled parent and a rotated child.
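A small worked example of that case, reduced to 2D for brevity and using made-up helper names. A matrix of the form rotation times axis-aligned scale (the form a decomposed representation is assumed to store here) has mutually orthogonal columns. The product of a non-uniformly scaled parent and a rotated child does not, so it contains shear:

```rust
/// Column-major 2x2 matrix stand-in: `m[col][row]`.
pub type Mat2 = [[f32; 2]; 2];

pub fn rotation(angle: f32) -> Mat2 {
    let (s, c) = angle.sin_cos();
    [[c, s], [-s, c]]
}

pub fn scale(sx: f32, sy: f32) -> Mat2 {
    [[sx, 0.0], [0.0, sy]]
}

/// Matrix product `a * b` (apply `b` first, then `a`).
pub fn mul(a: &Mat2, b: &Mat2) -> Mat2 {
    let mut out = [[0.0f32; 2]; 2];
    for c in 0..2 {
        for r in 0..2 {
            out[c][r] = a[0][r] * b[c][0] + a[1][r] * b[c][1];
        }
    }
    out
}

/// Dot product of the two columns. Zero means the columns are orthogonal,
/// which is what "rotation times axis-aligned scale" always produces.
pub fn column_dot(m: &Mat2) -> f32 {
    m[0][0] * m[1][0] + m[0][1] * m[1][1]
}
```

With a parent of `scale(2.0, 1.0)` and a child rotated by 45 degrees, `column_dot` of the product is about -1.5, whereas a uniformly scaled parent gives approximately zero.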
So I think a full affine transform matrix type is generally the way to go, because it handles all the corner cases correctly, which a decomposed solution doesn't. Decomposed representations make a lot of sense for user-interaction, simulation, etc. But IMO they don't make much sense for just representing raw transforms for e.g. rendering.
For sure, the `Transform4D` I'm referring to is stored as a 4x4 matrix rather than a decomposed solution - sample code is here http://foundationsofgameenginedev.com/FGED1-code.cpp.
The main thing is it's a separate type so can shortcut things like inverse, multiplies and transforming 3d points and vectors.
Once #36 is complete, one question that might need to be answered for this type is whether it should be stored as 12-byte `Vec3` columns or as 16-byte `Vec3A16` (or whatever I end up naming it) columns. There are pros and cons to both. Being able to trivially convert the affine type to a `Matrix4` could be handy in the game case, but then it's a lot of wasted space.
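The size difference can be checked with stand-in types. The structs below are hypothetical and only mimic the layout being discussed (three floats per column, with or without 16-byte alignment); they are not glam's definitions:

```rust
/// Stand-in for a 12-byte, 4-byte-aligned column.
#[derive(Clone, Copy)]
#[repr(C)]
pub struct PackedVec3 {
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

/// Stand-in for a 16-byte-aligned column: same fields, 4 bytes of padding.
#[derive(Clone, Copy)]
#[repr(C, align(16))]
pub struct AlignedVec3 {
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

/// Four packed columns: 48 bytes.
#[repr(C)]
pub struct PackedAffine {
    pub cols: [PackedVec3; 4],
}

/// Four aligned columns: 64 bytes, the same size as a full 4x4 float matrix.
#[repr(C)]
pub struct AlignedAffine {
    pub cols: [AlignedVec3; 4],
}
```

`std::mem::size_of` reports 48 bytes for the packed layout and 64 bytes for the aligned one, which is the 25% overhead discussed in this thread.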
~Maybe I'm brain-farting, but wouldn't you want to just store it as 3 `Vec4`s? Then, for example, transforming a point is just three dot-products with that point: one with each of the three `Vec4`s. Moreover, converting to a `Mat4` is trivial: just append one more `Vec4` with the values `[0, 0, 0, 1]`.~
Edit: yeah, I brain farted. That's slower than splatting each of the point's dimensions and doing both the multiplies and sums via SIMD.
I think it comes down to whether the performance or the space savings are more important. IMO the performance is more important, so my vote is for just sticking with four `Vec4`s, and wasting the last element of each.
And I think I can make a pretty good argument for the space savings not being so important: even though it's wasting 25% of the space of the transform matrices, matrices themselves typically only make up a tiny percentage of the total in-memory data of a game or offline renderer. Meshes, textures, etc. take up way more space. So for the typical use-cases of Glam, it's not really wasting 25% when you look at the big-picture, it's more likely wasting 1%, if even that.
I noticed on Embark discord that their macaw library has an affine transform type. It's not public yet but if they have already written one it might be good to use that rather than duplicating the effort.
I would be interested in a similar type for 2D transforms. Euclid already uses 3x2 matrices for this and performs more than 70% better than glam at point2 transform benchmarks.
At Embark we have an `IsoTransform` which represents isometric transforms using a `Vec3` translation and a `Quat` for rotation. This is great for expressing poses of objects, and is fast and compact. I can add it to `glam` if there is interest.
Another type that may be of interest would be conformal transforms: translation, rotation and uniform scale. There are two conformal transform versions that may be of interest: with or without mirroring. The interesting thing about this class of transforms is that they maintain the shape of things (i.e. don't squash or stretch them), and are also pretty efficient to work with. I have some code for this too that I can clean up and add to glam, if it would be of interest. It does need some good name suggestions though.
The final step is affine transforms. 3D affine transforms are best encoded as a 3x4 matrix, while 2D affine transforms can use a 2x3 matrix or, when there is no shear or non-uniform scale, only four numbers (one column of the rotation matrix, which is effectively a complex-number rotator, plus the translation).
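A sketch of the four-number 2D encoding, using a hypothetical `Conformal2` type. The pair `(a, b)` is the first column of the rotation matrix (optionally scaled), treated as the complex number `a + bi`. Note that this covers rotation, uniform scale and translation only; a general 2D affine transform with shear still needs six numbers:

```rust
/// Hypothetical 2D transform stored in four floats (not a glam type).
/// `(a, b)` is the complex rotator `a + bi`; `(tx, ty)` is the translation.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Conformal2 {
    pub a: f32,
    pub b: f32,
    pub tx: f32,
    pub ty: f32,
}

impl Conformal2 {
    pub fn from_scale_angle_translation(scale: f32, angle: f32, t: [f32; 2]) -> Self {
        let (s, c) = angle.sin_cos();
        Conformal2 { a: scale * c, b: scale * s, tx: t[0], ty: t[1] }
    }

    /// Complex multiply of the rotator with the point, then translate.
    pub fn transform_point(&self, p: [f32; 2]) -> [f32; 2] {
        [
            self.a * p[0] - self.b * p[1] + self.tx,
            self.b * p[0] + self.a * p[1] + self.ty,
        ]
    }

    /// Composition `self * rhs` (apply `rhs` first, then `self`).
    /// The rotators multiply as complex numbers, so the result stays in
    /// the same four-float form.
    pub fn mul(&self, rhs: &Conformal2) -> Conformal2 {
        let t = self.transform_point([rhs.tx, rhs.ty]);
        Conformal2 {
            a: self.a * rhs.a - self.b * rhs.b,
            b: self.a * rhs.b + self.b * rhs.a,
            tx: t[0],
            ty: t[1],
        }
    }
}
```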
PS: saving space is often more about improving cache locality and thus reducing cache misses, which can have a huge performance impact.
I'll start working on a `Mat3x4` that we can try out. Internally it will be row-major for performance (3x `Vec4`), but the interface will be the same as for `Mat4`.
So we have five levels of useful linear transforms:
- `Isometric`: translation + rotation
- `Conformal`: translation + rotation + uniform scale
- `ConformalM`: translation + rotation + uniform scale + optional mirroring
- `Affine`: translation + rotation + non-uniform scale + shearing
- `Linear` (`Mat4`): translation + rotation + non-uniform scale + shearing + perspective

Do we want all levels in `glam`? If yes I can make the PRs for the 3D versions.
Should we call these types `Isometric3D`, `Conformal2D`, `Affine3D` etc?
I've mostly focused on affine transforms because they seem to be the most common, outside of a general purpose mat4, and generally something other libraries often provide. Do you have a use for these other types? I think I'd be reluctant to add anything else at this point. A 2D affine transform type is probably the next most useful one for glam users.
The isometric transform is extremely useful for encoding poses of objects, i.e. the position and orientation of things, especially in contexts where scaling is not wanted/needed (e.g. in physics). Also nice for view matrices (camera pose).
If you prefer I can have the transforms in a separate crate, or behind a feature flag, but I do believe a lot of games will want to use at least the isometric transform and the affine. The conformal ones are a bit more niche (but I’ve used them).
> At Embark we have an IsoTransform which represents isometric transforms using a Vec3 translation and Quat for rotation.
My gut feeling here is that there's maybe a rabbit hole we don't want to go down. And in this particular case it's also trivial for client code to just package a `Vec3` and a `Quat` into a struct itself if it needs that functionality. There would be no gain in performance, and very little gain in convenience, by adding a dedicated type to glam itself. And I suspect that's true for the other variants you're suggesting as well.
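For reference, a rough sketch of what such a client-side struct could look like. `Pose` and the minimal `Quat` here are hypothetical, written with plain arrays so the example is self-contained; with glam one would use its own `Vec3` and `Quat` instead:

```rust
pub type Vec3 = [f32; 3];

fn cross(a: Vec3, b: Vec3) -> Vec3 {
    [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]
}

/// Minimal unit quaternion, just enough for this sketch.
#[derive(Clone, Copy, Debug)]
pub struct Quat {
    pub x: f32,
    pub y: f32,
    pub z: f32,
    pub w: f32,
}

impl Quat {
    /// `axis` is assumed to be unit length.
    pub fn from_axis_angle(axis: Vec3, angle: f32) -> Quat {
        let (s, c) = (angle * 0.5).sin_cos();
        Quat { x: axis[0] * s, y: axis[1] * s, z: axis[2] * s, w: c }
    }

    /// For a unit quaternion the conjugate is the inverse rotation.
    pub fn conjugate(self) -> Quat {
        Quat { x: -self.x, y: -self.y, z: -self.z, w: self.w }
    }

    /// Rotates `v` using v' = v + w * t + q x t, where t = 2 * (q x v).
    pub fn rotate(self, v: Vec3) -> Vec3 {
        let q = [self.x, self.y, self.z];
        let c = cross(q, v);
        let t = [2.0 * c[0], 2.0 * c[1], 2.0 * c[2]];
        let u = cross(q, t);
        [
            v[0] + self.w * t[0] + u[0],
            v[1] + self.w * t[1] + u[1],
            v[2] + self.w * t[2] + u[2],
        ]
    }
}

/// Hypothetical isometric transform: rotation followed by translation.
#[derive(Clone, Copy, Debug)]
pub struct Pose {
    pub translation: Vec3,
    pub rotation: Quat,
}

impl Pose {
    pub fn transform_point(&self, p: Vec3) -> Vec3 {
        let r = self.rotation.rotate(p);
        [
            r[0] + self.translation[0],
            r[1] + self.translation[1],
            r[2] + self.translation[2],
        ]
    }

    /// Inverse pose: undo the translation, then undo the rotation.
    pub fn inverse(&self) -> Pose {
        let inv_rot = self.rotation.conjugate();
        let t = self.translation;
        Pose {
            translation: inv_rot.rotate([-t[0], -t[1], -t[2]]),
            rotation: inv_rot,
        }
    }
}
```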
I think affine transformations sit at a bit of a different point in the problem space, where they are very general, common, and useful, but can't really be built from the pieces glam already provides. At least, not without sacrificing performance by e.g. just using a full 4x4 matrix.
@cessen in https://github.com/bitshifter/glam-rs/pull/157 I introduce an `Affine3D` type that builds completely on glam's other types, so it certainly is possible. This all basically comes down to “where do higher-level types belong?” I agree this is a bit of scope creep for the project. Maybe a `glam-utils` crate or something would make sense?
There is already a `Vec3` + `Quat` `IsoTransform` in glam: it's called `TransformRT` and can be enabled with the `transform-types` feature. There's also a `Vec3` + `Quat` + `Vec3` affine type called `TransformSRT`. It's on a feature because I haven't really committed to making these part of the "official" API, in part because the performance characteristics were generally worse than using a matrix. I have no idea if anyone is using them. Bevy for example opted to create its own equivalent of `TransformSRT` rather than using the glam one. I guess they have more control over it that way and it's not a very complicated type.
It does raise the question, @emilk: would the proposed transforms be `Vec3` + `Quat` + possibly some kind of scale, or were you intending to wrap the affine type to implement these?
I think it makes sense for the 2d and 3d affine types to be a core part of glam. The majority of my use of 4x4 matrices at work and in home projects are for affine transformations. For me if the affine type is smaller and as fast or faster it makes sense to switch to it. There are quite a few Rust math libraries which have an affine transform type, but not the others you mentioned (and sometimes not regular matrices).
For the other types, my feeling is probably the best thing to do would be to put these kinds of things in a separate crate. A transform types crate could be a sub crate of the glam git repo so the code is together. The main thing about having them in a separate crate is they don't have the same API stability guarantees as glam. I think having the existing transform types on a feature has effectively made them invisible. At least in a separate crate they will appear on crates.io and docs.rs. If some type turned out to be quite popular it could migrate into glam proper.
The core module in glam could be made into a crate and used by a transform types crate. Again I wouldn't make the same API stability guarantees with the core crate. You can get a long way building with just the glam types, but for example for the affine types to support `f64` the easiest way will be to make them use the core crate.
@emilk
> in #157 I introduce an Affine3D type that completely builds on glams other types, so it certainly is possible.

That's a fair point. I don't think it's trivially implementable in the same way that just tossing a `Vec3` and a `Quat` into a struct is, but indeed what counts as trivial or not is a subjective and fuzzy boundary.
This is making me think, however, that we might be able to make it trivial, at least for one of the use-cases. The `Affine3D` type you've defined aims at reducing the size of the data. That can certainly have positive impacts on performance due to better cache locality, but it depends on use-case. The other way to go, and what I suspect I want for my path tracer, is an implementation that still uses 4 SIMD vectors (16 floats) for storage, and optimizes for efficient SIMD utilization.
That latter use-case could be easily assembled if there were a `Mat3A` (3 SIMD vectors / 12 floats) as a counterpart to `Vec3A`. Then client code could trivially construct a SIMD-efficient affine transform type by just tossing a `Mat3A` and a `Vec3A` into a struct together. And `Affine3D` proper can remain dedicated to the space-saving approach, which isn't quite as trivial to implement from existing types. And then both approaches/use-cases are covered.
I've modified the `Affine3D` type @emilk contributed and have been messing around with it a bit, and also added a 2D version. I renamed `Affine3D` to `Affine3` and added a corresponding `Affine2` implementation.
Swapping out `Mat4` for `Affine3` in my path tracer gave about a 5% performance improvement, so I feel like that's a good addition to the core library. As you can see from the commit, it didn't require a lot of code changes - https://github.com/bitshifter/pathtrace-rs/commit/1cf0dd838c8e3b2d9de60f60f33a929b9a1892dc. That's primarily because I added the same `Deref` that `Mat4` has, so you can access all the columns the same way. I've also made the internal `matrix3` and `translation` public members instead of requiring accessors. I feel like this is more like the rest of glam but it does have the downside of dealing with `Vec3A` directly instead of it being encapsulated.
I'm thinking of renaming it `Affine3A` because it contains internal padding and all the columns are `Vec3A`, so it would match up with the convention used by `Vec3A` and `Mat3A` (which is new on master).
The `Affine2` version is a bit of an anomaly because `Mat2` is 16 byte aligned, so `Affine2` is also 16 byte aligned and thus contains padding. But its columns are just `Vec2`, so I don't really want to call it `Affine2A`. Still pondering what to do with that one; it might be good to remove the 16 byte alignment from `Mat2` and see what impact that has on performance.
To be honest I could just release things as is, I think they're in an OK state.
I am still bike shedding over affine and other transform types a bit, mostly at this point over whether they should be included in the main crate, go in a separate crate for transform types, or stay on the `transform-types` feature. For the moment I've put all the affine types on the `transform-types` feature flag.
Performance wise, there is a big win using an `Affine2` type over using `Mat3` as a 2D affine transform. For 3D, the gain is less significant. The main benefit is that `inverse` is a decent gain; multiplying by `Self` is also a bit faster, and everything else is quite similar.
For 3D probably the most useful thing is the semantic benefit of having a dedicated affine type. I have made all the 3D transform types use `Vec3A` as the performance is typically a lot better than scalar math.
Since I've added a `Mat3A`, all of these transform types are composed from other glam types and are mostly straightforward to implement. I think it is useful to provide these out of the box though, even if it's just something others can use as a basis for implementing something themselves.
I will hopefully make my mind up on how to include these in glam soon, last chance to give any feedback :)
2D Transforms
operation | mat3 | mat3a | affine2 |
---|---|---|---|
inverse | 11.4±0.09ns | 7.1±0.09ns | 5.4±0.06ns |
mul self | 10.5±0.04ns | 5.2±0.05ns | 4.0±0.05ns |
transform point2 | 2.7±0.02ns | 2.7±0.03ns | 2.8±0.04ns |
transform vector2 | 2.6±0.01ns | 2.6±0.03ns | 2.3±0.02ns |
3D Transforms
operation | mat4 | affine3a | isometry3a | transform3a |
---|---|---|---|---|
inverse | 15.9±0.11ns | 10.8±0.06ns | 5.4±0.05ns | 7.2±0.05ns |
mul self | 7.3±0.05ns | 7.0±0.06ns | 7.0±0.06ns | 8.0±0.21ns |
transform point3 | 3.6±0.02ns | 4.3±0.04ns | 7.8±0.21ns | 8.5±0.08ns |
transform point3a | 3.0±0.02ns | 3.0±0.04ns | 4.5±0.09ns | 5.4±0.12ns |
transform vector3 | 4.1±0.02ns | 3.9±0.04ns | 7.4±0.12ns | 8.2±0.28ns |
transform vector3a | 2.8±0.02ns | 2.8±0.02ns | 4.3±0.04ns | 5.2±0.05ns |
2D Transforms
operation | mat3 | mat3a | affine2 |
---|---|---|---|
inverse | 5.7±0.00ns | 4.4±0.01ns | 2.6±0.01ns |
mul self | 6.0±0.01ns | 2.9±0.06ns | 2.3±0.00ns |
transform point2 | 1.5±0.00ns | 2.2±0.00ns | 1.3±0.00ns |
transform vector2 | 1.2±0.00ns | 2.2±0.00ns | 1.0±0.00ns |
3D Transforms
operation | mat4 | affine3a | isometry3a | transform3a |
---|---|---|---|---|
inverse | 10.8±0.01ns | 7.1±0.01ns | 3.1±0.00ns | 3.8±0.05ns |
mul self | 4.7±0.01ns | 3.6±0.01ns | 4.5±0.01ns | 5.4±0.01ns |
transform point3 | 2.3±0.00ns | 2.3±0.00ns | 4.3±0.01ns | 4.3±0.00ns |
transform point3a | 2.2±0.00ns | 2.2±0.00ns | 2.9±0.00ns | 3.1±0.04ns |
transform vector3 | 2.2±0.00ns | 2.3±0.00ns | 4.0±0.00ns | 4.1±0.00ns |
transform vector3a | 2.2±0.05ns | 2.2±0.00ns | 2.8±0.03ns | 2.9±0.00ns |
Nice benchmarks!
Some feedback:
First the names: English is not my first language, but it seems `Affine` and `Isometry` have different conjugations. "Affine/Isometric" would make more sense (or "Affinity/Isometry", though I like that less).
"Transform" is also very non-specific (all of these types are transforms).
But the bigger problem with `Transform3A` is that it is not closed under multiplication and inverse, making the name very misleading and the type easy to misuse. The reason for this is that multiplying two transforms with rotation and non-uniform scale produces a transform with shearing, but `Transform3A` cannot express shearing.
So `(A * B) * C != A * (B * C)` (for rotation and non-uniform scales), which is highly surprising. One effect of this is that transforming a vector by first `A` and then `B` is not the same as transforming it by the product `A * B`. I think this is a big footgun that will cause many people to lose toes. I would suggest that `Transform3A * Transform3A -> Affine3A` to avoid this.
Similarly, `Transform3A::inverse()` cannot always return a `Transform3A` that expresses the inverse, so `T.inverse() * T != IDENTITY` (for rotation and non-uniform scales), which again is very surprising. `Transform3A::inverse` could return an `Affine3A` instead to solve this.
Since `Transform3A` is closed under neither multiplication nor inverse, I think it is not a good candidate to use to express transformations, which means the name is dangerously misleading.
(Note that these problems go away if we restrict the transform type to use uniform scaling, giving us a conformal transform type instead.)
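The closure problem can be reproduced with a small 2D analogue. `Trs2` and its `mul_naive` rule below are assumptions made up for this sketch (combine angles, scales and translations field by field); they are not the definition of glam's `Transform3A`:

```rust
/// Hypothetical decomposed 2D transform: scale, then rotate, then translate.
#[derive(Clone, Copy, Debug)]
pub struct Trs2 {
    pub translation: [f32; 2],
    pub angle: f32,
    pub scale: [f32; 2],
}

impl Trs2 {
    pub fn transform_point(&self, p: [f32; 2]) -> [f32; 2] {
        let (s, c) = self.angle.sin_cos();
        let x = p[0] * self.scale[0];
        let y = p[1] * self.scale[1];
        [
            c * x - s * y + self.translation[0],
            s * x + c * y + self.translation[1],
        ]
    }

    /// Naive product that stays in decomposed form (assumed rule). It has
    /// nowhere to store shear, so it is only exact when the scales involved
    /// are uniform or no rotation sits between them.
    pub fn mul_naive(&self, rhs: &Trs2) -> Trs2 {
        Trs2 {
            translation: self.transform_point(rhs.translation),
            angle: self.angle + rhs.angle,
            scale: [self.scale[0] * rhs.scale[0], self.scale[1] * rhs.scale[1]],
        }
    }
}
```

With `a` scaling x by 2 and `b` rotating by 90 degrees, applying `b` and then `a` to the point `(1, 0)` gives roughly `(0, 1)`, while `a.mul_naive(&b)` maps the same point to roughly `(0, 2)`.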
> Since I've added a `Mat3A` all of these transform types are composed from other glam types and are mostly straight forward to implement. I think it is useful to provide these out of the box though, even if it's just something others can use as a basis for implementing something themselves.

As long as glam has a `Mat3A` type, then at least for my use-case I don't feel strongly about whether the affine transform types themselves are included or not. The `Mat3A` satisfies what I need in terms of building blocks.
> The main benefit is inverse is a decent gain, multiplying by Self is also a bit faster, everything else is quite similar.
For what it's worth, the faster inverse is actually the primary benefit I'm personally looking for. My path tracer has to do both forward and inverse transforms pretty frequently, but for reasons that are outside the scope of this thread it can't just pre-store the inverse matrix in most cases. It has to calculate it on the fly. Improving matrix inversion performance has a measurable impact on render times.
(In my path tracer I also plan to experiment with doing some funky things like only inverting the 3x3 matrix part and using subtraction for the translation part when I need to perform inverse transforms, to try to squeeze even more performance out of it.)
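One way that idea could look, as a sketch with a hypothetical `InvertibleAffine3` type and plain arrays: subtract the translation first, then apply the inverse of the 3x3 part, whose rows are cross products of the forward columns divided by the determinant. The matrix is assumed to be invertible (non-zero determinant):

```rust
pub type Vec3 = [f32; 3];

fn cross(a: Vec3, b: Vec3) -> Vec3 {
    [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]
}

fn dot(a: Vec3, b: Vec3) -> f32 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

/// Hypothetical affine transform: three columns plus a translation.
#[derive(Clone, Copy, Debug)]
pub struct InvertibleAffine3 {
    pub cols: [Vec3; 3],
    pub translation: Vec3,
}

impl InvertibleAffine3 {
    pub fn transform_point(&self, p: Vec3) -> Vec3 {
        let [a, b, c] = self.cols;
        let t = self.translation;
        [
            a[0] * p[0] + b[0] * p[1] + c[0] * p[2] + t[0],
            a[1] * p[0] + b[1] * p[1] + c[1] * p[2] + t[1],
            a[2] * p[0] + b[2] * p[1] + c[2] * p[2] + t[2],
        ]
    }

    /// Maps a transformed point back without building a full inverse
    /// transform: p = M^-1 * (p' - t). Assumes the determinant is non-zero.
    pub fn inverse_transform_point(&self, p: Vec3) -> Vec3 {
        let [a, b, c] = self.cols;
        // Rows of the inverse 3x3 matrix, before dividing by the determinant.
        let r0 = cross(b, c);
        let r1 = cross(c, a);
        let r2 = cross(a, b);
        let inv_det = 1.0 / dot(a, r0);
        let t = self.translation;
        let d = [p[0] - t[0], p[1] - t[1], p[2] - t[2]];
        [dot(r0, d) * inv_det, dot(r1, d) * inv_det, dot(r2, d) * inv_det]
    }
}
```

If many points share the same transform, the three row vectors and `inv_det` could be computed once and reused.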
> I renamed `Affine3D` to `Affine3`

A bit of bike-shedding: I don't actually feel strongly about this, and will be fine either way, but I think I might prefer `Affine3D`.
I always imagined that with e.g. `Vec3`/`Vec3A`, `Mat3`/`Mat3A`, etc. the `3` referred to the number of elements the types had (3 for vectors, 3x3 for matrices), rather than specifically the dimensionality of the space they're intended for. In practice, the distinction doesn't matter for those types since it's the same thing in those cases.
But with 3D affine transforms things get a little messier, since they're effectively 4D in the number of elements, but are 3D in the space they're intended for. Somehow `Affine3D` makes more sense to me, because (to my eyes) it makes it clear that the `3` is referring to the dimensionality of the space the type is used with.
Regardless, I do like appending `A` for the aligned version. Doing so leaves a name available for the space-efficient version as well, and is consistent with the other types. (And `Affine3DA` maybe looks a little weird?)
Thanks for the feedback everyone!
> Affine and Isometry have different conjugations. "Affine/Isometric" would make more sense (or "Affinity/Isometry", though I like that less).
TBH I completely lifted that naming from `nalgebra` on the premise that it would be familiar to the Rust community and because the names ended in `3`, which fits my naming scheme, not because I think it makes sense :). Isometry is a bit too mathy for my liking, in that most people would have to look it up, and it's also not entirely accurate (see https://mathworld.wolfram.com/Isometry.html). Rigid body transform or rigid motion might be better, but I digress.
> But the bigger problem with Transform3A is that it is not closed under multiplication and inverse, making the name very misleading, and easily leading it to be misused. The reason for this is that multiplying two transforms with rotation and non-uniform scale produces a transform with shearing, but Transform3A cannot express shearing.
A `Transform` type, expressed as a separate rotation, non-uniform scale and translation, is an extremely common data structure in game engines (e.g. https://forum.unity.com/threads/transform-matrix-elements.18836/#post-128496, https://github.com/defold/defold/blob/1ae302ec33d4514408c04ad3ae5d3c1efe2057bd/engine/dlib/src/dlib/transform.h, https://docs.unrealengine.com/en-US/API/Runtime/Core/Math/FTransform/index.html, https://docs.aws.amazon.com/lumberyard/latest/userguide/component-transform.html). It is specifically for positioning entities in a 3d scene and supporting a parent child relationship between entities, i.e. a child transform is relative to a parent entity and if there is no parent the transform is relative to the origin. The order of operations follows the parent child relationship, so `A * (B * C)` does not necessarily make any sense with these types. I think one of the reasons they are quite commonly used in game engines is that rotation is stored separately from scale.
edit: I had all these links handy because I'm subscribed to a bevy issue with an uncannily similar discussion.
I guess because this is more of an engine specific type, in that it is tightly coupled with how a game engine represents a game world, perhaps it doesn't make sense to be a type in a math library.
I mostly included these out of interest for a performance comparison, but I think on reflection I will remove both of these types from the glam crate. I am not sure if anyone is using them, I guess I will find out!
> But with 3D affine transforms things get a little messier, since they're effectively 4D in the number of elements, but are 3D in the space they're intended for. Somehow Affine3D makes more sense to me, because (to my eyes) it makes it clear that the 3 is referring to the dimensionality of the space the type is used with.
>
> Regardless, I do like appending A for the aligned version. Doing so leaves a name available for the space-efficient version as well, and is consistent with the other types. (And Affine3DA maybe looks a little weird?)
There are 4 rows but each row is a `Vec3A`; internally it contains a `Mat3A`. I'm in two minds about it really. It does leave room for an `Affine3` if people want it in the future, however I don't know why people would want that when the performance would be so much worse than a `Mat4` or `Affine3A`.
Ultimately public members are `Vec3A` / `Mat3A` and the naming reflects that.
The `Affine2` also contains padding (`Mat2` is a `__m128`) so it's not like I'm being super consistent with the suffix there; it's more about reflecting the type used for the rows.
Ultimately I feel like the `D` is a bit redundant: it's one more character to type and doesn't add a lot of additional information. The `A` ties it to the other types in the crate. If anything I would potentially drop the `A` on `Affine3A`, but I feel like keeping it for now is the safer option: dropping it later is easy, versus renaming it from `Affine3` to `Affine3A` to make way for a compact `Affine3` type :)
> Ultimately I feel like the `D` is a bit redundant, it's one more character to type and doesn't add a lot of additional information.
Yeah, that's totally fair. It was just a slight personal preference on my part, and entirely subjective. No worries at all.
Over-all I'm super psyched for these new affine types and `Mat3A`. Thanks for putting the work into this! Can't wait to use it. :-)
(I don't have any real investment in the other new types, as I don't plan to use them, so I'll leave those to you and emilk.)
I just tried out the latest glam on my path tracer, replacing `Mat4` with `Affine3A`. I'm getting a consistent 12-15% render time improvement on a variety of test scenes, so it really does make a difference!
Thanks so much!
I'm not sure why I've kept this open at this stage, it seems solved with the adding of affine types and I think I talked myself out of adding isometries :)
When dealing with transforms that only contain position, orientation and possibly scale many operations can be performed more efficiently than using a general purpose matrix.
A while ago I experimented with transform types which contained position and orientation (as a quaternion) much like Unreal's `FTransform`, but I found these to not be particularly more efficient or to save much space in practice, so they are not enabled by default. They can be enabled using the `transform-types` feature.
Another option might be to have these kinds of transforms backed by a matrix with simplified methods/operators similar to `Transform4D` described in Foundations of Game Engine Development Volume 1.