Alternate Vec3 that's actually 12 bytes

hrydgard commented 4 years ago

I've found that when dealing with mesh data files and meshes in memory and similar, you sometimes really want a Vec3 type that's guaranteed to actually be 12 bytes (3*mem::size_of()) in size.

I could of course implement my own Vec3 and have it by the side, but I think this need might be common enough that it might be worth including (since glam's SIMD Vec3 is stuck at 16 bytes, for pretty good reasons).

One idea might want to be to export a RawVec3 type or something that's essentially equivalent to the current non-SIMD fallback, even if SIMD is enabled. What do you think?

bitshifter commented 4 years ago

I'm not sure what the best thing to do here - I'm still not totally convinced the performance advantage of the SIMD Vec3 is worth the downsides. I have a lot of microbenchmarks showing a performance benefit from SIMD but I don't know if that necessarily translates into a perf benefit in real world code. I mean it did for my path tracer but regular game code is a bit different.

In any case. I did try and make it easier to deal with unaligned data by providing conversions to and from (f32, f32, f32) and [f32;3]. Are you using those at all? It is slower to load and store from unaligned data and working with Vec3 if there's not much actual math work going on.

It would be interesting to see some sample code of hoops you are having to jump through, or are you just avoiding the glam::Vec3 entirely? If you can provide an example code that might be helpful for trying to come up with a solution.

hrydgard commented 4 years ago

The case that made me write this was implementing GLTF loading, and wanting to use slices/vectors of Vec3 as temporaries before writing to GPU memory. It would obviously work just as well with [f32; 3] but it makes the code nicer to use Vec3 (since I compute bounding volumes and stuff during load, and for that computation I like stuff like Vec3::length and Vec3::min/max instead of open-coding the float operations). Though the easy conversion from/to [f32;3] might look okay I guess.

I'm also unconvinced by the benefits of keeping Vec3 in a SIMD register, I think three floats is just fine for most cases. If you have something that takes a ton of CPU time, you can manually use Vec4 instead without too much trouble if that is beneficial. And if it's something really intense you'll want to use intrinsics anyway probably, or move it to the GPU.

bitshifter commented 4 years ago

If you try the [f32;3] conversion let me know how you get on and if there's any improvements that might make it easier.

I have a feature flag scalar-math which turns off SIMD and 16 byte alignment entirely. I'm thinking I might experiment with a feature that disables SIMD for types like Vec3 and Mat3 where it results in padding. Depending on how that turns out maybe it can become the Vec3 implementation for glam.

Might take a bit of time before I get it done though.

hrydgard commented 4 years ago

I'll try it, sure.

Yeah there's no rush here, I'm more thinking about the long-term API design. And for that, having vec3 and mat3 non-padded is probably the right thing, or at least the right default.

My real goal is really that I simply want all new game library authors in the Rust world to prefer glam over nalgebra's bloat...

bitshifter commented 4 years ago

Related to this, in 0.8.6 I added the packed-vec3 feature flag to disable using SIMD storage for the Vec3 and Mat3 types. This avoids wasting some space due to 16 byte alignment at the cost of some performance. Not quite what you were after here but it does give a bit more fine grained control than disabling SIMD entirely.

repi commented 4 years ago

In any case. I did try and make it easier to deal with unaligned data by providing conversions to and from (f32, f32, f32) and [f32;3]. Are you using those at all? It is slower to load and store from unaligned data and working with Vec3 if there's not much actual math work going on.

Yes we use these, and they do help, but one of the bigger annoyances with the lack of a 12-byte Vec3 is that we can't use it in structures that need to have the binary layout without padding, such as FFI structures, which we have A LOT of.

So I think I would also be be in favor of having an explicit say PackedVec3 or even changing the current Vec3 to not be a SIMD type as I also don't think the value is that high for it, although that would definitely be a bigger change, and we would be happy to use a separate explicit type

bitshifter commented 4 years ago

This is definitely still in my radar, I've been mulling it over in the background trying to figure out how to do it without too much code duplication. Worst case I can just copy and paste vec3 and come up with something better later :)

I'm definitely leaning towards Vec3 being 12 bytes and adding a new type for SIMD, probably Vec3A16 or something. Moving house this week though so not much will happen for a little while.

repi commented 4 years ago

Think that would be the cleanest as well.

And no hurry :)

cart commented 4 years ago

I hit the same issues mentioned above and I would also prefer 12-byte to be the default :smile:

bitshifter commented 4 years ago

I started on this in a branch https://github.com/bitshifter/glam-rs/tree/vec3a16. I'm not quite done but let me know if you have any feedback.

cart commented 4 years ago

Just circling back here: I actually now want the current SIMD layout for my project. I just realized GLSL uniform buffer objects force vec3s to be 16 bytes, so the current layout is convenient for memcpy.

That being said, I still stand by the idea that 12 bytes should be the default. I'm just hoping 16 bytes is still an option

bitshifter commented 4 years ago

@cart I'm definitely keeping the SIMD type.

I have something working on the branch I linked above and I'm currently switching a couple of my projects over to use it just to see if the API feels OK.

The main thing I am not sure about as usual is naming :)

At the moment the regular Vec3 is 12 bytes. The SIMD type is called Vec3Align16. I'm not totally sold on the name, Vec3A16 is shorter but felt too cryptic. Maybe Vec3PS based on Intel's SSE naming convention, but that is also pretty cryptic. Vec3Packed less cryptic but wordy. This stuff keeps me awake at night 🙂

aclysma commented 4 years ago

Personally, my ideal math library would be two crates:

A raw and ugly (kind of sys-like) line-by-line port of DirectXMath
And then something on top of that that is more ergonomic

In DirectXMath, they use the type XMVECTOR for all vectorized values and XMFLOAT3/XMFLOAT4/XMFLOAT4x4 etc. for non-vectorized. I think providing both vectorized and non-vectorized structs is the right idea and I like the general naming approach they take.

In particular, I like the idea of using vector/float in the name to be explicit. Such as Vec3 vs. Float3. Maybe a Float3A for Float3 aligned on a 16-byte boundary. (Which I believe is another DirectXMath naming convention.). Another option would be to post-fix V or F, i.e. Vec3V vs. Vec3F

bitshifter commented 4 years ago

@aclysma I do like the idea of doing something similar eventually. Not so much porting DirectXMath, it's a large library from my point of view, but having a low level and high level APIs for glam. A bit like DirectXMath and SimpleMath. It's a bit down my todo list though. I have a couple of concerns that may or may not be unfounded:

I'm worried about the impact of a separate library and or additional function calls on debug performance. I mean it might be ok but I'd have to check.
Some intrinsics only take literal parameters (e.g. shuffle masks) which makes them impossible to wrap in a function (which takes the mask an input parameter) in stable Rust. I'm not sure if I actually need to that with any of the current implementations

One way of avoiding both of the above issues is to use macros for everything. This would reduce readability of the code a bit although the macros themselves would be quite straight forward. I'm already using macros in some places anyway. I would rather use function calls over macros for readability but I suspect I'll run into some issues.

None of these things are really major issues, I just want to focus on the high level API first.

Edit: I like the A suffix, I think I'll go with that.

kabergstrom commented 4 years ago

I think changing the default for Vec3 is going to cause quite a bit of trouble for various crates now as some expect Vec3 to be 16 bytes. It's a bit of an unnecessary breaking change IMO. I think different suffixes is a fine convention, but I'd appreciate keeping Vec3 the SIMD default by aliasing to Vec3A, and adding Vec3F or something as the scalar version.

aclysma commented 4 years ago

I agree with @kabergstrom - I don't have positive feelings about glam::Vec3 doing different things in different crates depending on what version they pull in. I like the idea of separate explicit types (vectorized = Vec3V or Vec3A vs. float = Vec3F) and then aliasing Vec3 so that updating is easy.

bitshifter commented 4 years ago

Well this is awkward as I've now published the change in 0.9.0.

@kabergstrom are there any specific crates you think this will cause problems for?

@aclysma do you think this would be a problem in practice? I would think crates trying to mix different Vec3 implementations wouldn't compile. Are there any specific crates where you think this will cause issues?

While I realise this is an annoying breaking change for existing users I think a 12 byte Vec3 is a better default for future glam users. The decision to make Vec3 16 bytes has always been the most controversial decision I've made in glam and the thing that takes people by surprise most often. I'm aware of some users that have disabled SIMD entirely to get a 12 byte Vec3, which is a shame.

I have been pleasantly surprised by the uptake of glam but it is still early in glam's life. I think the time to make a breaking change like this is now. Unfortunately with this particular change there is no easy way to communicate it with people via Rust's deprecation mechanism.

There are a few reasons I don't want to get rid of Vec3 and replace it with two different suffixed types. Primarily I would like a Vec3 that new users and people who don't read the docs can use with no surprises. Vec3 matches the naming used for all other types in the library. The Vec3A signifies that there is something different about it, not just different from Vec3, but from the other types as well. That difference is really the size, other types are align 16 but Vec3 and Mat3 were the only padded types. If I were to add a F suffix and a V suffix there's no parallel with the other types and I think it would be more confusing for new users. It's not a convention that would make any sense to apply to other types either.

Suffix wise it is possible that I will add support for i32 and f64 one day to I also have to keep potential naming for those future types in mind, F being an obvious suffix I might need to introduce for f32.

That's my rational for making this change. Perhaps if more users come out of the woodwork I say this is a bad idea I guess I could be convinced otherwise. Specifically @hrydgard or @repi since they opened this issue in the first place and put it on the Embark issues list.

Given all that if there is anything I can do to assist with migration I can try and oblige. For example adding a feature flag that exports Vec3 with a different name or suffix so it's easier to find and update existing Vec3 usage and migrate them to Vec3A where it makes sense.

[edit] I also wanted to add, I can't think of anything else in glam that will necessitate a breaking change on this scale in the future. Even the i32/f64 example above I think could be rolled out leaving existing type names intact. I do try and avoid dropping this kind of thing on people if I can. Generally I'm quite conservative about adding things to glam if I'm not sure about them because I don't want to make breaking changes in the future. The transform-types feature is an example of this.

repi commented 4 years ago

I do like the new naming, it is IMO easier to understand for new users and a better default, and do think that if one does such a change, now is an excellent time to do it as it is only going to be harder and more people affected later on.

Do hope we all can continue to collaborate and figure out best naming and features in this crate and eventually do a 1.0 release (later in the year?).

I did a quick check across our internal and public crates and it looked like this change won't break anything for us and just work, but @hrydgard will do a proper upgrade tomorrow and we'll see where we are at. But we are perfectly fine upgrading and fixing up usages after this. Due to the ambiguity of the storage size of the previous Vec3 we have avoided to use it in FFI and areas where we need to know the exact size of it.

aclysma commented 4 years ago

I think at this point, the most important thing to do is communicate it well so that people know about this change. I agree with you in practice, existing code will work because symver will not drop this change on people without their consent, and mixing the code should fail to compile. But it's something we'll have to all be mindful of when pulling in new crates since this is a major change that won't be very obvious. When I mentioned multiple impls in the same codebase, I meant upstream crates using it internally. Again, it's not a technical/doesn't-work problem but it's just another thing to think about when reading upstream code.

(This is off-topic and was something I was thinking about well before this change, mostly spurred by this thread: https://github.com/rust-gamedev/wg/issues/25) One of the reasons I think a two-layered math library could be useful is that the controversy on math libraries is almost completely down to API design (naming and if SIMD should be the "default" or not are great examples :D ) As you say, DirectXMath is huge and as a community we are probably all best off avoiding doing the same work. Given that I don't think everyone will ever agree on every single API decision in a math library, sharing a "sys" level crate would be the next best thing. Especially if we want to move beyond SSE2 for ARM or AVX support. (In fact, I would try to keep such a crate structured as closely as possible to DirectXMath so that we can port across any improvements they make, such as support for new instructions.)

bitshifter / glam-rs

Alternate Vec3 that's actually 12 bytes #36