Closed zesterer closed 4 years ago
Yep. There was talk of adding mint support. It's increasingly being used by game devs though, so I think comparison metrics would be very useful.
I added mat4 benchmarks for vek, they're on this branch https://github.com/bitshifter/mathbench-rs/tree/vek.
I haven't added any unit tests because they rely more on mint.
On my machine vek performed a bit slower than cgmath on most benchmarks and a few were a lot slower than the others. Not sure why. I am not using nightly, vek looked like it had some kind of simd support on nightly? I also don't use full LTO, so if vek is relying on that for inlining that could make it slower.
Thanks. Perhaps @yoanlcq would be interested in this.
If you run cargo bench mat4 it will just run the mat4 benches which are the ones I've added vek for.
There's a couple of ways of investigating perf.
I've added public wrapper functions to src/lib.rs and used cargo asm to look at the assembly (cargo asm can only find functions in a lib and they can't be inlined). For example cargo asm mathbench::glam_mat4_mul shows glam's Mat4 mul. I have a gist demonstrating this here https://gist.github.com/bitshifter/7741d701f9ea1fbc29b9e39c01fb4f1c.
You could probably also use a profiler to inspect a specific benchmark.
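A minimal sketch of the wrapper approach described above. The Mat4 type and the mat4_mul_wrapper name are hypothetical stand-ins, not the actual mathbench or glam code, and marking the wrapper #[inline(never)] is an assumption about one way to keep it visible as its own symbol.

```rust
use std::ops::Mul;

/// Hypothetical stand-in for a math library's 4x4 matrix (column-major).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Mat4 {
    pub cols: [[f32; 4]; 4],
}

impl Mat4 {
    pub const IDENTITY: Mat4 = Mat4 {
        cols: [
            [1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0],
        ],
    };
}

impl Mul for Mat4 {
    type Output = Mat4;

    // The operator itself may be inlined into callers, so it may not
    // show up as a separate symbol in the compiled library.
    #[inline]
    fn mul(self, rhs: Mat4) -> Mat4 {
        let mut out = [[0.0f32; 4]; 4];
        for c in 0..4 {
            for r in 0..4 {
                for k in 0..4 {
                    // Column-major: element (row r, column c) is cols[c][r].
                    out[c][r] += self.cols[k][r] * rhs.cols[c][k];
                }
            }
        }
        Mat4 { cols: out }
    }
}

/// Public, non-inlined wrapper (assumed approach): it gives an assembly
/// viewer a named function in the library to look up.
#[inline(never)]
pub fn mat4_mul_wrapper(a: &Mat4, b: &Mat4) -> Mat4 {
    *a * *b
}
```

With a function like this in src/lib.rs, a command of the same shape as the one above (crate name followed by the wrapper's name) would be the thing to pass to cargo asm.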
Hi,
I'll want to take a thorough look at this when I have some time; in any case, thanks a lot for adding vek, and mentioning me!
I wouldn't be surprised if vek performed less well than it should, which seems to be the case; apart from some release-mode assembly-checking with #[repr_simd] at godbolt.org, I didn't actually spend any time on profiling or making sure the overall generated assembly is not trash... :see_no_evil:
A "fair" benchmark would use types from vek's repr_simd modules where possible (e.g. vek::mat::repr_simd::Mat4<f32>). These are not the default imports, because #[repr_simd] types have some properties that might break some assumptions, such as alignment and size (e.g. a #[repr_simd] Vec3 has the same size as a Vec4).
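To illustrate the size and alignment point, here is a small sketch on stable Rust. It does not use vek or the nightly-only SIMD representation; the 16-byte-aligned struct is only an assumed stand-in for how a SIMD-backed Vec3 might be laid out, and all type names are hypothetical.

```rust
use std::mem::{align_of, size_of};

/// Plain C layout: three f32 fields, no padding.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Vec3C {
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

/// Stand-in for a SIMD-friendly Vec3 (assumption: 16-byte alignment,
/// so the struct is padded up to the size of four f32 lanes).
#[repr(C, align(16))]
#[derive(Clone, Copy)]
pub struct Vec3Aligned {
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

/// Plain C layout with four f32 fields, for comparison.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Vec4C {
    pub x: f32,
    pub y: f32,
    pub z: f32,
    pub w: f32,
}

/// Returns (size, alignment) in bytes for Vec3C, Vec3Aligned and Vec4C.
pub fn layout_report() -> [(usize, usize); 3] {
    [
        (size_of::<Vec3C>(), align_of::<Vec3C>()),
        (size_of::<Vec3Aligned>(), align_of::<Vec3Aligned>()),
        (size_of::<Vec4C>(), align_of::<Vec4C>()),
    ]
}
```

Code that assumes a Vec3 occupies 12 bytes (for example when packing vertex buffers) would break with the padded layout, which is presumably the kind of assumption meant above.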
I've taken a look at cargo asm with my crate's repr_simd::Mat4<f32> multiplication, in release mode, and I'm somewhat surprised by the generated assembly; there's a bunch of movups and movss which shouldn't be there. That's something I should investigate...
I haven't taken a look at the other benchmarks yet.
That's also a good incentive for me to start making vek compatible with mint!
@yoanlcq I switched to nightly to try out repr_simd support but I was having trouble getting the repr_simd version to compile, e.g.
pub fn vek_mat4_mul_vec4(m: &vek::mat::repr_simd::column_major::Mat4<f32>, v: &vek::vec::Vec4<f32>) -> vek::vec::Vec4<f32> {
    *m * *v
}
The above compiled if I moved repr_simd, otherwise it seemed like no std::ops::Mul implementations were found.
Yes, in this case you are supposed to use vek::vec::repr_simd::Vec4<f32>; in fact, every module has repr_c and repr_simd submodules, the default being repr_c.
All types in repr_c modules implement From for their repr_simd counterpart, and vice versa.
So, either of these two should work:
pub fn vek_mat4_mul_vec4(m: &vek::mat::repr_simd::column_major::Mat4<f32>, v: &vek::vec::repr_simd::Vec4<f32>) -> vek::vec::repr_simd::Vec4<f32> {
    *m * *v
}

pub fn vek_mat4_mul_vec4(m: &vek::mat::repr_simd::column_major::Mat4<f32>, v: &vek::vec::Vec4<f32>) -> vek::vec::Vec4<f32> {
    (*m * (*v).into()).into()
}
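For readers without vek on nightly, the pattern behind the second variant can be sketched with self-contained stand-in types. Everything below is hypothetical (module contents, field names, the align(16) layout) and only mirrors the idea described above: Mul exists for matching layouts only, and From/Into bridges the two.

```rust
use std::ops::Mul;

pub mod repr_c {
    /// Stand-in for the default, C-layout vector.
    #[repr(C)]
    #[derive(Clone, Copy, Debug, PartialEq)]
    pub struct Vec4 { pub x: f32, pub y: f32, pub z: f32, pub w: f32 }
}

pub mod repr_simd {
    /// Stand-in for the SIMD-layout vector (assumed 16-byte aligned).
    #[repr(C, align(16))]
    #[derive(Clone, Copy, Debug, PartialEq)]
    pub struct Vec4 { pub x: f32, pub y: f32, pub z: f32, pub w: f32 }

    /// Stand-in for a column-major SIMD-layout matrix.
    #[derive(Clone, Copy, Debug, PartialEq)]
    pub struct Mat4 { pub cols: [Vec4; 4] }
}

// Conversions in both directions between the two layouts.
impl From<repr_c::Vec4> for repr_simd::Vec4 {
    fn from(v: repr_c::Vec4) -> Self {
        repr_simd::Vec4 { x: v.x, y: v.y, z: v.z, w: v.w }
    }
}

impl From<repr_simd::Vec4> for repr_c::Vec4 {
    fn from(v: repr_simd::Vec4) -> Self {
        repr_c::Vec4 { x: v.x, y: v.y, z: v.z, w: v.w }
    }
}

// Mul is only implemented for the matching layout, so mixing a
// repr_simd matrix with a repr_c vector would not compile directly.
impl Mul<repr_simd::Vec4> for repr_simd::Mat4 {
    type Output = repr_simd::Vec4;

    fn mul(self, v: repr_simd::Vec4) -> repr_simd::Vec4 {
        let c = &self.cols;
        repr_simd::Vec4 {
            x: c[0].x * v.x + c[1].x * v.y + c[2].x * v.z + c[3].x * v.w,
            y: c[0].y * v.x + c[1].y * v.y + c[2].y * v.z + c[3].y * v.w,
            z: c[0].z * v.x + c[1].z * v.y + c[2].z * v.z + c[3].z * v.w,
            w: c[0].w * v.x + c[1].w * v.y + c[2].w * v.z + c[3].w * v.w,
        }
    }
}

/// Accepts and returns the C-layout vector, converting at the boundary.
pub fn mat4_mul_vec4(m: &repr_simd::Mat4, v: &repr_c::Vec4) -> repr_c::Vec4 {
    (*m * repr_simd::Vec4::from(*v)).into()
}
```

The conversions here are plain field copies; whether the equivalent conversions in a real library are free after optimization is something to check in the generated assembly rather than assume.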
Thanks again!
Ah that worked, for some reason when I tried to import the repr_simd Vec4 initially I got an error about it being private, I must have done something wrong.
I've committed an update using the repr_simd types, performance hasn't really improved but from cargo asm mathbench::vek_mat4_mul_mat4 I can see a bunch of function calls being made, linking with LTO or adding #[inline] should sort that out.
I've been avoiding LTO in mathbench since I'm interested in glam's (my lib) performance without LTO. Possibly I should consider benchmarking with and without it.
Vek 0.9.10 now supports mint conversions for basic types, if that helps. :tada:
The main blocker for vek benches is AFAIK it requires nightly, so I want to come up with a way to make it (and others) optional.
I've added vek to benchmarks and included results in the README. Some results are a lot slower than other libraries, I haven't investigated why. Usually it's to do with function calls not getting inlined. Note that I ended up using vek's repr_c types over repr_simd for a few reasons:
- repr_simd requires nightly
- repr_simd [...]
- [...] vek and it appeared to be using the repr_c types.
I would consider adding the repr_simd types but I have some other things I want to work on, so it might not happen for a while, I would take a PR.
This library? https://github.com/yoanlcq/vek. Shame it doesn't have mint support, that would make it easier.