bitshifter / mathbench-rs

Comparing performance of Rust math libraries for common 3D game and graphics tasks

Include tests for `vek` #1

Closed zesterer closed 4 years ago

bitshifter commented 5 years ago

This library? https://github.com/yoanlcq/vek. Shame it doesn't have mint support; that would make it easier.

zesterer commented 5 years ago

Yep. There was talk of adding mint support. It's increasingly being used by game devs though, so I think comparison metrics would be very useful.

bitshifter commented 5 years ago

I added mat4 benchmarks for vek; they're on this branch: https://github.com/bitshifter/mathbench-rs/tree/vek.

I haven't added any unit tests because they rely more on mint.

On my machine vek performed a bit slower than cgmath on most benchmarks, and a few were a lot slower than the others. I'm not sure why. I am not using nightly, and it looked like vek has some kind of SIMD support on nightly. I also don't use full LTO, so if vek is relying on that for inlining, that could make it slower.

zesterer commented 5 years ago

Thanks. Perhaps @yoanlcq would be interested in this.

bitshifter commented 5 years ago

If you run `cargo bench mat4` it will run only the mat4 benches, which are the ones I've added vek for.

There are a couple of ways of investigating perf.

I've added public wrapper functions to src/lib.rs and used `cargo asm` to look at the assembly (`cargo asm` can only find functions in a lib, and they can't be inlined). For example, `cargo asm mathbench::glam_mat4_mul` shows glam's Mat4 mul. I have a gist demonstrating this here: https://gist.github.com/bitshifter/7741d701f9ea1fbc29b9e39c01fb4f1c.
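For anyone unfamiliar with the wrapper approach, here is a minimal sketch of what such a function could look like. The `Mat4` type below is a hypothetical stand-in written for this example (it is not glam's or vek's type), `demo_mat4_mul` is a made-up name, and `#[inline(never)]` is just one way to keep the wrapper as a separate symbol; the wrappers in mathbench itself may be written differently.

```rust
// Hypothetical stand-in for a library matrix type (NOT glam's or vek's Mat4).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Mat4 {
    // Column-major storage: cols[c][r] is the element at row r, column c.
    pub cols: [[f32; 4]; 4],
}

impl Mat4 {
    pub const IDENTITY: Mat4 = Mat4 {
        cols: [
            [1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0],
        ],
    };
}

impl std::ops::Mul for Mat4 {
    type Output = Mat4;

    // Library code usually wants this inlined into its callers.
    #[inline]
    fn mul(self, rhs: Mat4) -> Mat4 {
        let mut out = [[0.0f32; 4]; 4];
        for c in 0..4 {
            for r in 0..4 {
                for k in 0..4 {
                    // (A * B)[r][c] = sum over k of A[r][k] * B[k][c]
                    out[c][r] += self.cols[k][r] * rhs.cols[c][k];
                }
            }
        }
        Mat4 { cols: out }
    }
}

/// Public, never-inlined wrapper: the multiplication is inlined *into* this
/// function, but the function itself stays a distinct symbol in the lib, so
/// an assembly viewer has something to look up.
#[inline(never)]
pub fn demo_mat4_mul(a: &Mat4, b: &Mat4) -> Mat4 {
    *a * *b
}
```

If this lived in a lib crate called `mycrate` (a hypothetical name), then following the pattern above the assembly would be inspected with something like `cargo asm mycrate::demo_mat4_mul`.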

You could probably also use a profiler to inspect a specific benchmark.

yoanlcq commented 5 years ago

Hi,

I'll want to take a thorough look at this when I have some time; in any case, thanks a lot for adding vek, and mentioning me!

I wouldn't be surprised if vek performed less well than it should, which seems to be the case; apart from some release-mode assembly-checking with #[repr_simd] at godbolt.org, I didn't actually spend any time on profiling or making sure the overall generated assembly is not trash... :see_no_evil:

A "fair" benchmark would use types from vek's repr_simd modules where possible (e.g. vek::mat::repr_simd::Mat4<f32>). These are not the default imports, because #[repr_simd] types have some properties that might break some assumptions, such as alignment and size (e.g. a #[repr_simd] Vec3 has the same size as a Vec4).
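To illustrate the size and alignment point on stable Rust, here is a rough sketch. It does not use vek or `#[repr(simd)]` (which needs nightly); instead it assumes `#[repr(C, align(16))]` as a stand-in for a 16-byte SIMD-style layout. All type names are made up for the example, and the actual layout of vek's types is not checked here.

```rust
use std::mem::{align_of, size_of};

// Plain C layout: three f32 fields, no padding, 4-byte alignment.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Vec3C {
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

// Assumption: 16-byte alignment standing in for a SIMD-register layout.
// The size is rounded up to a multiple of the alignment, so 12 becomes 16.
#[repr(C, align(16))]
#[derive(Clone, Copy)]
pub struct Vec3Aligned {
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

#[repr(C, align(16))]
#[derive(Clone, Copy)]
pub struct Vec4Aligned {
    pub x: f32,
    pub y: f32,
    pub z: f32,
    pub w: f32,
}

/// Returns (size, alignment) in bytes for Vec3C, Vec3Aligned, Vec4Aligned.
pub fn layout_report() -> [(usize, usize); 3] {
    [
        (size_of::<Vec3C>(), align_of::<Vec3C>()),
        (size_of::<Vec3Aligned>(), align_of::<Vec3Aligned>()),
        (size_of::<Vec4Aligned>(), align_of::<Vec4Aligned>()),
    ]
}
```

Under these assumptions the aligned Vec3 ends up the same size as the aligned Vec4, which is the kind of property that can surprise code that casts slices of vectors to `&[f32]` or uploads them to a GPU buffer with a fixed stride.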

I've taken a look at cargo asm with my crate's repr_simd::Mat4<f32> multiplication, in release mode, and I'm somewhat surprised by the generated assembly; there's a bunch of movups and movss which shouldn't be there. That's something I should investigate...

I haven't taken a look at the other benchmarks yet.

That's also a good incentive for me to start making vek compatible with mint!

bitshifter commented 5 years ago

@yoanlcq I switched to nightly to try out repr_simd support but I was having trouble getting the repr_simd version to compile, e.g.

pub fn vek_mat4_mul_vec4(m: &vek::mat::repr_simd::column_major::Mat4<f32>, v: &vek::vec::Vec4<f32>) -> vek::vec::Vec4<f32> {
    *m * *v
}

The above compiled if I removed repr_simd; otherwise it seemed like no std::ops::Mul implementations were found.

yoanlcq commented 5 years ago

Yes, in this case you are supposed to use vek::vec::repr_simd::Vec4<f32>; in fact, every module has repr_c and repr_simd submodules, the default being repr_c. All types in repr_c modules implement From for their repr_simd counterpart, and vice versa.

So, either of these two should work:

pub fn vek_mat4_mul_vec4(m: &vek::mat::repr_simd::column_major::Mat4<f32>, v: &vek::vec::repr_simd::Vec4<f32>) -> vek::vec::repr_simd::Vec4<f32> {
    *m * *v
}

pub fn vek_mat4_mul_vec4(m: &vek::mat::repr_simd::column_major::Mat4<f32>, v: &vek::vec::Vec4<f32>) -> vek::vec::Vec4<f32> {
    (*m * (*v).into()).into()
}

Thanks again!

bitshifter commented 5 years ago

Ah, that worked. For some reason, when I first tried to import the repr_simd Vec4 I got an error about it being private; I must have done something wrong.

I've committed an update using the repr_simd types. Performance hasn't really improved, but from `cargo asm mathbench::vek_mat4_mul_mat4` I can see a bunch of function calls being made; linking with LTO or adding #[inline] should sort that out.

I've been avoiding LTO in mathbench since I'm interested in glam's (my lib) performance without LTO. Possibly I should consider benchmarking with and without it.

icefoxen commented 5 years ago

Vek 0.9.10 now supports mint conversions for basic types, if that helps. :tada:

bitshifter commented 5 years ago

The main blocker for vek benches is that, AFAIK, it requires nightly, so I want to come up with a way to make it (and others) optional.
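One possible shape for this, sketched under assumptions: gate the library-specific code behind a Cargo feature so it is only compiled when explicitly requested. The feature name `vek`, the module `vek_benches`, and the function `enabled_libraries` are all hypothetical and are not taken from mathbench.

```rust
// Compiled only when the (hypothetical) `vek` Cargo feature is enabled,
// so a default stable build never touches nightly-only code.
#[cfg(feature = "vek")]
mod vek_benches {
    pub const NAME: &str = "vek";
}

/// Lists the libraries this build would benchmark.
pub fn enabled_libraries() -> Vec<&'static str> {
    // `mut` is only needed when an optional feature pushes an extra entry.
    #[allow(unused_mut)]
    let mut libs = vec!["glam", "cgmath"];

    #[cfg(feature = "vek")]
    libs.push(vek_benches::NAME);

    libs
}
```

With a matching optional dependency and feature declared in Cargo.toml, the extra benches would be opted into with something like `cargo +nightly bench --features vek`, while a plain `cargo bench` would skip them.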

bitshifter commented 4 years ago

I've added vek to the benchmarks and included results in the README. Some results are a lot slower than other libraries; I haven't investigated why, but usually it's to do with function calls not getting inlined. Note that I ended up using vek's repr_c types over repr_simd for a few reasons:

I would consider adding the repr_simd types, but I have some other things I want to work on, so it might not happen for a while. I would take a PR.