RobotLocomotion / drake

Model-based design and verification for robotics.
https://drake.mit.edu

[math] Use dynamic dispatch for highway SIMD #21626

Open jwnimmer-tri opened 4 days ago

jwnimmer-tri commented 4 days ago

Towards #21526.
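
For readers unfamiliar with the term in the title, here is a minimal sketch of what runtime ("dynamic") dispatch means in general. It is not Highway's actual machinery, which is not shown here. All names (`sketch::Dot`, `DotPortable`, `DotWide`, `CpuHasAvx2`) are hypothetical, the "wide" kernel is only a stand-in with the same body, and the CPU check assumes GCC or Clang on x86 (falling back to the portable path elsewhere).

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {

using DotFn = double (*)(const double*, const double*, std::size_t);

// Baseline kernel that runs on any CPU.
double DotPortable(const double* a, const double* b, std::size_t n) {
  assert(a != nullptr && b != nullptr);
  double sum = 0.0;
  for (std::size_t i = 0; i < n; ++i) sum += a[i] * b[i];
  return sum;
}

// Stand-in for a kernel that would be compiled with wider SIMD enabled.
// In this sketch it has the same body, so both paths give the same answer.
double DotWide(const double* a, const double* b, std::size_t n) {
  assert(a != nullptr && b != nullptr);
  double sum = 0.0;
  for (std::size_t i = 0; i < n; ++i) sum += a[i] * b[i];
  return sum;
}

// Ask the running CPU what it supports (GCC/Clang on x86 only).
bool CpuHasAvx2() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
  return __builtin_cpu_supports("avx2") != 0;
#else
  return false;
#endif
}

// Pick a kernel once, based on the CPU we are actually running on.
DotFn ChooseDot() { return CpuHasAvx2() ? &DotWide : &DotPortable; }

// Public entry point: the choice is made on first call, then reused.
double Dot(const double* a, const double* b, std::size_t n) {
  static const DotFn chosen = ChooseDot();
  return chosen(a, b, n);
}

}  // namespace sketch
```

The point of the pattern is that a single binary can carry several builds of a kernel and choose among them when the program runs, rather than fixing the instruction set at compile time.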


As a sample of what's new, here's the disassembly of ComposeXXImpl compiled for the "AVX3" target (Highway's name for AVX-512) using GCC on Ubuntu 22.04 (Jammy):

```asm
endbr64
mov    eax,0x7
vmovupd ymm3,YMMWORD PTR [rdi]
vmovupd ymm2,YMMWORD PTR [rdi+0x18]
kmovb  k1,eax
vmovupd ymm0{k1}{z},YMMWORD PTR [rdi+0x48]
vmovupd ymm1,YMMWORD PTR [rdi+0x30]
vmulpd ymm5,ymm3,QWORD PTR [rsi]{1to4}
vmulpd ymm4,ymm3,QWORD PTR [rsi+0x18]{1to4}
vfmadd231pd ymm0,ymm3,QWORD PTR [rsi+0x48]{1to4}
vmulpd ymm3,ymm3,QWORD PTR [rsi+0x30]{1to4}
vfmadd231pd ymm5,ymm2,QWORD PTR [rsi+0x8]{1to4}
vfmadd231pd ymm0,ymm2,QWORD PTR [rsi+0x50]{1to4}
vfmadd231pd ymm4,ymm2,QWORD PTR [rsi+0x20]{1to4}
vfmadd132pd ymm2,ymm3,QWORD PTR [rsi+0x38]{1to4}
vfmadd231pd ymm5,ymm1,QWORD PTR [rsi+0x10]{1to4}
vfmadd231pd ymm0,ymm1,QWORD PTR [rsi+0x58]{1to4}
vfmadd231pd ymm4,ymm1,QWORD PTR [rsi+0x28]{1to4}
vfmadd132pd ymm1,ymm2,QWORD PTR [rsi+0x40]{1to4}
vmovupd YMMWORD PTR [rdx],ymm5
vmovupd YMMWORD PTR [rdx+0x18],ymm4
vmovupd YMMWORD PTR [rdx+0x30],ymm1
vmovupd YMMWORD PTR [rdx+0x48]{k1},ymm0
vzeroupper
ret
```
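
As a reading aid, here is a scalar sketch of what the listing above appears to compute. The layout is inferred from the offsets in the disassembly, not taken from the Drake sources: each transform is assumed to be 12 contiguous doubles, a 3x3 rotation in column-major order followed by a 3-element translation. The function name `ComposeXXReference` is hypothetical.

```cpp
#include <cassert>

// Hypothetical scalar reference for the vectorized listing.
// Assumed layout (inferred from the 0x18-byte strides in the disassembly):
//   X[0..8]  = rotation R, column-major (column k starts at X[3*k])
//   X[9..11] = translation p
// Computes R_AC = R_AB * R_BC and p_AC = p_AB + R_AB * p_BC.
void ComposeXXReference(const double* X_AB, const double* X_BC,
                        double* X_AC) {
  assert(X_AB != nullptr && X_BC != nullptr && X_AC != nullptr);
  double result[12];  // Temporary, so X_AC may alias an input.

  // Rotation: output column j is a combination of the columns of R_AB,
  // weighted by the entries of column j of R_BC. In the listing, each
  // {1to4} broadcast operand is one such weight.
  for (int j = 0; j < 3; ++j) {
    for (int i = 0; i < 3; ++i) {
      double sum = 0.0;
      for (int k = 0; k < 3; ++k) {
        sum += X_AB[3 * k + i] * X_BC[3 * j + k];
      }
      result[3 * j + i] = sum;
    }
  }

  // Translation: start from p_AB (the masked load at +0x48), then
  // accumulate R_AB * p_BC.
  for (int i = 0; i < 3; ++i) {
    double sum = X_AB[9 + i];
    for (int k = 0; k < 3; ++k) {
      sum += X_AB[3 * k + i] * X_BC[9 + k];
    }
    result[9 + i] = sum;
  }

  for (int n = 0; n < 12; ++n) X_AC[n] = result[n];
}
```

Under this reading, the mask value 0x7 loaded into k1 selects three of the four lanes, so the final load and store touch only the three translation entries and stay within the 12 doubles.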

To see all of the generated code, build this branch and then disassemble drake/bazel-bin/math/_objs/_geometric_transform_compiled_cc_impl/fast_pose_composition_functions_avx2_fma.pic.o.

