jwnimmer-tri opened 4 days ago
Towards #21526.
As a sample of what's new, here's the disassembly of ComposeXXImpl compiled for "AVX3" (AVX-512) using GCC on Ubuntu 22.04 (Jammy):

    ComposeXXImpl:
        endbr64
        mov         eax,0x7
        vmovupd     ymm3,YMMWORD PTR [rdi]
        vmovupd     ymm2,YMMWORD PTR [rdi+0x18]
        kmovb       k1,eax
        vmovupd     ymm0{k1}{z},YMMWORD PTR [rdi+0x48]
        vmovupd     ymm1,YMMWORD PTR [rdi+0x30]
        vmulpd      ymm5,ymm3,QWORD PTR [rsi]{1to4}
        vmulpd      ymm4,ymm3,QWORD PTR [rsi+0x18]{1to4}
        vfmadd231pd ymm0,ymm3,QWORD PTR [rsi+0x48]{1to4}
        vmulpd      ymm3,ymm3,QWORD PTR [rsi+0x30]{1to4}
        vfmadd231pd ymm5,ymm2,QWORD PTR [rsi+0x8]{1to4}
        vfmadd231pd ymm0,ymm2,QWORD PTR [rsi+0x50]{1to4}
        vfmadd231pd ymm4,ymm2,QWORD PTR [rsi+0x20]{1to4}
        vfmadd132pd ymm2,ymm3,QWORD PTR [rsi+0x38]{1to4}
        vfmadd231pd ymm5,ymm1,QWORD PTR [rsi+0x10]{1to4}
        vfmadd231pd ymm0,ymm1,QWORD PTR [rsi+0x58]{1to4}
        vfmadd231pd ymm4,ymm1,QWORD PTR [rsi+0x28]{1to4}
        vfmadd132pd ymm1,ymm2,QWORD PTR [rsi+0x40]{1to4}
        vmovupd     YMMWORD PTR [rdx],ymm5
        vmovupd     YMMWORD PTR [rdx+0x18],ymm4
        vmovupd     YMMWORD PTR [rdx+0x30],ymm1
        vmovupd     YMMWORD PTR [rdx+0x48]{k1},ymm0
        vzeroupper
        ret

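A reading of the listing, offered as a sketch rather than a description of the actual source: the 0x18-byte (3-double) strides suggest each pose is stored as a 3x4 column-major matrix [R | p] of 12 doubles, with rdi, rsi, and rdx holding the two inputs and the output per the System V x86-64 calling convention. Under that assumption, each output column is three broadcast multiply-adds of the first input's rotation columns, i.e., ordinary pose composition X_AC = X_AB * X_BC. The C++ sketch spells this out: `ComposeXXReference` is a plain scalar version, and `ComposeXXLanes` emulates the 4-lane register schedule with `std::array` so it runs on any CPU. Every name here (`Pose12`, `Lane4`, the helpers, and the X_AB/X_BC labels) is hypothetical and is not Drake's API.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// ASSUMPTION (inferred from the 0x18-byte = 3-double strides in the listing):
// a pose X = [R | p] is 12 doubles in column-major order:
// X[0..2] = R col 0, X[3..5] = R col 1, X[6..8] = R col 2, X[9..11] = p.
// All names below are hypothetical illustrations, not Drake's API.
using Pose12 = std::array<double, 12>;
using Lane4 = std::array<double, 4>;  // stands in for one ymm register

// Scalar reference for X_AC = X_AB * X_BC:
//   R_AC = R_AB * R_BC,  p_AC = p_AB + R_AB * p_BC.
Pose12 ComposeXXReference(const Pose12& A, const Pose12& B) {
  Pose12 C{};
  for (int j = 0; j < 4; ++j) {    // output column
    for (int i = 0; i < 3; ++i) {  // output row
      double sum = (j == 3) ? A[9 + i] : 0.0;  // only p_AC gets the + p_AB term
      for (int k = 0; k < 3; ++k) sum += A[3 * k + i] * B[3 * j + k];
      C[3 * j + i] = sum;
    }
  }
  return C;
}

// 4-wide unaligned load; lanes >= n are zeroed, mimicking the masked
// `vmovupd ymm0{k1}{z}` with k1 = 0b0111 when n == 3.
Lane4 Load(const double* src, int n = 4) {
  assert(n >= 0 && n <= 4);
  Lane4 v{};
  for (int l = 0; l < n; ++l) v[l] = src[l];
  return v;
}

// 4-wide unaligned store of the first n lanes (n == 3 mimics the {k1} store).
void Store(double* dst, const Lane4& v, int n = 4) {
  assert(n >= 0 && n <= 4);
  for (int l = 0; l < n; ++l) dst[l] = v[l];
}

// acc += a * broadcast(s), like vfmadd231pd with a {1to4} memory operand.
// (Plain C++: the compiler may or may not fuse this into a real FMA.)
void MulAdd(Lane4& acc, const Lane4& a, double s) {
  for (int l = 0; l < 4; ++l) acc[l] += a[l] * s;
}

// Emulates the register schedule: each output column is the sum of the three
// R_AB columns, each times a scalar from X_BC broadcast to all lanes.
Pose12 ComposeXXLanes(const Pose12& A, const Pose12& B) {
  // Full-width loads over-read one double into the next column (lane 3);
  // that lane is junk but stays inside the 12-double array.
  const Lane4 a0 = Load(&A[0]), a1 = Load(&A[3]), a2 = Load(&A[6]);
  // Columns 0..2 start at zero (the real code uses vmulpd for the first term);
  // column 3 starts from p_AB via a 3-lane masked load, so no read past A[11].
  Lane4 out[4] = {Lane4{}, Lane4{}, Lane4{}, Load(&A[9], 3)};
  for (int j = 0; j < 4; ++j) {
    MulAdd(out[j], a0, B[3 * j + 0]);
    MulAdd(out[j], a1, B[3 * j + 1]);
    MulAdd(out[j], a2, B[3 * j + 2]);
  }
  // Overlapping stores in address order: each store's junk lane 3 is
  // overwritten by the next store, and the last store is masked to 3 lanes.
  Pose12 C{};
  Store(&C[0], out[0]);
  Store(&C[3], out[1]);
  Store(&C[6], out[2]);
  Store(&C[9], out[3], 3);
  return C;
}

// Largest elementwise difference between two poses.
double MaxAbsDiff(const Pose12& x, const Pose12& y) {
  double m = 0.0;
  for (int i = 0; i < 12; ++i) m = std::fmax(m, std::fabs(x[i] - y[i]));
  return m;
}
```

For example, composing two pure translations, p_AB = (1, 2, 3) and p_BC = (4, 5, 6) with identity rotations, yields p_AC = (5, 7, 9) from both functions. The masked first load and last store are what let 4-wide registers process 3-element columns without touching memory outside the two 12-double arrays.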
To see everything, build this change and then disassemble drake/bazel-bin/math/_objs/_geometric_transform_compiled_cc_impl/fast_pose_composition_functions_avx2_fma.pic.o.