asb2m10 / dexed

DX7 FM multi platform/multi format plugin
GNU General Public License v3.0

SIMD-vectorize operators using libhwy #392

Closed (risicle closed 3 months ago)

risicle commented 12 months ago

This is an offshoot of some experimentation I was doing with Dexed. I wasn't really planning on developing it further, but I'll present it here in case it's useful to anyone.

This uses Google's Highway library (libhwy) to add SIMD versions of the most expensive parts of the synthesis. My crude testing suggests modest speed improvements of 10-20% for SSE2 to AVX2, but on an AVX-512 machine it easily doubles the speed for me. An ARM NEON system showed an embarrassing 4% speedup.

Dexed doesn't have a test suite, but comparing the output against the existing scalar implementation showed a maximum relative error of ~0.003 between the two, which is probably attributable to a different order of operations in some places.
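A minimal sketch of how such a comparison could be computed, for anyone wanting to repeat it. This is not the script used for the figure above: the function name, the `float` sample type, and the `floor` guard against near-zero reference samples are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Largest relative difference between a reference buffer (e.g. scalar output)
// and a candidate buffer (e.g. SIMD output) rendered from the same patch.
// `floor` is a hypothetical guard: it stops near-zero reference samples
// from inflating the ratio.
double max_relative_error(const std::vector<float>& reference,
                          const std::vector<float>& candidate,
                          double floor = 1e-6) {
    const std::size_t n = std::min(reference.size(), candidate.size());
    double worst = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double ref = reference[i];
        const double diff = std::fabs(static_cast<double>(candidate[i]) - ref);
        const double scale = std::max(std::fabs(ref), floor);
        worst = std::max(worst, diff / scale);
    }
    return worst;
}
```

Calling `max_relative_error(scalar_out, simd_out)` on two renders of the same notes gives a single number to track across builds. The choice of `floor` matters for audio, where many samples sit close to zero.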

I don't know whether you'd ever actually want to make Dexed depend on libhwy, and this would probably need a bit more polish before it could be merged: I've tested it on only a limited variety of machines and architectures, I haven't included an option to disable vectorization support, and I've only configured libhwy for single dispatch (that is, no single binary with dynamic CPU-extension detection, though I don't imagine that would be too hard to set up).
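To illustrate what dynamic CPU-extension detection means in general, here is a rough sketch that selects a kernel at run time. It deliberately does not use Highway's own dispatch mechanism. Instead it relies on the GCC/Clang builtins `__builtin_cpu_supports` and the `target` function attribute on x86-64, and falls back to the baseline elsewhere. The kernel and function names are hypothetical.

```cpp
#include <cstddef>

// Portable baseline kernel: scales a buffer in place.
static void scale_baseline(float* data, std::size_t n, float gain) {
    for (std::size_t i = 0; i < n; ++i) data[i] *= gain;
}

#if (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__)
// Same loop, but the compiler may emit AVX2 instructions for this function
// only, so the rest of the binary still runs on older CPUs.
__attribute__((target("avx2")))
static void scale_avx2(float* data, std::size_t n, float gain) {
    for (std::size_t i = 0; i < n; ++i) data[i] *= gain;
}
#endif

using ScaleFn = void (*)(float*, std::size_t, float);

// Returns the best kernel the running CPU supports. Intended to be called
// once at startup, with the result cached by the caller.
ScaleFn choose_scale_kernel() {
#if (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return scale_avx2;
#endif
    return scale_baseline;
}
```

A caller would store `ScaleFn scale = choose_scale_kernel();` once and then call `scale(buffer, n, gain)` in the audio loop. A single-dispatch build, by contrast, fixes the instruction set at compile time and needs no such check.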

The feedback-based operator loops are way too hard to vectorize, so they are left alone.
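The difficulty comes from a loop-carried dependency, which the following simplified sketch tries to show. It is a floating-point illustration written for this note, not the Dexed kernel: the function names, parameters, and the averaging of the two previous outputs are assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Non-feedback operator: sample i depends only on i, so samples can be
// computed in any order, or several at once in SIMD lanes.
std::vector<float> render_plain(std::size_t n, float phase0, float freq, float gain) {
    std::vector<float> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = gain * std::sin(phase0 + freq * static_cast<float>(i));
    }
    return out;
}

// Feedback operator: the phase of sample i is offset by the average of the
// two previous outputs, so sample i cannot start until sample i-1 is done.
std::vector<float> render_feedback(std::size_t n, float phase0, float freq,
                                   float gain, float fb) {
    std::vector<float> out(n);
    float y0 = 0.0f;  // output two samples ago
    float y = 0.0f;   // previous output
    for (std::size_t i = 0; i < n; ++i) {
        const float scaled_fb = fb * 0.5f * (y0 + y);
        y0 = y;
        y = gain * std::sin(phase0 + freq * static_cast<float>(i) + scaled_fb);
        out[i] = y;
    }
    return out;
}
```

In `render_plain` every iteration is independent, which is the shape a SIMD loop needs. In `render_feedback` each iteration reads `y` and `y0` from the iterations before it, so consecutive samples of one operator cannot simply be placed in adjacent vector lanes.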

asb2m10 commented 3 months ago

This is a very cool SIMD-based implementation. That said, to keep Dexed buildable on as many platforms as possible, this PR cannot be merged because of the Highway dependency.

PS: I added your fork to the Dexed README.md; thanks for sharing!