Open trcrsired opened 2 years ago
FYI, the avx_test branch supports AVX2. Support for AVX512 is a straightforward extension of that.
A visual explanation can be seen at https://www.youtube.com/watch?v=qXleSwCCEvY&list=PLHTh1InhhwT4qBc2aCJUKYn-vhmZOGh01&index=10 starting at time 41:00. Benchmark results can be seen starting at 48:53. Going to a larger register size does not always help. It is more beneficial when you expect long runs of ASCII.
Good luck with your project!
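To make the point about long ASCII runs concrete, here is a minimal sketch. It is not taken from utf_utils or the avx_test branch; the helper name `ascii_prefix_length` is hypothetical, and the SSE2 path is only one possible choice. It skips 16 bytes per check for as long as no byte has its top bit set, so the wide check only pays off when the input stays ASCII for a while.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Hypothetical helper (not part of utf_utils): returns the number of leading
// ASCII bytes (values below 0x80) in the range [p, p + n).
inline std::size_t ascii_prefix_length(const unsigned char* p, std::size_t n) noexcept
{
    std::size_t i = 0;
#if defined(__SSE2__)
    for (; i + 16 <= n; i += 16)
    {
        __m128i chunk = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p + i));
        // _mm_movemask_epi8 collects the top bit of every byte; a nonzero
        // result means at least one byte in this block is not ASCII.
        if (_mm_movemask_epi8(chunk) != 0)
            break;
    }
#else
    // Portable fallback: test 8 bytes at a time with a plain 64-bit word.
    for (; i + 8 <= n; i += 8)
    {
        std::uint64_t word;
        std::memcpy(&word, p + i, sizeof word);
        if ((word & 0x8080808080808080ull) != 0)
            break;
    }
#endif
    // Scalar loop: finishes the block that contained a non-ASCII byte,
    // or the tail that was too short for a wide check.
    while (i < n && p[i] < 0x80)
        ++i;
    return i;
}
```

A caller would convert the first `ascii_prefix_length(p, n)` bytes with a trivial widening copy and hand the rest to the general UTF-8 decoder.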
Oh, BobSteagall. Thank you.
Back then I was not a SIMD expert and had little knowledge of it. That has changed a lot: I have since written a good deal of working vector-extension code, so I can probably try this myself. I am also very interested in working on WASM SIMD.
For example, something like this, or the SHA-256 and SHA-512 code: https://github.com/cppfastio/fast_io/blob/db563bc7dc9958c3ff5d9f5c6c75fc219c132369/include/fast_io_core_impl/simd_find.h#L39
Not every platform necessarily has a builtin like __builtin_ia32_pmovmskb128 for extracting masks. For example, I do not see how to get one on ARM NEON.
I also find that extracting masks and then shifting may not be a good solution, since std::countr_zero sometimes misbehaves for reasons I have not pinned down. Just knowing the zeros is not necessarily good enough for a lot of jobs like this.
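For readers unfamiliar with the mask approach discussed here, a minimal sketch follows. It is not the fast_io code; `match_mask16`, `trailing_zeros`, and `find_in_block16` are hypothetical names. It assumes SSE2 where available and otherwise builds the same mask with a scalar loop, which is the kind of fallback a platform without a movemask builtin would need. std::countr_zero requires C++20, so a plain loop stands in for it on older standards.

```cpp
#include <bit>       // std::countr_zero (C++20)
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Hypothetical helper: 16-bit mask with bit i set when block[i] == value.
inline std::uint32_t match_mask16(const unsigned char* block, unsigned char value) noexcept
{
#if defined(__SSE2__)
    __m128i chunk = _mm_loadu_si128(reinterpret_cast<const __m128i*>(block));
    __m128i eq = _mm_cmpeq_epi8(chunk, _mm_set1_epi8(static_cast<char>(value)));
    // One mask bit per byte lane, taken from the top bit of each lane.
    return static_cast<std::uint32_t>(_mm_movemask_epi8(eq));
#else
    // Fallback for targets without a movemask instruction: build the mask bit by bit.
    std::uint32_t mask = 0;
    for (unsigned i = 0; i != 16; ++i)
        mask |= static_cast<std::uint32_t>(block[i] == value) << i;
    return mask;
#endif
}

// Counts trailing zero bits; returns 32 for an input of 0.
inline unsigned trailing_zeros(std::uint32_t x) noexcept
{
#if defined(__cpp_lib_bitops)
    return static_cast<unsigned>(std::countr_zero(x));
#else
    unsigned n = 0;
    while (n != 32 && ((x >> n) & 1u) == 0)
        ++n;
    return n;
#endif
}

// Index of the first byte equal to value in a 16-byte block, or 16 if none.
inline unsigned find_in_block16(const unsigned char* block, unsigned char value) noexcept
{
    std::uint32_t mask = match_mask16(block, value);
    // Bit 16 acts as a sentinel, so an empty mask yields 16 instead of 32.
    return trailing_zeros(mask | 0x10000u);
}
```

The sentinel bit is one design choice among several; checking `mask == 0` before counting would work equally well.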
I am thinking about trying them myself. https://github.com/cppfastio/fast_io/blob/db563bc7dc9958c3ff5d9f5c6c75fc219c132369/include/fast_io_core_impl/simd_find.h#L39
This shows that extracting masks may not be a very good idea, since it is relatively slow compared to just testing whether a SIMD vector is zero or not.
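One way to read that idea, as a rough sketch only: skip whole blocks with a yes/no "any match?" test and find the exact position only once a block is known to contain a hit. This is not the linked fast_io code; `block16_has_no_match` and `find_byte` are hypothetical names. The SSE4.1 path (enabled with something like `-msse4.1`) uses `_mm_testz_si128`, and the fallback uses the classic zero-byte bit trick on two 64-bit words. No performance claim is made for either path.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__SSE4_1__)
#include <smmintrin.h>
#endif

// Hypothetical helper: true when none of the 16 bytes at block equals value.
inline bool block16_has_no_match(const unsigned char* block, unsigned char value) noexcept
{
#if defined(__SSE4_1__)
    __m128i chunk = _mm_loadu_si128(reinterpret_cast<const __m128i*>(block));
    __m128i eq = _mm_cmpeq_epi8(chunk, _mm_set1_epi8(static_cast<char>(value)));
    // _mm_testz_si128 returns 1 when (eq & eq) is all zero; no mask is extracted.
    return _mm_testz_si128(eq, eq) != 0;
#else
    constexpr std::uint64_t ones = 0x0101010101010101ull;
    constexpr std::uint64_t highs = 0x8080808080808080ull;
    std::uint64_t words[2];
    std::memcpy(words, block, sizeof words);
    std::uint64_t hit = 0;
    for (std::uint64_t x : words)
    {
        x ^= ones * value;               // bytes equal to value become 0
        hit |= (x - ones) & ~x & highs;  // nonzero iff some byte of x is 0
    }
    return hit == 0;
#endif
}

// Returns the index of the first byte equal to value in [p, p + n), or n if absent.
inline std::size_t find_byte(const unsigned char* p, std::size_t n, unsigned char value) noexcept
{
    std::size_t i = 0;
    // Skip whole blocks using only the yes/no test.
    while (i + 16 <= n && block16_has_no_match(p + i, value))
        i += 16;
    // Locate the exact position inside the hit block, or scan the short tail.
    while (i < n && p[i] != value)
        ++i;
    return i;
}
```

The exact position is found here with a scalar loop of at most 16 steps; a mask could still be extracted at that point, since it would then happen once per search rather than once per block.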
@BobSteagall's utf_utils (https://github.com/BobSteagall/utf_utils) is too platform-specific and does not work with AVX512. I am going to rewrite it.