cppfastio / fast_io

Freestanding fast input/output for C++20
MIT License

To do: Rewrite code cvt #360

Open trcrsired opened 2 years ago

trcrsired commented 2 years ago

@BobSteagall's UTF-utils (https://github.com/BobSteagall/utf_utils) is too platform-specific and does not work with AVX512. I am going to rewrite it.

BobSteagall commented 2 years ago

FYI, the avx_test branch supports AVX2. Support for AVX512 is a straightforward extension of that.

A visual explanation can be seen at https://www.youtube.com/watch?v=qXleSwCCEvY&list=PLHTh1InhhwT4qBc2aCJUKYn-vhmZOGh01&index=10 starting at time 41:00. Benchmark results can be seen starting at 48:53. Going to a larger register size does not always help. It is more beneficial when you expect long runs of ASCII.
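
To illustrate why wider blocks mainly pay off on long ASCII runs, here is a minimal sketch. It is not code from utf_utils or fast_io: `ascii_prefix_length` is a hypothetical name, and a plain 64-bit word stands in for a SIMD register so the example stays portable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from utf_utils or fast_io): counts the leading
// bytes of [p, p + n) that are ASCII, i.e. have their top bit clear.
inline std::size_t ascii_prefix_length(const unsigned char* p, std::size_t n)
{
    assert(p != nullptr || n == 0);
    constexpr std::uint64_t high_bits = 0x8080808080808080ULL;
    std::size_t i = 0;
    // Block step: one 8-byte load stands in for a SIMD register here.
    for (; i + 8 <= n; i += 8)
    {
        std::uint64_t block;
        std::memcpy(&block, p + i, sizeof block);
        if (block & high_bits)
            break; // at least one byte in this block is non-ASCII
    }
    // Scalar step: handles the tail and pinpoints the byte inside a mixed block.
    for (; i < n && p[i] < 0x80; ++i) {}
    return i;
}
```

A wider block skips more bytes per iteration, but a single non-ASCII byte sends the whole block to the slower byte-by-byte step, so input with only short ASCII runs gains little from a larger register.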

Good luck with your project!

trcrsired commented 2 years ago

Oh, BobSteagall. Thank you.

Before, I wasn't a SIMD expert and had little knowledge. Now things are very different: I have written a lot of vector extension code and have successfully written SIMD code, so I can probably try something by myself on this. I am also very interested in working on wasm SIMD.

For example, something like this, or the sha256 and sha512 code: https://github.com/cppfastio/fast_io/blob/db563bc7dc9958c3ff5d9f5c6c75fc219c132369/include/fast_io_core_impl/simd_find.h#L39

Not every platform necessarily has builtins like __builtin_ia32_pmovmskb128 to get masks. For example, I do not see how to get that on ARM NEON.

I also find that getting masks for shifting may not be a good solution, since std::countr_zero sometimes misbehaves for seemingly random reasons. Just knowing the zeros is not necessarily good enough for a lot of jobs like this.
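
For readers unfamiliar with the mask approach, here is a minimal sketch of it. The names `match_mask16` and `find_in_block16` are hypothetical and this is not fast_io's code. It uses the SSE2 intrinsic `_mm_movemask_epi8` where available and a scalar loop elsewhere. One pitfall that is certain: `std::countr_zero` on an all-zero mask returns the bit width of the mask type (32 here, not 16), so the empty case needs its own check. Whether that is the problem meant above is not stated.

```cpp
#include <bit>      // std::countr_zero (C++20)
#include <cassert>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Hypothetical helper: bit i of the result is set when p[i] == target,
// for the 16 bytes starting at p.
inline std::uint32_t match_mask16(const unsigned char* p, unsigned char target)
{
    assert(p != nullptr);
#if defined(__SSE2__)
    const __m128i block = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
    const __m128i eq = _mm_cmpeq_epi8(block, _mm_set1_epi8(static_cast<char>(target)));
    return static_cast<std::uint32_t>(_mm_movemask_epi8(eq));
#else
    // Portable fallback for targets without a movemask-style instruction.
    std::uint32_t mask = 0;
    for (unsigned i = 0; i != 16; ++i)
        mask |= static_cast<std::uint32_t>(p[i] == target) << i;
    return mask;
#endif
}

// Index of the first byte equal to target in the 16-byte block, or 16 if none.
inline unsigned find_in_block16(const unsigned char* p, unsigned char target)
{
    const std::uint32_t mask = match_mask16(p, target);
    if (mask == 0)
        return 16; // std::countr_zero(mask) would give 32 here, not 16
#if defined(__cpp_lib_bitops)
    return static_cast<unsigned>(std::countr_zero(mask));
#else
    // Pre-C++20 fallback: scan for the lowest set bit (mask is non-zero here).
    unsigned i = 0;
    while (((mask >> i) & 1u) == 0)
        ++i;
    return i;
#endif
}
```

The scalar branch shows the cost on platforms without a mask-extraction instruction: the mask has to be assembled bit by bit, or by some platform-specific sequence.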

I am thinking about trying them myself. https://github.com/cppfastio/fast_io/blob/db563bc7dc9958c3ff5d9f5c6c75fc219c132369/include/fast_io_core_impl/simd_find.h#L39

This shows that getting masks may not be a very good idea, since it is relatively slow compared to just testing whether a SIMD vector is zero or not.
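
A minimal sketch of the "test for zero, then locate" idea, under stated assumptions: it is not the code at the link above, the names `block_has_byte` and `find_byte` are hypothetical, and a 64-bit word stands in for a SIMD register. Each block only answers a yes/no question. The exact position is found by a scalar loop, and only in the one block that contains a match.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true when any of the 8 bytes packed in word equals target.
inline bool block_has_byte(std::uint64_t word, unsigned char target)
{
    constexpr std::uint64_t ones = 0x0101010101010101ULL;
    constexpr std::uint64_t highs = 0x8080808080808080ULL;
    const std::uint64_t x = word ^ (ones * target); // matching bytes become 0x00
    return ((x - ones) & ~x & highs) != 0;          // classic "has a zero byte" test
}

// Returns a pointer to the first byte equal to target in [first, last), or last.
inline const unsigned char* find_byte(const unsigned char* first,
                                      const unsigned char* last,
                                      unsigned char target)
{
    assert(first <= last);
    while (last - first >= 8)
    {
        std::uint64_t word;
        std::memcpy(&word, first, sizeof word);
        if (block_has_byte(word, target))
            break; // yes/no answer only, no per-byte mask is extracted
        first += 8;
    }
    // Scalar step: locates the match inside the flagged block, or scans the tail.
    for (; first != last && *first != target; ++first) {}
    return first;
}
```

The block loop never needs a bit-scan such as `std::countr_zero`, which is the property that makes this shape easier to port to targets without a mask-extraction builtin.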