Open mborland opened 6 months ago
There are operations like: https://github.com/cppalliance/decimal/blob/develop/include/boost/decimal/detail/wide-integer/uintwide_t.hpp#L888 which are ripe for packing into AVX2 / ARM NEON instructions instead of repeated calculations.
I think ADX instructions may be the right move for this on x64. They are designed for multi-precision workloads that chain additions or multiplications through the carry flag.
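For illustration, here is a minimal sketch of the kind of chained add I have in mind. It is not the code at the linked line; `limb`, `add_limbs`, and the `SKETCH_HAS_ADDCARRY_U64` macro are hypothetical names made up for this example. On x86-64 with GCC or Clang it uses the `_addcarry_u64` intrinsic from `<immintrin.h>`; elsewhere it falls back to a portable carry computation. Compilers typically lower `_addcarry_u64` to a plain `adc` chain, and whether `adcx`/`adox` actually get emitted depends on the compiler and target flags, so treat this as a starting point for benchmarking rather than a guaranteed ADX path.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define SKETCH_HAS_ADDCARRY_U64 1 // hypothetical macro, only for this sketch
#endif

// Hypothetical limb type: one 64-bit word of a wide integer,
// stored least-significant limb first.
using limb = unsigned long long;
static_assert(sizeof(limb) == 8, "this sketch assumes 64-bit limbs");

// Adds two little-endian limb arrays of equal length into `out`
// and returns the carry out of the most significant limb (0 or 1).
inline unsigned char add_limbs(const limb* a, const limb* b, limb* out, std::size_t count)
{
    assert(a != nullptr && b != nullptr && out != nullptr);

    unsigned char carry = 0;

    for (std::size_t i = 0; i < count; ++i)
    {
#ifdef SKETCH_HAS_ADDCARRY_U64
        // The carry is passed from one limb to the next via the intrinsic.
        carry = _addcarry_u64(carry, a[i], b[i], &out[i]);
#else
        // Portable fallback: detect unsigned wrap-around manually.
        const limb partial = a[i] + b[i];
        const limb sum = partial + carry;
        carry = static_cast<unsigned char>((partial < a[i]) || (sum < partial));
        out[i] = sum;
#endif
    }

    return carry;
}
```

Multiplication would be the more interesting case for ADX, since `adcx` and `adox` let two independent carry chains run interleaved with `mulx`, but the addition case above is enough to compare against the current repeated calculations.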