Open codekatana opened 7 years ago
Why not just compile it with clang and tell it to vectorize the loops?
Indeed, that's a nice way to do it. However, wouldn't it be nicer if, just like BLAS (OpenBLAS), we had some hand-coded assembly?
It's a non-trivial amount of work, with no guarantee of success. I'm certainly open to a patch if someone wants to try it.
I agree, Yann. I have been reading through the huff_ and fse files to understand the code and find possible areas. I have also been reading your blog to understand zstd and find a suitable area that could be accelerated using SIMD on ARM. I would very much appreciate any pointers on that.
@codekatana No. If it were my repo, I'd want to keep the code base as clean as possible.
@bumblebritches57 - Yes, I can understand. Assembly tends to be hard to read and maintain, but in some situations it provides good results. That's why BLAS libraries do their calculations in assembly rather than in a high-level language.
Hello, I would like to know whether ARM SIMD (NEON) routines could be added to the huff0 and/or FSE encode/decode parts. That way, I could make them run a bit faster on a Raspberry Pi.