Cyan4973 / FiniteStateEntropy

New generation entropy codecs : Finite State Entropy and Huff0
BSD 2-Clause "Simplified" License
1.34k stars, 144 forks

Architecture specific optimizations #83

Open codekatana opened 7 years ago

codekatana commented 7 years ago

Hello, would it be possible to add ARM SIMD (NEON) routines to the Huff0 and/or FSE encode/decode paths? That way, I could make them run a bit faster on a Raspberry Pi.

MarcusJohnson91 commented 7 years ago

Why not just compile it with clang and tell it to vectorize the loops?
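For context, clang's loop vectorizer works best on loops where no iteration depends on a previous one. The function below is a hypothetical illustration (`xor_bytes` is not code from FSE or Huff0): with `-O2`/`-O3`, clang can turn this loop into NEON instructions on ARM (or SSE/AVX on x86).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of FSE/Huff0: element-wise XOR of two byte
 * buffers. Each iteration is independent and the pointers are declared
 * non-aliasing via restrict, so the loop vectorizer can process many bytes
 * per instruction. */
void xor_bytes(uint8_t *restrict dst, const uint8_t *restrict a,
               const uint8_t *restrict b, size_t n)
{
    assert(n == 0 || (dst != NULL && a != NULL && b != NULL));
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] ^ b[i];
}
```

Something like `clang -O3 -Rpass=loop-vectorize -c xor_bytes.c` asks clang to report which loops it vectorized. On 32-bit ARM targets NEON may also need `-mfpu=neon`; on AArch64 it is enabled by default. Loops with a serial dependency, such as the state updates in an entropy decoder, generally do not vectorize this way.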

codekatana commented 7 years ago

Indeed, that's a nice way to do it. However, wouldn't it be nicer if, just like BLAS (OpenBLAS), we had some hand-coded assembly?

Cyan4973 commented 7 years ago

It's a non-trivial amount of work, with no guarantee of success. I'm certainly open to a patch if someone wants to try it.

codekatana commented 7 years ago

I agree, Yann. I have been going through the huff_ and fse files to understand the code and identify candidate areas. I have also been reading your blog to understand zstd and find parts that could be accelerated with SIMD on ARM. I would very much appreciate any pointers.
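One area often examined when profiling entropy coders is symbol counting (building the histogram before encoding). A plain histogram loop is a poor candidate for auto-vectorization, because two lanes can increment the same counter, and NEON has no scatter instruction. A common scalar workaround, sketched below as a hypothetical example (`count_bytes_4tables` is not taken from the library), spreads consecutive bytes over four separate count tables so repeated symbols do not serialize on a single memory location, then sums the tables at the end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch, not the library's code: byte histogram using four
 * independent count tables. Consecutive bytes go to different tables, which
 * shortens store-to-load dependency chains when the same symbol repeats. */
void count_bytes_4tables(unsigned counts[256], const uint8_t *src, size_t n)
{
    assert(counts != NULL && (n == 0 || src != NULL));
    unsigned c0[256], c1[256], c2[256], c3[256];
    memset(c0, 0, sizeof c0);
    memset(c1, 0, sizeof c1);
    memset(c2, 0, sizeof c2);
    memset(c3, 0, sizeof c3);

    size_t i = 0;
    for (; i + 4 <= n; i += 4) {   /* main loop: 4 bytes, 4 tables */
        c0[src[i]]++;
        c1[src[i + 1]]++;
        c2[src[i + 2]]++;
        c3[src[i + 3]]++;
    }
    for (; i < n; i++)             /* leftover 0..3 bytes */
        c0[src[i]]++;

    for (int s = 0; s < 256; s++)  /* merge the partial histograms */
        counts[s] = c0[s] + c1[s] + c2[s] + c3[s];
}
```

This is instruction-level parallelism rather than SIMD; whether it helps on a given ARM core has to be measured.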

MarcusJohnson91 commented 7 years ago

@codekatana No. If it were my repo, I'd want to keep the code base as clean as possible.

codekatana commented 7 years ago

@bumblebritches57 - Yes, I understand. Assembly tends to be hard to read and maintain, but in some situations it gives good results. That's why BLAS libraries implement their core kernels in assembly rather than in a high-level language.
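A middle ground between hand-written assembly and relying on the auto-vectorizer is to use compiler intrinsics behind a preprocessor guard, with a portable C fallback. The sketch below is hypothetical (`xor_bytes_neon_or_scalar` is not from FSE or Huff0); it uses the standard `arm_neon.h` intrinsics `vld1q_u8`, `veorq_u8` and `vst1q_u8` when the compiler targets NEON, and the plain loop everywhere else.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical example, not from FSE/Huff0: XOR two byte buffers, using
 * NEON intrinsics when available and a scalar loop otherwise. */
void xor_bytes_neon_or_scalar(uint8_t *dst, const uint8_t *a,
                              const uint8_t *b, size_t n)
{
    assert(n == 0 || (dst != NULL && a != NULL && b != NULL));
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* 16 bytes per iteration in 128-bit NEON registers. */
    for (; i + 16 <= n; i += 16) {
        uint8x16_t va = vld1q_u8(a + i);
        uint8x16_t vb = vld1q_u8(b + i);
        vst1q_u8(dst + i, veorq_u8(va, vb));
    }
#endif
    /* Scalar tail on NEON targets, or the whole buffer elsewhere. */
    for (; i < n; i++)
        dst[i] = a[i] ^ b[i];
}
```

Keeping the architecture-specific path this small, and always paired with a scalar version that can be tested on any machine, is one way to address the maintainability concern raised above.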