Hi @npcarter,

As discussed in easel, here is my code for the ARM port of HMMER. I essentially took the SSE implementation and replaced the `_mm_*` calls with their NEON counterparts. It mostly works, except in `ssvfilter.c`, where I get inconsistent results when it's enabled. Because the code uses `esl_neon_hmax_f32`, which was just merged, you'll need the `develop` branch of easel checked out locally.
Otherwise, the code compiles and passes the unit tests on my Raspberry Pi 4. I had to force-enable the `-mlittle-endian` flag in `configure.ac`; without it I saw some consistency issues, probably related to how GCC handles data loading.
Note that my work started from `hmmer/develop`, but since you asked me to use `h3-arm` as the base for this PR, I did so.
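To illustrate the kind of SSE-to-NEON swap described above, here is a minimal sketch of a horizontal max over four floats. It is not the code from this PR or from easel: `hmax4` is a hypothetical helper name, and the NEON path is only one plausible way to express what a function like `esl_neon_hmax_f32` computes. The scalar branch is there so the snippet builds on any platform.

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* NEON path (hypothetical helper): pairwise max folds 4 lanes -> 2 -> 1. */
static float hmax4(const float *p)
{
    float32x4_t v = vld1q_f32(p);                      /* load 4 floats */
    float32x2_t m = vpmax_f32(vget_low_f32(v),
                              vget_high_f32(v));       /* {max(v0,v1), max(v2,v3)} */
    m = vpmax_f32(m, m);                               /* both lanes hold the max */
    return vget_lane_f32(m, 0);
}

#elif defined(__SSE__)
#include <xmmintrin.h>

/* SSE path: rotate-and-max twice, then read lane 0. */
static float hmax4(const float *p)
{
    __m128 v = _mm_loadu_ps(p);                                        /* unaligned load */
    v = _mm_max_ps(v, _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 3, 2, 1)));  /* rotate by 1 lane */
    v = _mm_max_ps(v, _mm_shuffle_ps(v, v, _MM_SHUFFLE(1, 0, 3, 2)));  /* rotate by 2 lanes */
    return _mm_cvtss_f32(v);
}

#else

/* Scalar fallback so the sketch compiles without SIMD support. */
static float hmax4(const float *p)
{
    float m = p[0];
    for (int i = 1; i < 4; i++) {
        if (p[i] > m) m = p[i];
    }
    return m;
}

#endif
```

Calling `hmax4` on the same four-element array under each build should return the same value, which makes it a simple cross-check when comparing the NEON and SSE code paths.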