DLTcollab / sse2neon

A translator from Intel SSE intrinsics to Arm/Aarch64 NEON implementation
MIT License

Implement missing intrinsics required by IQ-TREE #188

Closed by jserv 2 years ago

jserv commented 4 years ago

While porting IQ-TREE from SSE intrinsics to Arm NEON, @joshlvmh extended SSE2NEON with additional intrinsics. See https://github.com/joshlvmh/iqtree_arm_neon/blob/7bc67d3449428b0b683fb7359c8945da218f5775/IQ-TREE/sse2neon.h#L4136

It would be great if we could merge these changes back into SSE2NEON.

joshlvmh commented 4 years ago

Thanks for this. I am happy to help rework the changes into SSE2NEON if needed. I also extended the SSE2NEON test bench, located here: https://github.com/joshlvmh/iqtree_arm_neon/tree/7bc67d3449428b0b683fb7359c8945da218f5775/iq_tree_tests/sse2neon Please feel free to use any of it.

jserv commented 3 years ago

Unimplemented intrinsics:

jserv commented 3 years ago

@joshlvmh, the SSE intrinsics required by iqtree_arm_neon have recently been implemented in the latest SSE2NEON. Could you give it a try?

jserv commented 2 years ago

All intrinsics required by iqtree_arm_neon have now been implemented.