pettyalex opened this issue 1 year ago
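For a bit more context: NEON is roughly ARM's counterpart to SSE (fixed 128-bit vectors), while SVE plays the role of AVX2 or AVX-512. Unlike those, SVE is vector-length agnostic: the same instructions run on whatever vector width the hardware implements, from 128 up to 2048 bits.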
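To make "width generic" concrete, here is a minimal scalar emulation of the vector-length-agnostic loop pattern. It is not real SVE code: the hypothetical `lanes` parameter stands in for the vector length that SVE code queries at run time (e.g. `svcntw()` for 32-bit elements), and the per-lane `active` check stands in for a predicate such as `svwhilelt_b32`. With 32-bit elements, a 128-bit machine has 4 lanes, 256-bit has 8, and 512-bit has 16; the same loop body handles all of them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar emulation (NOT real SVE intrinsics) of a vector-length-agnostic loop.
// "lanes" models the hardware vector length discovered at run time; the
// "active" flag models an SVE predicate that switches off lanes past the end.
std::vector<int> vla_add(const std::vector<int>& a,
                         const std::vector<int>& b,
                         std::size_t lanes) {
    assert(a.size() == b.size() && lanes > 0);
    const std::size_t n = a.size();
    std::vector<int> c(n);
    for (std::size_t i = 0; i < n; i += lanes) {        // one "vector" per step
        for (std::size_t lane = 0; lane < lanes; ++lane) {
            const bool active = i + lane < n;           // predicate masks the tail
            if (active) {
                c[i + lane] = a[i + lane] + b[i + lane];
            }
        }
    }
    return c;
}
```

Because the predicate covers the final partial vector, no separate remainder loop is needed, which is the usual chore with fixed-width SSE or NEON code.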
Cloud vendors are starting to offer ARM compute capacity significantly cheaper than x86, to the point where even unoptimized ARM ports can be cost-effective. If STAR were updated to support NEON or SVE, Amazon Graviton 3 would become significantly more cost-effective than any other option.
Hi Alex,
At present, the only feature in STAR that uses SIMDE is the CellRanger-like adapter trimming for STARsolo runs (option `--clipAdapterType CellRanger4`).
Opal was easy to use and gave results most similar to CellRanger.
I am not sure it is worth exploring other options at the moment.
That's very helpful. Just to clarify: if I'm running a pipeline that doesn't use that option, it should be fine to compile and run STAR on an ARM processor, with no performance penalty from the missing SIMDE operations. Would I still run into problems getting STAR to compile in that environment? I was about to try, so I can probably work it out experimentally, but any tips are appreciated.
As long as you can compile it, there will be no performance penalty. There may be some compilation issues with g++ flags; I think you will need to run `make CXXFLAGS_SIMD=""` to get rid of the SIMD-specific flag.
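Why the flag matters: GCC and Clang define macros such as `__AVX2__` only when the target or flags (e.g. `-mavx2`) enable that instruction set, and an x86-only flag is not accepted by an ARM compiler in the first place. The sketch below is a hypothetical helper, not part of STAR or Opal, that reports which SIMD path the compiler enabled using those standard predefined macros.

```cpp
// Hypothetical helper (not in STAR or Opal): names the widest SIMD instruction
// set the compiler has enabled, based on standard GCC/Clang predefined macros.
// Building without -mavx2 leaves __AVX2__ undefined, so AVX2-only code must be
// guarded like this rather than compiled unconditionally.
constexpr const char* simd_backend() {
#if defined(__AVX512F__)
    return "AVX-512F";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__SSE2__)
    return "SSE2";     // baseline on every x86-64 CPU
#elif defined(__ARM_FEATURE_SVE)
    return "SVE";      // only when SVE is enabled via -march/-mcpu
#elif defined(__ARM_NEON)
    return "NEON";     // enabled by default on AArch64
#else
    return "scalar";
#endif
}
```

On AArch64 with default flags this would typically report NEON; SVE code paths appear only if SVE is enabled explicitly through a `-march` or `-mcpu` setting.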
Just to confirm: that worked beautifully to build STAR on an ARM system just now. I still need to test it, but thanks for the tip!
Hi, thanks for making such a useful tool! I noticed that STAR can be made to run on ARM by dropping SIMDE into Opal. That's a great place to start, but there's room for a lot of improvement.
Are there any plans for ARM support in the future? I noticed that the README still explicitly says only x86 is supported, but it looks like the Debian Med team has been patching SIMDE in for a long time to get STAR working on more platforms.
I see some work on bringing NEON support to Opal (https://github.com/Martinsos/opal/pull/32), but Opal looks like an abandoned project, so with STAR as its primary consumer, maybe the changes could just be made here?
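As a rough illustration of what native NEON support alongside SSE could look like, here is a minimal sketch (not code from Opal or the linked PR) that sums 32-bit integers four lanes at a time on SSE2 or AArch64 NEON, with a scalar fallback. The function name `sum_i32` is hypothetical; the intrinsics are the standard `<emmintrin.h>` and `<arm_neon.h>` ones, each guarded by its predefined macro.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>   // SSE2 intrinsics (x86)
#elif defined(__ARM_NEON) && defined(__aarch64__)
#include <arm_neon.h>    // NEON intrinsics (AArch64)
#endif

// Hypothetical example: 128-bit SIMD sum with one path per ISA.
std::int32_t sum_i32(const std::int32_t* v, std::size_t n) {
    assert(v != nullptr || n == 0);
    std::size_t i = 0;
    std::int32_t total = 0;
#if defined(__SSE2__)
    __m128i acc = _mm_setzero_si128();
    for (; i + 4 <= n; i += 4) {
        acc = _mm_add_epi32(
            acc, _mm_loadu_si128(reinterpret_cast<const __m128i*>(v + i)));
    }
    alignas(16) std::int32_t lanes[4];
    _mm_store_si128(reinterpret_cast<__m128i*>(lanes), acc);
    total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#elif defined(__ARM_NEON) && defined(__aarch64__)
    int32x4_t acc = vdupq_n_s32(0);
    for (; i + 4 <= n; i += 4) {
        acc = vaddq_s32(acc, vld1q_s32(v + i));
    }
    total = vaddvq_s32(acc);   // horizontal add across the 4 lanes (AArch64 only)
#endif
    for (; i < n; ++i) {       // scalar tail, or the whole array without SIMD
        total += v[i];
    }
    return total;
}
```

Keeping each ISA behind its own macro guard means the file still builds with no SIMD flags at all, which is the property the `CXXFLAGS_SIMD=""` workaround relies on.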
AVX-512 also exists now, and updating Opal/STAR to support it could very likely bring large performance gains without changing the existing behavior or algorithms at all. AVX-512 performance isn't great on Skylake or Cascade Lake Xeons, but it looks outstanding on Ice Lake and newer. Thoughts?
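The back-of-the-envelope reasoning behind the AVX-512 suggestion, assuming each vector lane holds one dynamic-programming score: lanes per instruction is simply vector width divided by score width, so AVX-512 doubles the lanes of AVX2 at any score precision. This is an upper bound only; it ignores clock throttling, execution-port counts, and memory bandwidth, which is why gains on Skylake or Cascade Lake can fall well short of it. The function name is hypothetical.

```cpp
// Hypothetical helper: scores processed per vector instruction, assuming
// one score per lane. SSE/NEON = 128 bits, AVX2 = 256, AVX-512 = 512.
constexpr int lanes_per_vector(int vector_bits, int score_bits) {
    return vector_bits / score_bits;
}
```

For 8-bit scores this gives 16 lanes for SSE or NEON, 32 for AVX2, and 64 for AVX-512.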