alexdobin / STAR

RNA-seq aligner
MIT License
1.85k stars, 506 forks

"Illegal instruction" error when using --clipAdapterType CellRanger4 #1218

Open VincentGardeux opened 3 years ago

VincentGardeux commented 3 years ago

Hello, I'm using STARsolo for aligning/demultiplexing BRB-seq libraries. These are bulk RNA-seq libraries with a very similar construct to 10x.

I recently saw the new option --clipAdapterType CellRanger4 which sounds super cool because that's exactly the trimming we were doing. However, when using this option, after loading the genome, it exits with an "Illegal instruction" error:

Apr 23 17:24:05 ..... started STAR run
Apr 23 17:24:05 ..... loading genome
Apr 23 17:26:19 ..... started mapping
Illegal instruction

If I use all the other options but this one, the run is successful. I tried removing some options, in case there were incompatibilities, but it kept failing. Here is the simplest command I've run:

STAR \
--runMode alignReads \
--runThreadN 2 \
--soloStrand Forward \
--genomeDir ${starindex} \
--soloType CB_UMI_Simple \
--soloCBstart 1 \
--soloCBlen 12 \
--soloUMIstart 13 \
--soloUMIlen 9 \
--soloCellFilter None \
--soloCBwhitelist ${barcodefile} \
--clipAdapterType CellRanger4 \
--readFilesCommand zcat \
--outSAMtype BAM SortedByCoordinate \
--outFileNamePrefix bam/ \
--readFilesIn fastq/toto_R2.fastq.gz fastq/toto_R1.fastq.gz

Thanks in advance!

alexdobin commented 3 years ago

Hi Vincent,

This is a problem with --clipAdapterType CellRanger4, which uses SIMD instructions. What hardware are you using? You would need to compile STAR on your machine to use this option, by running make from the source/ directory. If this does not work for the 2.7.8a release, please download the most recent patch from GitHub master (https://github.com/alexdobin/STAR/archive/refs/heads/master.zip) and try to compile it.

Cheers Alex

VincentGardeux commented 3 years ago

Hey Alex,

Sorry for the delayed answer. I compiled my 2.7.8a release, and indeed it seems to work now.

Thanks for the tip!

Cheers

alexdobin commented 3 years ago

Hi Vincent,

I would recommend switching to 2.7.9a and compiling. It includes the SIMDe software that adjusts the code for a given architecture. When compiling, you would need to specify your SIMD architecture with: make CXXFLAGS_SIMD=<arch>

Cheers Alex

VincentGardeux commented 3 years ago

Hi Alex,

I followed your recommendation and compiled the latest 2.7.9a with: make CXXFLAGS_SIMD=x86_64. But I guess the processor arch is not the architecture you are talking about, because it stops with a "No such file or directory" error.

I have absolutely no clue what the SIMD architecture of my server is. How can I find out? I was hoping that the compilation would detect and adapt automatically to my arch (since it was working directly without specifying anything on the previous version).

I also tried the same command as in the README markdown file:

make STAR CXXFLAGS_SIMD=sse

But I get the same error message:

g++ -c -I./ -std=c++11 -I/usr/include -O3 -std=c++11 -fopenmp -D'COMPILATION_TIME_PLACE="2021-05-11T12:21:30+0200 fameux:/data/software/STAR-2.7.9a/source"' -pipe -Wall -Wextra  sse opal.cpp
g++: error: sse: No such file or directory
make: *** [opal/opal.o] Error 1

So I'll stick with v.2.7.8a for now I guess...

Thanks

alexdobin commented 3 years ago

Hi Vincent,

sorry, there was a mistake in the README file, it should be: make STAR CXXFLAGS_SIMD="-msse4.2"

Finding which SIMD extensions your processor supports is not as easy as it should be. You can look for sse and avx in the output of: cat /proc/cpuinfo | grep flags | head -n1. Or you can find the exact model of your processor and look it up on the AMD or Intel websites.

Or you can simply use make STAR CXXFLAGS_SIMD="-march=native" and the compiler will figure it out. It may not be the best (fastest) option, but it should be the safest.
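The lookup described above can be scripted. The following is only a minimal sketch: pick_simd_flag is a hypothetical helper (not part of STAR), and the mapping from /proc/cpuinfo flag names to GCC options is an assumption that covers a few common x86 extensions. It prints a suggested make command rather than running it.

```shell
#!/bin/sh
# pick_simd_flag is a hypothetical helper, not part of STAR.
# It maps a CPU flags string (as listed in /proc/cpuinfo) to a GCC SIMD option,
# preferring the newest extension it recognises.
pick_simd_flag() {
    case " $1 " in
        *" avx2 "*)   echo "-mavx2" ;;
        *" sse4_2 "*) echo "-msse4.2" ;;
        *" sse4_1 "*) echo "-msse4.1" ;;
        *" sse2 "*)   echo "-msse2" ;;
        *)            echo "" ;;
    esac
}

# On Linux, read the first "flags" line and print a suggested build command.
if [ -r /proc/cpuinfo ]; then
    cpu_flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
    simd=$(pick_simd_flag "$cpu_flags")
    if [ -n "$simd" ]; then
        echo "make STAR CXXFLAGS_SIMD=\"$simd\""
    fi
fi
```

On a machine whose flags include sse4_2 but not avx2, this would suggest the same -msse4.2 option mentioned above. The suggestion only reflects the machine the script runs on, so on a cluster it should be run on the oldest node that will execute STAR.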

Cheers Alex

VincentGardeux commented 3 years ago

Hi Alex,

Thanks for the info. I found the sse4_2 flag when running the cpuinfo command, so I ran the compilation with the -msse4.2 flag. This time the compilation went through with only warnings.

I tried running STAR alignReads with the --clipAdapterType CellRanger4 option and it also went through, without error.

I've also done a quick "safety check" of the output bam files, and they are perfectly identical to the ones I've got with the same command run on v.2.7.8a

So it seems all good to me from my side! Thanks for the help!

Cheers

piyushjo15 commented 3 years ago

Hi Alex,

I am running my STAR on HPC and I tried compiling STAR using your command make STAR CXXFLAGS_SIMD="-march=native", which worked as I was able to run STAR but the --clipAdapterType CellRanger4 didn't work for me.

Thanks, Piyush

hermidalc commented 2 years ago

Hi @alexdobin - Will this issue get fixed so that users don't need to recompile STAR to use --clipAdapterType CellRanger4? A lot of users use STAR as part of larger snakemake workflows for example, which rely on wrappers and automatic dependency install via conda/mamba.

alexdobin commented 2 years ago

Hi Leandro, I am not sure what the issue is. For some architectures, STAR needs to be compiled with different flags. I thought that automatic installers are supposed to pull the proper executables. I do not have the expertise/bandwidth to implement automatic detection of the architecture.

hermidalc commented 2 years ago

Thanks for the response, but I think bioconda (and usually conda in general) only has different STAR packages per platform (Linux x64, Mac OSX, etc) not supported CPU extensions.

I'll have to see if it's even possible to make a pull request for the bioconda STAR packaging repo so that it compiles and makes different packages depending on CPU extensions.

pettyalex commented 1 year ago

It sounds like this is specifically a packaging problem. If the goal for Bioconda is to maximize compatibility, they should do what Debian Med does for distributing STAR and target SSE2: https://salsa.debian.org/med-team/rna-star/-/blob/master/debian/patches/do-not-enforce-avx2.patch

If you want to run a binary specifically suited to different CPU families, the conda packagers could do an approach like https://github.com/bwa-mem2/bwa-mem2 where you build a star binary for each instruction set: SSE2, AVX2, AVX512 and then have an entry script to find and run the right one. This is all a conda packaging problem, not a problem with STAR though.
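An entry script of that kind might look roughly like the sketch below. It is an illustration only: select_star_variant, run_star, and the STAR-avx512 / STAR-avx2 / STAR-sse2 binary names are hypothetical and do not describe how bwa-mem2 or any existing STAR package is laid out. Using the avx512bw flag as the AVX512 test is also an assumption.

```shell
#!/bin/sh
# select_star_variant is a hypothetical helper: given a CPU flags string,
# it prints the suffix of the most capable build the CPU can run.
select_star_variant() {
    case " $1 " in
        *" avx512bw "*) echo "avx512" ;;
        *" avx2 "*)     echo "avx2" ;;
        *)              echo "sse2" ;;
    esac
}

# run_star forwards all arguments to the matching binary, assumed to be
# installed next to this script as STAR-avx512, STAR-avx2 or STAR-sse2.
run_star() {
    cpu_flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
    exec "$(dirname "$0")/STAR-$(select_star_variant "$cpu_flags")" "$@"
}
```

A wrapper installed as STAR would end with run_star "$@", so existing command lines and workflow wrappers would not need to change.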

Lil-Psilocybe commented 10 months ago

Hello! Reviving this thread since I am having this issue on an HPC run by a bioinformatics core at my institution, and they tried updating to the most recent version of STAR. Where exactly do we run the commands make STAR CXXFLAGS_SIMD="-msse4.2" / make STAR CXXFLAGS_SIMD="-march=native" and cat /proc/cpuinfo | grep flags | head -n1?

Is this just on commandline when I login to the HPC or do these need to go in my star command?

pettyalex commented 10 months ago

@Lil-Psilocybe How did you install STAR, and what is the oldest CPU architecture that you expect to run STAR on?

Those are compile-time variables, passed to make at the time that STAR is compiled from source code; they are not options of the STAR command itself. If you installed STAR from bioconda, that version is not compatible with older CPUs and will not run on them.

Lil-Psilocybe commented 10 months ago

The installs are done by the bioinformatics core and they would have the arch info; I don't use a conda environment here either. I'll make them aware of this thread. I'll be in touch with this info, thanks!

pettyalex commented 10 months ago

You may also try scheduling it to run only on newer nodes, whether by targeting a specific set of nodes or by requiring a certain feature in your scheduler. AVX2 has been present on all Intel Xeon processors since 2014, and all AMD Epyc processors since 2017, so only servers that are more than 9 years old should encounter this problem.

If your nodes are virtualized, I would speculate that some hypervisor configuration could break this as well. I've seen hypervisors not pass through all supported CPU flags into VMs, but I'd expect the binary would still run because the support is actually there.
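Independently of the scheduler, a job script can check the node before STAR starts. This is a minimal sketch assuming, as the discussion above suggests, that the installed binary needs AVX2; has_cpu_flag is a hypothetical helper, not something STAR or a scheduler provides. In a VM that hides CPU flags it may report a missing feature even though the hardware supports it.

```shell
#!/bin/sh
# has_cpu_flag FLAG [FLAGS_STRING]
# Hypothetical helper: succeeds if FLAG appears as a whole word in FLAGS_STRING,
# or in the first "flags" line of /proc/cpuinfo when no string is given.
has_cpu_flag() {
    wanted=$1
    if [ "$#" -ge 2 ]; then
        flags=$2
    else
        flags=$(grep -m1 '^flags' /proc/cpuinfo 2>/dev/null | cut -d: -f2)
    fi
    case " $flags " in
        *" $wanted "*) return 0 ;;
        *)             return 1 ;;
    esac
}
```

In a job script, a line such as has_cpu_flag avx2 || { echo "this node lacks AVX2" >&2; exit 1; } placed before the STAR command would turn an "Illegal instruction" crash into an explicit message.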

Lil-Psilocybe commented 10 months ago

We got it to run! Just recompiling and then running on Xeon processors got it to work. Thanks for your detailed response, though! I'll be back in case anything else comes up.