alexdobin / STAR

RNA-seq aligner
MIT License

An issue with --soloMultiMappers EM ? and a question or 2. #1647

Open JohnUrban opened 2 years ago

JohnUrban commented 2 years ago

Hi,

A bit of this was addressed previously here: https://github.com/alexdobin/STAR/issues/1465

STAR --version
2.7.10a

I first tried this:

STAR --runThreadN 8 --genomeDir ${GDIR} \
    --readFilesIn ${CDNA} ${CBUMI} --outSAMtype BAM SortedByCoordinate \
    --soloType CB_UMI_Simple --soloCBwhitelist ${WHITE} \
    --soloCBlen 16 --soloUMIlen 12 \
    --clipAdapterType CellRanger4 --outFilterScoreMin 30 \
    --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts \
    --soloUMIfiltering MultiGeneUMI_CR --soloUMIdedup 1MM_CR --outFileNamePrefix ${B} \
    --soloMultiMappers EM

I had to remove --clipAdapterType CellRanger4 as it was causing an error (62896 Illegal instruction: 4).

Running the above command without --clipAdapterType CellRanger4 works fine with 40,000 reads. I then tried it with 400,000 reads:

STAR --runThreadN 8 --genomeDir ${GDIR} --readFilesIn ${CDNA} ${CBUMI} --outSAMtype BAM SortedByCoordinate --soloType CB_UMI_Simple --soloCBwhitelist ${WHITE} --soloCBlen 16 --soloUMIlen 12 --soloMultiMappers EM --outFilterScoreMin 30 --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts --soloUMIfiltering MultiGeneUMI_CR --soloUMIdedup 1MM_CR --outFileNamePrefix ${B}

This gave the error 65440 Segmentation fault: 11.

This seemed similar to the issue I linked to above (https://github.com/alexdobin/STAR/issues/1465).

I removed --outSAMtype BAM SortedByCoordinate. The run may or may not have gotten further, but it then threw another error (65737 Segmentation fault: 11), which I tracked down to --soloMultiMappers EM.

Removing --soloMultiMappers EM allows it to finish with the 400K reads. In fact, I can add --outSAMtype BAM SortedByCoordinate back to the command, and it also finishes successfully.

So I am guessing there is an issue with --soloMultiMappers EM.
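
For anyone trying to reproduce the size dependence described above (40,000 reads finish, 400,000 reads crash), here is a minimal sketch of how the first N reads could be taken from each FASTQ file. The helper name fastq_head and the output file names are hypothetical and not part of STAR. It assumes standard 4-line FASTQ records with no wrapped sequence lines.

```shell
# Hypothetical helper (not part of STAR): write the first N reads of a FASTQ file.
# Assumes exactly 4 lines per record; gzip-compressed input is decompressed on the fly.
fastq_head() {
    n_reads=$1
    in_fastq=$2
    out_fastq=$3
    n_lines=$(( n_reads * 4 ))   # 4 lines per FASTQ record
    case "$in_fastq" in
        *.gz) gzip -cd "$in_fastq" | head -n "$n_lines" > "$out_fastq" ;;
        *)    head -n "$n_lines" "$in_fastq" > "$out_fastq" ;;
    esac
}

# Example usage (file names are placeholders):
#   fastq_head 400000 "${CDNA}"  sub_cdna.fastq
#   fastq_head 400000 "${CBUMI}" sub_cbumi.fastq
```

Using the same N for both files keeps the cDNA and CB/UMI reads paired, as long as the two files list reads in the same order.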

Any thoughts or feedback would be appreciated. Otherwise, do feel free to close the issue. I just thought I'd bring it to your attention in case it is something worth investigating further.

Best,

John

JohnUrban commented 2 years ago

Update: the error is thrown whenever the --soloMultiMappers flag is used, regardless of the option (Uniform, PropUnique, EM, Rescue).

alexdobin commented 2 years ago

Hi John,

Some seg-faults were fixed in this pre-release, please try it out: https://github.com/alexdobin/STAR/releases/tag/2.7.10a_alpha_220818

JohnUrban commented 2 years ago

Thanks Alex. Just getting to respond now.

Some additional info regarding the errors:

Ok - now on to the pre-release results on the remote Linux env.

... ... ...

The STAR pre-release 2.7.10a_alpha_220818 finished successfully. No errors. :)

Thanks for sharing the pre-release!

If you want me to install and test it on Mac OS, I think I will need some tips on getting it installed. I already tried all the Mac-specific advice mentioned in the README or manual (I forget where I saw it). Otherwise, feel free to close this issue.

Thanks again.

JohnUrban commented 2 years ago

p.s. other possibly relevant info for the Mac

JohnUrban commented 2 years ago

As a final thought -- that I should probably bring up in a new issue -- using --soloMultiMappers EM doesn't seem to result in any differences in the final matrix.mtx files in the raw/ or filtered/ subdirs.

I do see "UniqueAndMult-EM.mtx" in the raw/ subdir, but not in the filtered/ subdir. Is it intentional that there is not a "Filtered" version of "UniqueAndMult-EM.mtx" ?

alexdobin commented 2 years ago

Hi John

The matrix.mtx files in both raw/ and filtered/ contain only unique mappers, so they are not affected by any multimapper options. You can filter the multimapper matrix like this: https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md#cell-filtering-of-previously-generated-raw-matrix
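
A different and simpler route than the one in the linked documentation is to reuse the cell calls STARsolo already made: keep only those columns of the multimapper matrix whose barcodes appear in filtered/barcodes.tsv. The sketch below is not STAR's own filtering step. The function name subset_mtx_columns is hypothetical, and it assumes the columns of UniqueAndMult-EM.mtx follow the order of raw/barcodes.tsv and that the file is in Matrix Market coordinate format.

```shell
# Hypothetical helper (not part of STAR): keep only the columns of a Matrix Market
# coordinate file whose barcodes are listed in a second barcode file.
# Assumption: column i of the matrix corresponds to line i of the raw barcode list.
# All kept entries are buffered in memory so the entry count can be written first.
subset_mtx_columns() {
    raw_bc=$1    # e.g. raw/barcodes.tsv
    keep_bc=$2   # e.g. filtered/barcodes.tsv
    mtx=$3       # e.g. raw/UniqueAndMult-EM.mtx
    out=$4       # output .mtx path
    awk '
        # First file: barcodes to keep, numbered in their own order (new column index).
        FILENAME == ARGV[1] { want[$1] = ++n_keep; next }
        # Second file: raw barcodes; map raw column index to new column index.
        FILENAME == ARGV[2] { ++col; if ($1 in want) remap[col] = want[$1]; next }
        # Third file: copy % header/comment lines unchanged.
        /^%/ { header = header $0 "\n"; next }
        # The first non-comment line holds the dimensions: rows, columns, entries.
        !seen_size { n_rows = $1; seen_size = 1; next }
        # Data lines: renumber the column and keep the entry if its barcode is wanted.
        ($2 in remap) { $2 = remap[$2]; body[++nnz] = $0 }
        END {
            printf "%s", header
            print n_rows, n_keep + 0, nnz + 0
            for (i = 1; i <= nnz; i++) print body[i]
        }
    ' "$keep_bc" "$raw_bc" "$mtx" > "$out"
}

# Example usage (paths are placeholders):
#   subset_mtx_columns raw/barcodes.tsv filtered/barcodes.tsv \
#       raw/UniqueAndMult-EM.mtx UniqueAndMult-EM.filtered.mtx
```

The output columns follow the order of the kept barcode list, so the result can be read together with filtered/barcodes.tsv and the unchanged features.tsv.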

I am hoping to make a Mac release of all the patches soon, but I keep getting reports about bugs. :(

hkevile commented 1 year ago

Hi @alexdobin

Thank you for all of your work on this! I wanted to follow up on this issue as I'm using STAR 2.7.10a_alpha_220818 in a remote Linux environment but am still getting a Segmentation fault error as soon as STARsolo starts counting. Similarly, everything runs fine when I remove the --soloMultiMappers parameter. Below is my command and the output I get.

STAR --genomeDir ${GDIR} --readFilesIn ${CDNA} ${CBUMI} --soloType CB_UMI_Simple \
--soloCBwhitelist ${WHITE} --soloUMIlen 12 --soloMultiMappers EM --readFilesCommand zcat \
--soloOutFileNames ${B}

May 03 14:02:39 ..... started STAR run
May 03 14:02:41 ..... loading genome
May 03 14:03:26 ..... started mapping
May 04 07:56:21 ..... finished mapping
May 04 07:56:23 ..... started Solo counting
Segmentation fault

Please let me know what other information you need and thank you!

alexdobin commented 1 year ago

Hi @hkevile

Please send me the Log.out file.

Cheers Alex