Segmentation error issue when running starsolo

niradsp commented 3 years ago

Hello Alex, this is supposed to be a V2 5' assay, so I am wondering whether the parameters are correct. Is the whitelist different for the 5' assay? I am wondering if that is causing the issue.

Here is the command I used: STAR --runThreadN 10 --genomeDir ref/ --readFilesIn MQ1_CH_B1_S2_L001_R2_001_fastq.gz MQ1_CH_B1_S2_L001_R1_001_fastq.gz --readFilesCommand zcat --outSAMtype BAM Unsorted --soloType Droplet --soloCBwhitelist 737K-august-2016.txt --outFileNamePrefix MQ1_CH_B1_S2_L001 --clipAdapterType CellRanger4 --outFilterScoreMin 0 --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts --soloUMIfiltering MultiGeneUMI_CR --soloUMIdedup 1MM_CR --outFilterScoreMinOverLread 0 --outFilterMismatchNoverLmax 0.05 --alignIntronMax 1 --outFilterMatchNminOverLread 0 --outFilterMatchNmin 0

I have 8 files. 5 of them seem to run fine, but 3 of them give me a segfault error message. For the failing files, the log file stops here:

Genome: size given as a parameter = 3210148988
SA: size given as a parameter = 24892337806
SAindex: size given as a parameter = 1
Read from SAindex: pGe.gSAindexNbases=14 nSAi=357913940
nGenome=3210148988; nSAbyte=24892337806
GstrandBit=32 SA number of indices=6034506134
Shared memory is not used for genomes. Allocated a private copy of the genome.
Genome file size: 3210148988 bytes; state: good=1 eof=0 fail=0 bad=0
Loading Genome ... done! state: good=1 eof=0 fail=0 bad=0; loaded 3210148988 bytes
SA file size: 24892337806 bytes; state: good=1 eof=0 fail=0 bad=0
Loading SA ... done! state: good=1 eof=0 fail=0 bad=0; loaded 24892337806 bytes
Loading SAindex ... done: 1565873619 bytes
Finished loading the genome: Tue Mar 16 17:56:59 2021

Processing splice junctions database sjdbN=357020, pGe.sjdbOverhang=100
To accommodate alignIntronMax=1 redefined winBinNbits=16
To accommodate alignIntronMax=1 and alignMatesGapMax=0, redefined winFlankNbins=1 and winAnchorDistNbins=2
Loaded transcript database, nTr=203742
Loaded exon database, nEx=1237303
Created thread # 1
Created thread # 2
Created thread # 3
Starting to map file # 0
mate 1: MQ1_CH_B1_S2_L001_R2_001_fastq.gz
mate 2: MQ1_CH_B1_S2_L001_R1_001_fastq.gz
Created thread # 4
Created thread # 5
Created thread # 6
Created thread # 7
Created thread # 8
Created thread # 9

There is nothing after thread #9.

Also, looking at the STDOUT, it seems to stop at "started mapping".

Mar 16 17:56:42 ..... started STAR run
Mar 16 17:56:42 ..... loading genome
Mar 16 17:57:01 ..... started mapping

For the good files, I also see the following lines:

Mar 16 18:06:11 ..... started Solo counting
Mar 16 18:06:25 ..... finished Solo counting
Mar 16 18:06:25 ..... finished successfully

Looking at the Progress log, the unique mapping was >80%.

The tmp files are there.

Thanks in advance, Nirad
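When several samples are run in a batch, a quick way to separate complete from incomplete runs is to look for the "finished successfully" line quoted above. The sketch below assumes each run's STDOUT was saved to its own file; the function name and the file layout are hypothetical, not something STAR provides.

```shell
#!/bin/sh
# Print the names of STDOUT captures that never reached "finished successfully".
# Assumption: one saved STDOUT file per STAR run (hypothetical layout).
unfinished_runs() {
  for f in "$@"; do
    if ! grep -q 'finished successfully' "$f"; then
      printf '%s\n' "$f"
    fi
  done
}
```

For example, `unfinished_runs logs/*.stdout` would list only the runs that stopped early, such as the three that ended at "started mapping".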

alexdobin commented 3 years ago

Hi Nirad,

--clipAdapterType CellRanger4 will likely not work with 5' data, and it may cause seg-faults - please try to run without it.

Also, I strongly recommend against completely removing mapping filters:

--outFilterScoreMin 0 --soloUMIdedup 1MM_CR --outFilterScoreMinOverLread 0 --alignIntronMax 1 --outFilterMatchNminOverLread 0 --outFilterMatchNmin 0

as it allows very poor quality alignments.

Cheers Alex
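A minimal dry-run sketch of what the suggestion above could look like: the original command with --clipAdapterType and the zeroed filters removed, printed for review rather than executed. Two choices here are assumptions rather than something stated in the thread: --soloUMIdedup 1MM_CR is kept even though it appears in the quoted list, on the guess that it controls UMI collapsing rather than alignment filtering, and --outFilterMismatchNoverLmax 0.05 is kept because it was not flagged.

```shell
#!/bin/sh
# Dry run: assemble the command from the thread without --clipAdapterType
# and without the filters that were set to 0 (or --alignIntronMax 1),
# then print it. STAR itself is not invoked here.
cmd="STAR --runThreadN 10 --genomeDir ref/ \
--readFilesIn MQ1_CH_B1_S2_L001_R2_001_fastq.gz MQ1_CH_B1_S2_L001_R1_001_fastq.gz \
--readFilesCommand zcat --outSAMtype BAM Unsorted \
--soloType Droplet --soloCBwhitelist 737K-august-2016.txt \
--outFileNamePrefix MQ1_CH_B1_S2_L001 \
--soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts \
--soloUMIfiltering MultiGeneUMI_CR \
--soloUMIdedup 1MM_CR \
--outFilterMismatchNoverLmax 0.05"
# Assumption: the two options above are kept (see the note before this block).

echo "$cmd"
```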

niradsp commented 3 years ago

Hello Alex, What setting would you recommend for finding microRNA or small RNA? That's the reason I had the filters set to low numbers.

Thanks, Nirad

alexdobin commented 3 years ago

Hi Nirad,

for small RNA, it's important to trim the 3' adapter from the cDNA read (no trimming for the barcode read). After that, the default filtering should work, though I would recommend making it more (not less) stringent with --outFilterMatchNminOverLread 0.8.

Cheers Alex
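The two steps above might be sketched as follows. This is a dry run that only prints the commands. The trimmer is an assumption: the thread does not name one, and cutadapt is used here simply as a common choice. The adapter sequence and the trimmed output file name are placeholders that must be replaced with real values.

```shell
#!/bin/sh
# Dry run for small-RNA data: trim the cDNA read only, then map with a
# stricter matched-length filter. Commands are printed, not executed.
adapter="${ADAPTER_3P:-NNNNNNNN}"   # placeholder, NOT a real adapter sequence
cdna_in="MQ1_CH_B1_S2_L001_R2_001_fastq.gz"      # cDNA read (first in --readFilesIn)
barcode_in="MQ1_CH_B1_S2_L001_R1_001_fastq.gz"   # barcode read, left untrimmed
cdna_trimmed="trimmed_R2.fastq.gz"               # hypothetical output name

# Assumption: cutadapt as the trimmer; -a gives the 3' adapter, -o the output.
trim_cmd="cutadapt -a $adapter -o $cdna_trimmed $cdna_in"

map_cmd="STAR --runThreadN 10 --genomeDir ref/ \
--readFilesIn $cdna_trimmed $barcode_in --readFilesCommand zcat \
--outSAMtype BAM Unsorted --soloType Droplet \
--soloCBwhitelist 737K-august-2016.txt \
--outFilterMatchNminOverLread 0.8"

echo "$trim_cmd"
echo "$map_cmd"
```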

niradsp commented 3 years ago

Hello Alex, Thank you. We have been testing the STARsolo results against the CellRanger results, and we are getting very different results. The UMAP looks very different. Here are my parameters. Am I doing anything wrong here?

STAR --runThreadN 20 --genomeDir ref/ --readFilesIn $R2 $R1 --readFilesCommand zcat --outSAMtype BAM Unsorted --soloType CB_UMI_Simple --soloCBwhitelist 737K-august-2016.txt --outFileNamePrefix $truncated --outFilterScoreMin 30 --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts --soloUMIfiltering MultiGeneUMI_CR --soloUMIdedup 1MM_CR

alexdobin commented 3 years ago

Hi Nirad,

if you are using the "filtered" cells from STAR, you would need:
--soloCellFilter EmptyDrops_CR
Also, if you are comparing to CR4 or later, you would need
--clipAdapterType CellRanger4
For the best agreement, you also need to use sparse genome at the genome generation step:
--genomeSAsparseD 3

Cheers Alex

niradsp commented 3 years ago

Hello Alex, I think you mentioned that --clipAdapterType CellRanger4 does not work with the 5' assay. I am using the latest version. Do the other two parameters work?

Thanks in advance, Nirad

alexdobin commented 3 years ago

Right, sorry. The other parameters should work with 5' assays. I have not checked how good the agreement with CR is for the 5' - please let me know how that works out.
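Putting the settings from this exchange together, a dry-run sketch might look like the following. It prints the genome generation and mapping commands without running them. The FASTA and GTF file names, the default FASTQ names, and the `assay` switch are hypothetical and only there for illustration. --clipAdapterType CellRanger4 is added only for non-5' data, and even then it is only relevant when comparing against CR4 or later.

```shell
#!/bin/sh
# Dry run: print the two STAR invocations for a CellRanger comparison.
assay="${ASSAY:-5p}"                   # hypothetical switch: "5p" or "3p"
R2="${R2:-sample_R2.fastq.gz}"         # cDNA read (hypothetical default name)
R1="${R1:-sample_R1.fastq.gz}"         # barcode read (hypothetical default name)
truncated="${truncated:-sample_}"      # output prefix, as in the command above

# Genome generation with a sparse suffix array.
# genome.fa and genes.gtf are hypothetical file names.
gen_cmd="STAR --runMode genomeGenerate --runThreadN 20 --genomeDir ref/ \
--genomeFastaFiles genome.fa --sjdbGTFfile genes.gtf --genomeSAsparseD 3"

# Mapping command from the thread plus CellRanger-style cell filtering.
map_cmd="STAR --runThreadN 20 --genomeDir ref/ --readFilesIn $R2 $R1 \
--readFilesCommand zcat --outSAMtype BAM Unsorted --soloType CB_UMI_Simple \
--soloCBwhitelist 737K-august-2016.txt --outFileNamePrefix $truncated \
--outFilterScoreMin 30 --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts \
--soloUMIfiltering MultiGeneUMI_CR --soloUMIdedup 1MM_CR \
--soloCellFilter EmptyDrops_CR"

# Skip the adapter clipping option for 5' data, as advised in the thread.
if [ "$assay" != "5p" ]; then
  map_cmd="$map_cmd --clipAdapterType CellRanger4"
fi

echo "$gen_cmd"
echo "$map_cmd"
```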

niradsp commented 3 years ago

Hello Alex, I tried those parameters. The UMAP that we are getting with these parameters is similar to the one I was getting with the parameters I used above. CellRanger is producing a different type of UMAP. Maybe this has to do with the 5' assay?

Thanks, Nirad

alexdobin commented 3 years ago

Hi Nirad,

UMAPs are very sensitive to "initial conditions". I would first compare just the gene expression levels, e.g. you can calculate the Spearman correlation of gene counts between the two tools for the same cell, excluding genes that are 0 for both tools. It is possible that CellRanger processing for the 5' assay is significantly different; I will need to check it carefully. Are you using exactly the same annotations for CellRanger and STARsolo?

Cheers Alex
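The per-cell comparison described above can be sketched with awk. The input layout is an assumption: one line per gene with three whitespace-separated columns (gene, STARsolo count, CellRanger count) for a single cell barcode. Producing that table from the two count matrices is not shown. Genes that are 0 in both tools are dropped, tied values get average ranks, and the result is the Pearson correlation of the ranks. The ranking loop is quadratic in the number of genes, which is acceptable for a sketch but slow for large inputs.

```shell
#!/bin/sh
# Spearman correlation of gene counts between two tools for one cell.
# Input (hypothetical layout): "gene countA countB" per line, file or stdin.
spearman() {
  awk '
    # Keep only genes that are non-zero in at least one tool.
    ($2 + 0 != 0 || $3 + 0 != 0) { n++; x[n] = $2 + 0; y[n] = $3 + 0 }
    END {
      if (n < 2) { print "NA"; exit }
      # Average ranks: values below, plus the mean position within ties.
      for (i = 1; i <= n; i++) {
        lx = 0; ex = 0; ly = 0; ey = 0
        for (j = 1; j <= n; j++) {
          if (x[j] < x[i]) lx++; else if (x[j] == x[i]) ex++
          if (y[j] < y[i]) ly++; else if (y[j] == y[i]) ey++
        }
        rx[i] = lx + (ex + 1) / 2
        ry[i] = ly + (ey + 1) / 2
      }
      # Pearson correlation computed on the ranks.
      for (i = 1; i <= n; i++) {
        sx += rx[i]; sy += ry[i]
        sxx += rx[i] * rx[i]; syy += ry[i] * ry[i]; sxy += rx[i] * ry[i]
      }
      den = sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
      if (den == 0) { print "NA"; exit }
      printf "%.4f\n", (n * sxy - sx * sy) / den
    }
  ' "$@"
}
```

Running `spearman cell_counts.tsv` (a hypothetical file name) for a handful of barcodes shared by both tools gives a direct measure of agreement that does not depend on the UMAP embedding.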

alexdobin / STAR

Segmentation error issue when running starsolo #1176