cellgeni / STARsolo

wrapper scripts for convenient STARsolo processing of 10X and other scRNA-seq
GNU General Public License v3.0

Serious help needed: Solo generates an empty matrix after completing without an error. #3

Closed akhst7 closed 1 year ago

akhst7 commented 1 year ago

Hi, I am trying to get Solo going with one of the 10x datasets with V3 chemistry, 3p_Citrate_CPT_fastqs (https://www.10xgenomics.com/resources/datasets/pbmcs-3p_citrate_cpt-3-1-standard), and have not been able to get the matrix even though Solo runs to completion without an error. I can't figure out what is wrong with my setup. I slightly modified your script to accommodate an M1 Mac running macOS 13.2, as follows:

#!/bin/bash

index=/Volumes/Bioinformatics/star_genome_index/index_human
whitelist=/Volumes/Bioinformatics/star_genome_index/3M-february-2018.txt

STAR --genomeDir $index  \
--readFilesIn L002_R2_001.fastq L002_R1_001.fastq \
--soloCBwhitelist $whitelist \
--soloType CB_UMI_Simple \
--soloCBstart 1 \
--soloCBlen 16 \
--soloUMIstart 17 \
--soloUMIlen 12 \
--soloStrand Forward \
--twopassMode Basic \
--clipAdapterType CellRanger4 \
--soloMultiMappers EM \
--soloCellFilter EmptyDrops_CR \
--soloFeatures Gene GeneFull SJ Velocyto \
--outFilterScoreMin 30 \
--soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts \
--soloUMIfiltering MultiGeneUMI_CR \
--soloUMIdedup 1MM_CR \
--outFileNamePrefix AICD_PBMC \
--outSAMtype None

The final log output is as follows:

more AICD_PBMCLog.final.out
                                 Started job on |       Feb 01 15:20:55
                             Started mapping on |       Feb 01 15:27:42
                                    Finished on |       Feb 01 15:38:30
       Mapping speed, Million of reads per hour |       0.00

                          Number of input reads |       0
                      Average input read length |       0
                                    UNIQUE READS:
                   Uniquely mapped reads number |       0
                        Uniquely mapped reads % |       0.00%
                          Average mapped length |       0.00
                       Number of splices: Total |       0
            Number of splices: Annotated (sjdb) |       0
                       Number of splices: GT/AG |       0
                       Number of splices: GC/AG |       0
                       Number of splices: AT/AC |       0
               Number of splices: Non-canonical |       0
                      Mismatch rate per base, % |       nan%
                         Deletion rate per base |       0.00%
                        Deletion average length |       0.00
                        Insertion rate per base |       0.00%
                       Insertion average length |       0.00
                             MULTI-MAPPING READS:
        Number of reads mapped to multiple loci |       0
             % of reads mapped to multiple loci |       0.00%
        Number of reads mapped to too many loci |       0
             % of reads mapped to too many loci |       0.00%
                                  UNMAPPED READS:
  Number of reads unmapped: too many mismatches |       0
       % of reads unmapped: too many mismatches |       0.00%
            Number of reads unmapped: too short |       0
                 % of reads unmapped: too short |       0.00%
                Number of reads unmapped: other |       0
                     % of reads unmapped: other |       0.00%
                                  CHIMERIC READS:
                       Number of chimeric reads |       0
                            % of chimeric reads |       0.00%
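
The line to look at in this log is `Number of input reads | 0`: STAR reports that it processed no reads at all, which suggests the empty matrix comes from the input or the STAR binary rather than from the Solo parameters. A minimal sketch for checking that value before inspecting the Solo output; the function name `input_reads` is hypothetical, and the log file name assumes the `AICD_PBMC` prefix used in the script above.

```shell
#!/bin/bash
# Print the "Number of input reads" value from a STAR Log.final.out file.
input_reads() {
    awk -F'|' '/Number of input reads/ { gsub(/[ \t]/, "", $2); print $2 }' "$1"
}

# Assumed file name: the --outFileNamePrefix from the script plus Log.final.out.
log=AICD_PBMCLog.final.out
if [ -f "$log" ]; then
    n=$(input_reads "$log")
    if [ "${n:-0}" -eq 0 ]; then
        echo "WARNING: STAR reports 0 input reads in $log" >&2
    fi
fi
```

If the warning fires, the mapping and Solo statistics are not informative, since there was nothing to map.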

I generated a STAR human index as follows:

parallel -j0 -vv --eta "STAR --runMode genomeGenerate --runThreadN 20 --genomeDir ~/Volumes/Bioinformatics/star_genome_index/index_human --genomeFastaFiles {1} --sjdbGTFfile {2}" ::: /Volumes/Bioinformatics/refdata-gex-GRCh38-2020-A/fasta/genome.fa ::: /Volumes/Bioinformatics/refdata-gex-GRCh38-2020-A/genes/genes.gtf

I posted this question in a few help resources but have not gotten any responses yet. I would really appreciate it if you could give me any pointers to the solution.

apredeus commented 1 year ago

Hi. This is probably not the best place to ask, but I might as well help you. Why did you run the genomeGenerate command with parallel? It's not the kind of task that can be parallelised like that. So I'm pretty sure your STAR index wasn't created successfully; the actual STARsolo command looks OK.

PS you'll need at least 32 GB of RAM and 4-8 cores.
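
For reference, a minimal sketch of building the index as a single STAR process, without GNU parallel. The STAR flags are the ones already used in the question; the function name `build_index`, the `DRY_RUN` switch, and the default of 8 threads are assumptions for illustration. Note that the genomeGenerate command in the question writes to `~/Volumes/...` while the STARsolo script reads from `/Volumes/...`; this sketch assumes the latter path.

```shell
#!/bin/bash
# Build a STAR index in one process (no GNU parallel).
# Arguments: genome FASTA, GTF, output index directory, optional thread count.
# Set DRY_RUN=1 to print the command instead of running it.
build_index() {
    fasta=$1; gtf=$2; index=$3; threads=${4:-8}   # 8 threads is an assumption
    set -- STAR --runMode genomeGenerate --runThreadN "$threads" \
        --genomeDir "$index" --genomeFastaFiles "$fasta" --sjdbGTFfile "$gtf"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Paths taken from the question; printed here rather than executed.
ref=/Volumes/Bioinformatics/refdata-gex-GRCh38-2020-A
DRY_RUN=1 build_index "$ref/fasta/genome.fa" "$ref/genes/genes.gtf" \
    /Volumes/Bioinformatics/star_genome_index/index_human
```

Dropping `DRY_RUN=1` runs the build for real; checking that the index directory is populated afterwards is a quick way to confirm it completed.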

akhst7 commented 1 year ago

@apredeus Thanks for the help. I am beginning to suspect that parallel is the issue. I am using an M1 Mac Studio with 20 ARM cores and 128 GB RAM. It runs kallisto-bustools, Alevin, and STAR for bulk RNA-seq with no issues on data sets that are not extremely large.
I've been playing with GNU parallel since --runThreadN is not supported yet on ARM (aarch64) Macs.

akhst7 commented 1 year ago

It turns out that the issue was not with parallel. I was using the brew-installed STAR (stable 2.7.10a+220818), which did not work properly. I installed the latest, 2.7.10b_alpha_220111, and it worked.
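
Since the fix here was switching STAR builds, it can help to confirm which binary is actually being picked up. A minimal sketch, assuming STAR is expected on `PATH`; the function name `star_info` is hypothetical, and `STAR --version` is used to print the version string.

```shell
#!/bin/bash
# Report which STAR binary is first on PATH and what version it reports.
star_info() {
    if ! command -v STAR >/dev/null 2>&1; then
        echo "STAR not found on PATH"
        return 1
    fi
    echo "path:    $(command -v STAR)"
    echo "version: $(STAR --version)"
}

star_info || true
```

If the path points at the Homebrew install rather than the newly installed build, adjusting `PATH` or calling the new binary by its full path avoids running the old one by accident.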