alexdobin / STAR

RNA-seq aligner
MIT License

STARsolo : low number of distinct CB detected based in-drop platform #711

Open piloter2 opened 5 years ago

piloter2 commented 5 years ago

Hi Alex, thanks for developing such an awesome tool. We are trying to run STAR on our existing single-cell RNA-seq data from the In-Drop platform to get better performance. As you suggested previously (https://github.com/alexdobin/STAR/issues/605), we used STARsolo, version 2.7.2x_0723_soloComplexBarcodes, with the parameters given there. However, only 33 distinct cell barcodes (CBs) were detected, whereas a different aligner found more than 2,000 CBs. I don't know how to solve this. I'm attaching the final.out file below.

                             Started job on |   Aug 08 08:57:10
                         Started mapping on |   Aug 08 09:42:40
                                Finished on |   Aug 08 11:25:45
   Mapping speed, Million of reads per hour |   266.82
                      Number of input reads |   458408350
                  Average input read length |   76
                                UNIQUE READS:
               Uniquely mapped reads number |   339737830
                    Uniquely mapped reads % |   74.11%
                      Average mapped length |   74.43
                   Number of splices: Total |   19214293
        Number of splices: Annotated (sjdb) |   18523988
                   Number of splices: GT/AG |   16081206
                   Number of splices: GC/AG |   534862
                   Number of splices: AT/AC |   179383
           Number of splices: Non-canonical |   2418842
                  Mismatch rate per base, % |   0.81%
                     Deletion rate per base |   0.02%
                    Deletion average length |   1.45
                    Insertion rate per base |   0.03%
                   Insertion average length |   1.45
                         MULTI-MAPPING READS:
    Number of reads mapped to multiple loci |   27536485
         % of reads mapped to multiple loci |   6.01%
    Number of reads mapped to too many loci |   998110
         % of reads mapped to too many loci |   0.22%
                               UNMAPPED READS:
Number of reads unmapped: too many mismatches |   0
   % of reads unmapped: too many mismatches |   0.00%
        Number of reads unmapped: too short |   89585576
             % of reads unmapped: too short |   19.54%
            Number of reads unmapped: other |   550349
                 % of reads unmapped: other |   0.12%
                               CHIMERIC READS:
                   Number of chimeric reads |   0
                        % of chimeric reads |   0.00%

Here is the output of samtools flagstat XXX.bam, run on the BAM file that STARsolo produced:

517168891 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
426034856 + 0 mapped (82.38%:-nan%)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (-nan%:-nan%)
0 + 0 with itself and mate mapped
0 + 0 singletons (-nan%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
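As a sanity check, the flagstat totals can be reconciled with the final.out counts. The sketch below only uses numbers quoted in this post. The interpretation is an assumption rather than something stated in the thread: unmapped reads appear to have been written into the BAM, with reads mapped to too many loci counted among them, and the extra mapped records appear to come from multi-mapping reads. Under that reading, the unmapped records in the BAM and the unaligned reads in final.out both come to 91134035, which suggests the genome alignment itself behaved normally.

```python
# Values copied from final.out and samtools flagstat in this post.
input_reads = 458408350
uniquely_mapped = 339737830
multi_mapped = 27536485
too_many_loci = 998110
unmapped_too_short = 89585576
unmapped_other = 550349

bam_total = 517168891
bam_mapped = 426034856

# Records flagged as unmapped in the BAM vs. reads STAR did not report as aligned.
bam_unmapped = bam_total - bam_mapped
log_unaligned = unmapped_too_short + unmapped_other + too_many_loci
print("unmapped records in BAM:", bam_unmapped)
print("unaligned reads in log: ", log_unaligned)

# Mapped records beyond one per mapped read (assumed: extra alignments
# of multi-mapping reads).
extra_records = bam_mapped - (uniquely_mapped + multi_mapped)
print("extra mapped records:", extra_records)
print("BAM records minus input reads:", bam_total - input_reads)
```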

Many thanks, JayM

alexdobin commented 5 years ago

Hi JayM,

Could you please post the Solo.out/Gene.stats file?

Cheers Alex

piloter2 commented 5 years ago

Thanks,

Here is the Solo.out/Gene.stats.

                                    Barcodes:
                                    nNoAdapter       70455494
                                        nNoUMI           1066
                                         nNoCB        1890906
                                        nNinCB           1732
                                       nNinUMI         103288
                               nUMIhomopolymer         902136
                                      nTooMany         119141
                                      nNoMatch      383355624
                           nMismatchesInMultCB        1570438
                                          Gene:
                                     nUnmapped           1862
                                    nNoFeature           3189
                                 nAmbigFeature            138
                         nAmbigFeatureMultimap            105
                                      nTooMany              0
                                 nNoExactMatch           2013
                                   nExactMatch             16
                                        nMatch           1323
                                 nCellBarcodes             33
                                         nUMIs            451
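One way to read the Barcodes section is to express each counter as a share of the input reads. The sketch below does that with the values posted here. Using "Number of input reads" from final.out as the denominator is an assumption about how the counters are tallied, and `parse_counters` is a hypothetical helper, not part of STAR. On these numbers, nNoMatch is about 84% and nNoAdapter about 15% of the input, so nearly every read is rejected at the barcode stage.

```python
# Barcodes-section counters copied from the Solo.out/Gene.stats listing above.
GENE_STATS_BARCODES = """
nNoAdapter       70455494
nNoUMI           1066
nNoCB        1890906
nNinCB           1732
nNinUMI         103288
nUMIhomopolymer         902136
nTooMany         119141
nNoMatch      383355624
nMismatchesInMultCB        1570438
"""

# "Number of input reads" from final.out (assumed denominator).
N_INPUT_READS = 458408350


def parse_counters(text):
    """Turn 'name value' lines into a dict of ints, skipping anything else."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


barcode_counters = parse_counters(GENE_STATS_BARCODES)

# Print the counters from largest to smallest with their share of input reads.
for name, count in sorted(barcode_counters.items(), key=lambda kv: -kv[1]):
    share = 100 * count / N_INPUT_READS
    print(f"{name:>22} {count:>12} {share:6.2f}%")
```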

alexdobin commented 5 years ago

Hi JayM,

It seems like the barcodes were not recognized. Could you please send me the Log.out file, the barcode files, and ~1000 good reads (read1 and read2) with no Ns, preferably cut from the middle of the files?

Cheers Alex
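A minimal sketch of how such a sample could be cut, assuming standard 4-line FASTQ records with read1 and read2 in the same order in two files. The function names, the default `skip` value, and the file paths are all hypothetical. The code skips a fixed number of read pairs to get away from the start of the files, then keeps the next pairs in which neither read contains an N.

```python
import gzip
import itertools


def fastq_records(lines):
    """Yield 4-line FASTQ records from an iterable of lines."""
    it = iter(lines)
    while True:
        record = list(itertools.islice(it, 4))
        if len(record) < 4:
            return
        yield record


def sample_clean_pairs(r1_lines, r2_lines, skip, n):
    """Skip `skip` pairs, then return up to `n` pairs with no N in either read."""
    pairs = zip(fastq_records(r1_lines), fastq_records(r2_lines))
    kept = []
    for rec1, rec2 in itertools.islice(pairs, skip, None):
        # The sequence is the second line of each record.
        if "N" in rec1[1].upper() or "N" in rec2[1].upper():
            continue
        kept.append((rec1, rec2))
        if len(kept) == n:
            break
    return kept


def open_fastq(path):
    """Open a plain or gzipped FASTQ file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def write_sample(r1_path, r2_path, out1_path, out2_path, skip=1_000_000, n=1000):
    """Write the sampled pairs to two new FASTQ files, keeping them in sync."""
    with open_fastq(r1_path) as r1, open_fastq(r2_path) as r2:
        pairs = sample_clean_pairs(r1, r2, skip, n)
    with open(out1_path, "w") as o1, open(out2_path, "w") as o2:
        for rec1, rec2 in pairs:
            o1.writelines(rec1)
            o2.writelines(rec2)
```

For example, `write_sample("R1.fastq.gz", "R2.fastq.gz", "R1.sample.fastq", "R2.sample.fastq")` would write up to 1000 clean pairs taken after the first million; the paths here are placeholders.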

piloter2 commented 5 years ago

Hi Alex, sorry for the late reply. I have just emailed you the files, but the message bounced. Could you give me another email address?

Address not found

Your message wasn't delivered to dobin@csh.edu because the domain csh.edu couldn't be found. Check for typos or unnecessary spaces and try again.

alexdobin commented 5 years ago

Hi JayM

It's dobin@cshl.edu

Thanks! Alex