alexdobin / STAR

RNA-seq aligner
MIT License

Reads With Valid Barcodes=zero in STARsolo analysis #2111

Open jfoedfjwofa opened 5 months ago

jfoedfjwofa commented 5 months ago
Hello everyone,

I'm trying to map my 10x scRNA-seq data to a customized reference genome using STARsolo.

The analysis seemed to work properly (the "finished successfully" message appeared), but when I checked the output Summary.csv file, I found several problems:

    Number of Reads,342455698
    Reads With Valid Barcodes,0
    Sequencing Saturation,nan
    Q30 Bases in CB+UMI,0.931662
    Q30 Bases in RNA read,0.914494
    Reads Mapped to Genome: Unique+Multiple,0.924652
    Reads Mapped to Genome: Unique,0.749886
    Reads Mapped to Transcriptome: Unique+Multipe Genes,0
    Reads Mapped to Transcriptome: Unique Genes,0
    Estimated Number of Cells,0
    Reads in Cells Mapped to Unique Genes,0
    Fraction of Reads in Cells,nan
    Mean Reads per Cell,0
    Median Reads per Cell,0
    UMIs in Cells,0
    Mean UMI per Cell,0
    Median UMI per Cell,0
    Mean Genes per Cell,0
    Median Genes per Cell,0
    Total Genes Detected,0

As for the barcodes, a non-empty barcodes.tsv file was generated in the "raw" output folder, but the one in the "filtered" folder was empty.

I don't think this is a real result, because when my collaborator previously analyzed the same data with CellRanger, there were no problems with sequencing quality. (I need to re-analyze the data myself because I want to use a customized reference genome.)

The command I used to run STARsolo was:

    STAR --runThreadN 16 \
        --genomeDir STAR_reference \
        --readFilesIn Fastq/sample1_GEX/sample1_GEX_S3_L003_R2_001.fastq.gz Fastq/sample1_GEX/sample1_GEX_S3_L003_R1_001.fastq.gz \
        --soloType Droplet \
        --soloCBwhitelist Fastq/737K-august-2016.txt \
        --soloUMIlen 12 \
        --soloCBlen 16 \
        --outFileNamePrefix sample1_ \
        --readFilesCommand gzcat \
        --soloBarcodeReadLength 0

I would really appreciate it if someone could give me some advice.

Sincerely,

alexdobin commented 5 months ago

Hi @jfoedfjwofa

This is likely an issue with the barcode whitelist - please check that you are using the correct one for this library.
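One way to sanity-check a whitelist before rerunning STAR is to count how many barcode reads start with a whitelisted sequence. The sketch below is not part of STAR; `check_whitelist` is a hypothetical helper name. It assumes the cell barcode occupies the first 16 bases of the barcode read (R1 in this thread) and that the whitelist is a plain-text file with one barcode per line.

```shell
#!/bin/sh
# Hypothetical helper (not a STAR feature): report how many of the first
# N reads in a gzipped barcode FASTQ begin with a whitelisted cell barcode.
# Usage: check_whitelist BARCODE_FASTQ_GZ WHITELIST [CB_LEN] [N_READS]
check_whitelist() {
    fastq_gz=$1
    whitelist=$2
    cb_len=${3:-16}        # assumed barcode length
    n_reads=${4:-100000}   # number of reads to sample from the top of the file

    gzip -dc "$fastq_gz" | head -n "$((n_reads * 4))" |
    awk -v len="$cb_len" '
        NR == FNR { wl[$1] = 1; next }      # first file: load the whitelist
        FNR % 4 == 2 {                      # second input: FASTQ sequence lines
            total++
            if (substr($0, 1, len) in wl) hits++
        }
        END { printf "%d of %d reads have a whitelisted barcode\n", hits, total }
    ' "$whitelist" -
}
```

For the files named in this thread, the call would look like `check_whitelist Fastq/sample1_GEX/sample1_GEX_S3_L003_R1_001.fastq.gz Fastq/737K-august-2016.txt`. A count near zero would point to the wrong whitelist or the wrong barcode position; a high count suggests the problem lies elsewhere in the command.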

jfoedfjwofa commented 5 months ago

Thank you very much for your prompt response.

My scRNA-seq samples are 10x 5' GEX v2 Sample, so I think the whitelist I used (737K-august-2016.txt) was correct...

After reading through the STARsolo tutorial (https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md) again, I have corrected the script as follows.

    STAR --runThreadN 16 \
        --genomeDir STARreference \
        --soloCBwhitelist Fastq/737K-august-2016.txt \
        --outFileNamePrefix sample1 \
        --readFilesCommand gzcat \
        --soloBarcodeReadLength 1 \
        --clip5pNbases 39 0 \
        --soloType CB_UMI_Simple \
        --soloCBstart 1 \
        --soloCBlen 16 \
        --soloUMIstart 17 \
        --soloUMIlen 10 \
        --readFilesIn Fastq/sample1_GEX/sample1_GEX_S3_L003_R2_001.fastq.gz Fastq/sample1_GEX/sample1_GEX_S3_L003_R1_001.fastq.gz

After executing this command, I got the following output:

    Number of Reads,342455698
    Reads With Valid Barcodes,0.903744
    Sequencing Saturation,0.749502
    Q30 Bases in CB+UMI,0.964867
    Q30 Bases in RNA read,0.914494
    Reads Mapped to Genome: Unique+Multiple,0.922693
    Reads Mapped to Genome: Unique,0.646587
    Reads Mapped to Transcriptome: Unique+Multipe Genes,0.0693924
    Reads Mapped to Transcriptome: Unique Genes,0.0535003
    Estimated Number of Cells,3857
    Reads in Cells Mapped to Unique Genes,16633083
    Fraction of Reads in Cells,0.907846
    Mean Reads per Cell,4312
    Median Reads per Cell,3533
    UMIs in Cells,4194728
    Mean UMI per Cell,1087
    Median UMI per Cell,896
    Mean Genes per Cell,493
    Median Genes per Cell,422
    Total Genes Detected,22030

The value of "Reads With Valid Barcodes" looks reasonable this time, but the values of "Reads Mapped to Transcriptome: Unique+Multipe Genes" and "Reads Mapped to Transcriptome: Unique Genes" seem strangely low. I am sorry for asking so many questions, but I would be very grateful for any advice you could give me.

Best regards,

alexdobin commented 5 months ago

This could be an issue with strandedness. Please try --soloStrand Reverse

jfoedfjwofa commented 4 months ago

I'm very sorry for the late reply. Thank you very much for your advice! I added the --soloStrand Reverse option as you suggested, and the analysis worked!

I'm deeply thankful for your advice.
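
For anyone landing here with the same symptoms: the final command was not posted, so what follows is a reconstruction, assuming it was the second command from this thread with --soloStrand Reverse appended. The function name `run_starsolo_reverse` and its two arguments (genome directory and output prefix) are illustrative only; every STAR option is copied from the comments above.

```shell
#!/bin/sh
# Reconstruction, not the poster's verbatim command: the second command from
# this thread plus --soloStrand Reverse. Wrapped in a function (hypothetical
# name) so the file can be sourced without starting an alignment.
# Usage: run_starsolo_reverse GENOME_DIR OUT_PREFIX
run_starsolo_reverse() {
    genome_dir=$1    # the --genomeDir value used earlier in this thread
    out_prefix=$2    # the --outFileNamePrefix value used earlier in this thread

    STAR --runThreadN 16 \
        --genomeDir "$genome_dir" \
        --readFilesIn Fastq/sample1_GEX/sample1_GEX_S3_L003_R2_001.fastq.gz \
                      Fastq/sample1_GEX/sample1_GEX_S3_L003_R1_001.fastq.gz \
        --readFilesCommand gzcat \
        --soloType CB_UMI_Simple \
        --soloCBwhitelist Fastq/737K-august-2016.txt \
        --soloCBstart 1 --soloCBlen 16 \
        --soloUMIstart 17 --soloUMIlen 10 \
        --soloBarcodeReadLength 1 \
        --clip5pNbases 39 0 \
        --soloStrand Reverse \
        --outFileNamePrefix "$out_prefix"
}
```

Only the --soloStrand Reverse line differs from the earlier command; the barcode geometry (16-base CB, 10-base UMI) is unchanged.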