I'm trying to map my 10x scRNA-seq data to a customized reference genome using STARsolo.
The run appeared to complete normally (the "finished successfully" message appeared), but when I checked the output Summary.csv file, I found the following problems.
Number of Reads 342455698
Reads With Valid Barcodes 0
Sequencing Saturation nan
Q30 Bases in CB+UMI 0.931662
Q30 Bases in RNA read 0.914494
Reads Mapped to Genome: Unique+Multiple 0.924652
Reads Mapped to Genome: Unique 0.749886
Reads Mapped to Transcriptome: Unique+Multipe Genes 0
Reads Mapped to Transcriptome: Unique Genes 0
Estimated Number of Cells 0
Reads in Cells Mapped to Unique Genes 0
Fraction of Reads in Cells nan
Mean Reads per Cell 0
Median Reads per Cell 0
UMIs in Cells 0
Mean UMI per Cell 0
Median UMI per Cell 0
Mean Genes per Cell 0
Median Genes per Cell 0
Total Genes Detected 0
As for the barcodes, a non-empty barcodes.tsv file was generated in the output "raw" folder, but the one in the "filtered" folder was empty.
I don't think this is a true result, because when my collaborator previously analyzed the same data with CellRanger, there were no problems with sequencing quality.
(I need to re-analyze the data myself because I want to use a customized reference genome.)
The command I used to run STARsolo was as follows:
STAR --runThreadN 16 --genomeDir STAR_reference --readFilesIn Fastq/sample1_GEX/sample1_GEX_S3_L003_R2_001.fastq.gz Fastq/sample1_GEX/sample1_GEX_S3_L003_R1_001.fastq.gz --soloType Droplet --soloCBwhitelist Fastq/737K-august-2016.txt --soloUMIlen 12 --soloCBlen 16 --outFileNamePrefix sample1_ --readFilesCommand gzcat --soloBarcodeReadLength 0
I would really appreciate it if someone could give me some advice.
Sincerely,
Hi @jfoedfjwofa
This is likely an issue with the barcode whitelist - please check that you are using the correct one for this library.
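One quick way to test the whitelist hypothesis is to count how many barcode reads start with a whitelisted sequence. The following is only a minimal sketch and not part of STAR: the helper name `whitelist_hit_rate`, the default of 100000 reads, and the assumption that the cell barcode is the first 16 bases of the barcode read (R1 here) are all illustrative.

```shell
# Hypothetical helper (not a STAR feature): report how many of the first
# N barcode reads begin with a sequence that is present in the whitelist.
# Usage: whitelist_hit_rate WHITELIST FASTQ[.gz] [N_READS] [CB_LEN]
whitelist_hit_rate() {
  wl_file="$1"
  fastq="$2"
  n_reads="${3:-100000}"   # assumed default sample size
  cb_len="${4:-16}"        # assumed cell barcode length
  # gzip -cdf decompresses gzipped input and passes plain text through as-is.
  gzip -cdf "$fastq" | head -n "$((n_reads * 4))" |
    awk -v len="$cb_len" '
      NR == FNR { wl[$1] = 1; next }   # first input: load the whitelist
      FNR % 4 == 2 {                   # second input: FASTQ sequence lines
        total++
        if (substr($0, 1, len) in wl) hits++
      }
      END { printf "%d/%d\n", hits, total }
    ' "$wl_file" -
}
```

For example, `whitelist_hit_rate Fastq/737K-august-2016.txt Fastq/sample1_GEX/sample1_GEX_S3_L003_R1_001.fastq.gz` prints a `hits/total` count. A count near zero would point to the wrong whitelist or the wrong barcode position; a high count would suggest the problem lies elsewhere.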
Thank you very much for your prompt response.
My scRNA-seq samples are 10x 5' GEX v2 libraries, so I think the whitelist I used (737K-august-2016.txt) was correct...
After reading through the STARsolo tutorial (https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md) again, I have corrected the script as follows.
STAR --runThreadN 16 --genomeDir STAR_reference --soloCBwhitelist Fastq/737K-august-2016.txt --outFileNamePrefix sample1_ --readFilesCommand gzcat --soloBarcodeReadLength 1 --clip5pNbases 39 0 --soloType CB_UMI_Simple --soloCBstart 1 --soloCBlen 16 --soloUMIstart 17 --soloUMIlen 10 --readFilesIn Fastq/sample1_GEX/sample1_GEX_S3_L003_R2_001.fastq.gz Fastq/sample1_GEX/sample1_GEX_S3_L003_R1_001.fastq.gz
After executing this command, I got the following output:
Number of Reads 342455698
Reads With Valid Barcodes 0.903744
Sequencing Saturation 0.749502
Q30 Bases in CB+UMI 0.964867
Q30 Bases in RNA read 0.914494
Reads Mapped to Genome: Unique+Multiple 0.922693
Reads Mapped to Genome: Unique 0.646587
Reads Mapped to Transcriptome: Unique+Multipe Genes 0.0693924
Reads Mapped to Transcriptome: Unique Genes 0.0535003
Estimated Number of Cells 3857
Reads in Cells Mapped to Unique Genes 16633083
Fraction of Reads in Cells 0.907846
Mean Reads per Cell 4312
Median Reads per Cell 3533
UMIs in Cells 4194728
Mean UMI per Cell 1087
Median UMI per Cell 896
Mean Genes per Cell 493
Median Genes per Cell 422
Total Genes Detected 22030
The value of "Reads With Valid Barcodes" looks reasonable this time, but the values of "Reads Mapped to Transcriptome: Unique+Multipe Genes" and "Reads Mapped to Transcriptome: Unique Genes" seem strangely low (about 7% and 5%, even though about 92% of reads map to the genome).
I am sorry for asking so many questions, but I would be very grateful for any advice you could give me.
Best regards,
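To make the CB_UMI_Simple coordinates in the corrected command concrete: --soloCBstart 1 --soloCBlen 16 --soloUMIstart 17 --soloUMIlen 10 are 1-based positions within the barcode read. The sketch below only illustrates that layout on a single sequence; the helper name `split_cb_umi` is hypothetical, and this is not how STAR itself is implemented.

```shell
# Hypothetical helper: split one barcode-read sequence into cell barcode
# and UMI using 1-based start/length values, mirroring the meaning of
# --soloCBstart/--soloCBlen/--soloUMIstart/--soloUMIlen.
# Usage: split_cb_umi SEQ [CB_START] [CB_LEN] [UMI_START] [UMI_LEN]
split_cb_umi() {
  seq_in="$1"
  cb_start="${2:-1}"
  cb_len="${3:-16}"
  umi_start="${4:-17}"
  umi_len="${5:-10}"
  # cut -c takes inclusive 1-based ranges, so end = start + length - 1.
  cb=$(printf '%s\n' "$seq_in" | cut -c"${cb_start}-$((cb_start + cb_len - 1))")
  umi=$(printf '%s\n' "$seq_in" | cut -c"${umi_start}-$((umi_start + umi_len - 1))")
  printf '%s\t%s\n' "$cb" "$umi"
}
```

With the defaults, bases 1 to 16 are reported as the cell barcode and bases 17 to 26 as the UMI; any bases after position 26 are ignored by this sketch.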
This could be an issue with strandedness. Please try:
--soloStrand Reverse
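A high genome-mapping rate combined with a low gene-assignment rate is the pattern in the output above (about 0.65 versus 0.05 for uniquely mapped reads). Here is a small sketch for comparing the two numbers before and after adding --soloStrand Reverse. It assumes Summary.csv is comma-separated with the metric name in the first field, uses the metric names exactly as they appear in this thread (they may differ in other STAR versions), and the helper names `summary_metric` and `gene_fraction_of_unique` are hypothetical.

```shell
# Hypothetical helpers for reading a STARsolo Summary.csv
# (assumed format: "metric name,value" on each line).

# Print the value of one metric, matched exactly on its name.
summary_metric() {
  awk -F, -v name="$2" '$1 == name { print $2; exit }' "$1"
}

# Print the share of uniquely genome-mapped reads that were also
# assigned to a unique gene, or "nan" if it cannot be computed.
gene_fraction_of_unique() {
  genes=$(summary_metric "$1" "Reads Mapped to Transcriptome: Unique Genes")
  genome=$(summary_metric "$1" "Reads Mapped to Genome: Unique")
  awk -v g="$genes" -v u="$genome" 'BEGIN {
    if (u > 0) printf "%.3f\n", g / u
    else print "nan"
  }'
}
```

Running `gene_fraction_of_unique` on the Summary.csv from each run gives one number per run to compare. What counts as too low depends on the library and the annotation, so this is only a rough indicator that reads may be counted against the wrong strand.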
I'm very sorry for the late reply. Thank you very much for your advice! I added the --soloStrand Reverse option as you suggested, and the analysis worked!
I'm deeply thankful for your advice.