Closed abhisheksinghnl closed 2 years ago
Hi @abhisheksinghnl
in SmartSeq-like protocols, each pair of Read1/2 FASTQ files corresponds to one cell. Since you supplied only one pair, you get one cell in the output.
Cheers Alex
Hi @alexdobin,
Thank you for your reply. These are pretty huge files in the range of 22 to 25Gbs.
Here is a snapshot of one of the file in question.
R1
@A00877:307:H5H77DSXY:2:1101:1325:1000 1:N:0:TGACCAAT
GNAGGGAGACGTCTACATCTGCCAAGTGGAGCACACCAGCCTGGACAGTCCTGTCACCGTGGAGTGGAAGGCACAGTCTGATTCTGCCCGGAGTAAGACATTGACGGGAGCTGGGGGCTTCATGCTGGGGCTCATCATCTGTGGAGTGGG
+
F#FFFF:FFFFFFFFFFFFFFFF:FFFFFFFFFF,FFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFF,:FFFFFFFFFFF:FFFFFFFFFFFF:FFFFFFFFFFFFFF:F:F
@A00877:307:H5H77DSXY:2:1101:1380:1000 1:N:0:TGACCAAT
TNCTAGCAGCATTGGCCTTGGCAAGTCACTGGTAACTGTTTTCTGTAAAGCAGAGGTTGCCCACTTCATTAGACTGTAAGAACTGAATGAGAAAAGAGTAGGAGAGTACTCTGTAAACACAAGTGATAGGGAAGTTACCATCACCACTCC
+
F#FF:FFFFFFFFFFFFFFFFFFFFFFFFFF::FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFF:F:FFFFFFFF
@A00877:307:H5H77DSXY:2:1101:1524:1000 1:N:0:TGACCAAT
GNGATGGGTCTTGCTATGTTGCACAGGCTGGTCTTGAACTCCTGGATTTAAGTGATCTTTTTACTCTAAAATGTAATCTAAATAATAACAAAATAAATATTGAGCTGAAGAAAAGAAAAAGGAAACAGTGATTTTCATGTCTGCTATGTG
R2
@A00877:307:H5H77DSXY:2:1101:1325:1000 2:N:0:TGACCAAT
TGGCTTCATGCAGGAGTTTTTTTTTTTTTTTTTTTTTTTTTTTGGTGTTTTTTATTTTTTGTGGTAAAAATAAGGGAAAAAGGTTGTAGTCAAAGTGTTAGTTAAAGTGGATTGATAAAAAAAGCAAAAATTTATAAAATAAGATAATAG
+
FFFFFFFFFFFFFFF:F,F:,FF::FFF,F:FFF:FF:F:FF,F:F,FFF::F:FF::F,FFF,,,,,F,,FF,:F,F,,FF,:::,FF,F,,F,:,,,:,,,F,,,F,,,FF,,,FF:F:,F,FF,,,,,,:F,:,FF,:,,,,,,,:,
@A00877:307:H5H77DSXY:2:1101:1380:1000 2:N:0:TGACCAAT
GATAGACAGTGACGCCTTTTTTTTTTTTTTTTTTTTTTTTTTTTAAAGTTTTGGGGTTGGTAACTAACATAAAAATTGTTTTTAAATTGTAATAAACAAAAATTTTAAAATAAAATAATTACATTAAAATTAATGTGCCAACCAATGGTT
+
FFFFFFFFFFFFFFFFF:FFFFF,FFFF:,FFFFFFFFF:FFF,,F,,,,,,:F,,,,,,,F:,:,,,,,F,,,,:F:,FFF:,,,,:,,,,,,F,F,,,,,F,FF,,F,:,,,:,,,::,,,:,FF,F,,,,,,F:,:F,,,,,,,,,F
@A00877:307:H5H77DSXY:2:1101:1524:1000 2:N:0:TGACCAAT
GGACTATATTTACATTGTGGCTTTGCCATTTTCTAGATTTTTTTTACTTTGGACAAATTATTTAAACTCTTTGAACCTCATTGTTCTCATCTGTGCAGATGATGCTCACTTCAGAGAAGATGACGCACATACAACACATTAAACCTAGTG
If you could confirm, whether I am missing something or not.
Best regards Abhishek
Hi Abhishek,
if the files are so large, it's possible that they have not been demultiplexed by cell. There is the 8b index sequence at the end of the read name, which may correspond to different cells, then you could split the files by that index. But you would need to confirm it with people who sequenced this library.
Cheers Alex
Hi @alexdobin ,
thank you again for your reply it does need to de multiplexed. I am new to this SmartSeq platform and I have no idea how can I do it. Could you please point me to the direction of a thread or a small chunk of code that could do this for me?
Thank you
Hi Abhishek,
usually, the standard Illumina pipeline does the demultiplexing. If you have to do it yourself, you would need to have index-to-cell correspondence table. Note that some index sequences contain errors, and so should be discarded. So at the very least, you need to know the list of indexes used in the library generation.
If each index corresponds to just one cell, it's a relatively simple parsing exercise: each read (4 lines) is output in the file named according to the index sequence. With awk, it would be something like this:
awk '{ file1=substr($2,7) "_R1.fq"; print > file1; for (ii=2;ii<=4;ii++) {getline; print > file1} }' Read1.fq
and similar for Read2.
Cheers Alex
Thank you soo much
Dear Community,
I am trying to run STARsolo on my SmartSeq2 samples. I am using following command
STAR --runThreadN 16 --genomeDir /home/jupyter/HumanGenomes/ --readFilesCommand zcat --readFilesIn HRR204645_f1.fq.gz HRR204645_r2.fq.gz --soloType SmartSeq --outSAMtype BAM Unsorted --outBAMcompression -1 --soloUMIdedup Exact --outSAMattrRGline ID:sample1
The process doesn't show up any error. However, summary.csv file highlights that I have only 1 cell.
"Estimated Number of cell 1"
I might be missing something but I am not sure what I am missing.
Could anyone suggest something or help.
Thank you