Closed colindaven closed 6 years ago
Hello @colindaven, your R1 reads should all have the same length: each one is either full length (meaning 20 bp?) or it is unusable. The reason for that is the way the cell barcode and UMI are captured.
Basically, the index you give in the config file is used to extract the cell barcode and the UMI from R1. If part of the cell barcode + UMI is missing, it will crash.
I would delete all read pairs whose R1 is not full length. You could also skip quality filtering entirely, since this is taken care of later in the pipeline for R2.
Do you need any other information?
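Before deleting anything, it can help to see how many R1 reads are actually shorter than the barcode + UMI length. The following is a minimal sketch, not part of the pipeline: `read_length_histogram` is a hypothetical helper name, and it assumes standard 4-line FASTQ records arriving on stdin.

```shell
# Hypothetical helper: tabulate read lengths of a FASTQ stream on stdin.
# Prints "<count> <length>" per line, shortest length first.
# Assumes standard 4-line FASTQ records (sequence is line 2 of each record).
read_length_histogram() {
    awk 'NR % 4 == 2 { n[length($0)]++ }
         END { for (l in n) print n[l], l }' | sort -k2,2n
}
```

For a gzipped file it could be used as `gzip -cd A1_03_S2_R1.fastq.gz | read_length_histogram`. Any length below the expected 20 bp in the output points to reads that would break barcode extraction.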
Hi @Hoohm
that's really helpful. Thanks. I have lots of other seq lengths in my R1 actually, from 1-20 bp.
For others having this problem, here's a solution to filter by length then repair the pairs:
Filter R1 to reads of at least 20 bp (seqtk's -L drops shorter sequences):
seqtk seq -L 20 A1_03_S2_R1.fastq.gz > A1_03_S2_2_R1.fastq &
Repair the read pairs with repair.sh (from BBMap):
repair.sh -Xmx400g in=A1_03_S2_2_R1.fastq in2=A1_03_S2_2_R2.fastq out1=A1_03_S2_2b_R1.fastq out2=A1_03_S2_2b_R2.fastq outs=singletons1.fq overwrite=true
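As an alternative to filtering R1 and then re-pairing, both files can be filtered in lockstep so the pairs never go out of sync. This is only a sketch under stated assumptions: `filter_pairs_by_r1_length` is a hypothetical helper, both inputs are uncompressed 4-line FASTQ files, and R1 and R2 list the same reads in the same order.

```shell
# Hypothetical helper: keep only read pairs whose R1 sequence has at least MINLEN bases.
# usage: filter_pairs_by_r1_length MINLEN R1.fastq R2.fastq OUT_R1.fastq OUT_R2.fastq
# Assumes uncompressed 4-line FASTQ with R1 and R2 in identical read order.
filter_pairs_by_r1_length() {
    awk -v min="$1" -v r2="$3" -v o1="$4" -v o2="$5" '
    {
        # The current line is an R1 header; read the other three lines of the record.
        h1 = $0
        getline s1; getline p1; getline q1
        # Read the matching 4-line record from the R2 file.
        getline h2 < r2; getline s2 < r2; getline p2 < r2; getline q2 < r2
        # Write both mates only when R1 is long enough.
        if (length(s1) >= min) {
            printf "%s\n%s\n%s\n%s\n", h1, s1, p1, q1 > o1
            printf "%s\n%s\n%s\n%s\n", h2, s2, p2, q2 > o2
        }
    }' "$2"
}
```

Gzipped inputs would need to be decompressed first. Unlike the seqtk + repair.sh route, this does not collect singletons; pairs with a short R1 are simply dropped from both outputs.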
Hi,
it looks to me like the pipeline is trying to pick up a cell barcode of 12 bp, but the read is shorter than 12 bp (possibly due to quality trimming?). At least I see very short reads in the unmapped.bam file.
Would it make sense for drop-seq-pipe to check the length of the sequence to be tagged and avoid any errors with a try/catch block, and further to exclude these sequences?
We're talking about a NextSeq 2x75 bp run with R1 of 18-20 bp and R2 of 60-62 bp.
Of course, I may have interpreted this error wrongly, I'm new to this.
cheers