mcortes-lopez closed this issue 11 months ago.
Hi,
Is it single cell data? If you have some more information about the structure of 5'v1 ONT reads and/or are happy to send me some reads to 'look' at, we should be able to work it out. There seems to be a lot of variability between the different protocols. We've run it successfully on 5' 10x ONT data before using the TSO as the right sequence like you suggest.
Cheers, Nadia.
Sent. Thanks!
Hi,
Thanks for sending the fastq. I believe this set of options will work for your data:
flexiplex -f 4 -r "" -u 12 -k <barcodes>
An explanation:
-r "" this turns off the check for the right flanking sequence, so that it can be either a polyT or the TSO. That means you should be able to demultiplex the 3' and 5' reads together, since they share the same left flank (CTACACGACGCTCTTCCGATCT).
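To make the effect of an empty right flank concrete, here is a minimal Python sketch, not flexiplex's own implementation: it finds the shared left flank by exact match (flexiplex allows edits and searches both strands; this sketch does neither) and takes the barcode and UMI that follow, without looking at what comes after. The 16 bp barcode length is an assumption (the usual 10x length), and the reads are made up.

```python
# Simplified illustration of demultiplexing with no right-flank check.
# Assumptions: 16 bp barcode, exact left-flank match, forward strand only.
LEFT_FLANK = "CTACACGACGCTCTTCCGATCT"

def extract_bc_umi(read, bc_len=16, umi_len=12, left=LEFT_FLANK):
    """Return (barcode, umi) found right after the left flank, or None.

    Whatever follows the UMI (polyT for 3' reads, TSO for 5' reads) is
    ignored, which is the idea behind running with -r "".
    """
    pos = read.find(left)
    if pos < 0:
        return None
    start = pos + len(left)
    bc = read[start:start + bc_len]
    umi = read[start + bc_len:start + bc_len + umi_len]
    if len(bc) < bc_len or len(umi) < umi_len:
        return None  # read too short after the flank
    return bc, umi

# Made-up reads: a 3' read ending in polyT and a 5' read ending in the TSO.
bc = "ACGTACGTACGTACGT"
read3 = "GGGG" + LEFT_FLANK + bc + "AAAAACCCCCGG" + "T" * 20
read5 = LEFT_FLANK + bc + "GATTACAGAT" + "TTTCTTATAT"
print(extract_bc_umi(read3))  # ('ACGTACGTACGTACGT', 'AAAAACCCCCGG')
print(extract_bc_umi(read5))  # ('ACGTACGTACGTACGT', 'GATTACAGATTT')
```

Both reads are picked up by the same search because only the left flank is required.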
-f 4 this just lowers the maximum edit distance allowed when matching the flanking sequence, since we've cut down the amount of flanking sequence by making the right flank empty.
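One way to picture what an edit-distance threshold like -f does: compute the smallest edit distance between the flank and any stretch of the read, and accept the read only if it is at or below the limit. The sketch below uses a standard semi-global alignment for that; it is an illustration of the idea, not flexiplex's actual code, and the noisy read is invented.

```python
LEFT_FLANK = "CTACACGACGCTCTTCCGATCT"

def best_flank_edit_distance(flank, read):
    """Minimum edit distance between `flank` and any substring of `read`.

    Semi-global DP: the whole flank must be aligned, but it may start and
    end anywhere in the read at no cost.
    """
    m = len(flank)
    prev = list(range(m + 1))  # before any read base: delete flank[:i]
    best = prev[m]
    for ch in read:
        cur = [0] * (m + 1)  # cur[0] = 0: the flank may start here for free
        for i in range(1, m + 1):
            cost = 0 if flank[i - 1] == ch else 1
            cur[i] = min(prev[i - 1] + cost,  # match / substitution
                         prev[i] + 1,         # extra base in the read
                         cur[i - 1] + 1)      # flank base missing from read
        best = min(best, cur[m])
        prev = cur
    return best

def flank_passes(flank, read, max_edits):
    """Accept the read if the flank is found with at most `max_edits` edits."""
    return best_flank_edit_distance(flank, read) <= max_edits

# Invented read: the left flank with two substitutions, padded on both sides.
noisy = "AAAA" + "GTACAGGACGCTCTTCCGATCT" + "TTTT"
print(flank_passes(LEFT_FLANK, noisy, 4))  # True
```

With a shorter flank to match, the same number of allowed edits covers a larger fraction of it, which is the intuition for lowering the limit.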
-u 12 In your data it looks like the 5' protocol uses 10 bp UMIs whereas the 3' protocol uses 12 bp. Setting this to 12 covers the longer case; it just means the 5' reads will get 2 extra bases in the UMI, which will be the first 2 bases of the TSO. Note that -u 12 is actually the default, so you could leave this option off if you wanted.
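A small worked example of the UMI length point, as a hedged sketch: the 10 bp and 12 bp UMI lengths come from the explanation above, the TSO start (TTTCTTATAT) is the one quoted in the question further down this thread, and the UMI sequences and the `true_umi` helper are hypothetical. It shows that taking 12 bases after the barcode of a 5' read picks up the first 2 TSO bases, and how you could trim them off afterwards if you know which protocol a read came from.

```python
# UMI lengths as described in this thread: 10 bp for 5', 12 bp for 3'.
UMI_LEN = {"5p": 10, "3p": 12}
FLEXIPLEX_UMI_LEN = 12  # the value passed with -u
TSO_PREFIX = "TTTCTTATAT"  # TSO start quoted in the question in this thread

def true_umi(extracted_umi, protocol):
    """Hypothetical helper: trim a 12 bp extracted UMI to the real length."""
    return extracted_umi[:UMI_LEN[protocol]]

# Hypothetical 5' read segment right after the barcode: 10 bp UMI, then TSO.
umi5 = "GATTACAGAT"
extracted5 = (umi5 + TSO_PREFIX)[:FLEXIPLEX_UMI_LEN]
print(extracted5)                   # GATTACAGATTT (last 2 bases are "TT" from the TSO)
print(true_umi(extracted5, "5p"))   # GATTACAGAT

# A 3' UMI is already 12 bp, so nothing is trimmed.
print(true_umi("AAAAACCCCCGG", "3p"))  # AAAAACCCCCGG
```

Because every 5' UMI gets the same two TSO bases appended, trimming is optional for counting purposes, apart from sequencing errors in those two positions.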
-k barcodes Do you know the barcodes already? If so, give it a list of both the 5' and 3' barcodes. If you don't know the barcodes, run flexiplex in discovery mode to get the list (set -f 0 to reduce noise), then follow the filtering instructions on the documentation website: flexiplex -f 0 -r ""
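If the 5' and 3' barcode lists are in separate files, they need to be combined into one list for -k. The sketch below is one way to do that in Python, assuming each input has one barcode per line (any extra columns are ignored) and that -k is given a file with one barcode per line. The file names and helper functions are hypothetical.

```python
def merge_barcodes(*barcode_lists):
    """Merge barcode lists, keeping first-seen order and dropping duplicates.

    Each argument is an iterable of lines (e.g. an open file). Only the first
    whitespace-separated field of a non-empty line is kept.
    """
    seen = set()
    merged = []
    for lines in barcode_lists:
        for line in lines:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            bc = fields[0]
            if bc not in seen:
                seen.add(bc)
                merged.append(bc)
    return merged

def write_barcode_file(barcodes, path):
    """Write one barcode per line."""
    with open(path, "w") as fh:
        for bc in barcodes:
            fh.write(bc + "\n")

if __name__ == "__main__":
    # Hypothetical file names; adjust to your own barcode lists.
    with open("barcodes_5p.txt") as a, open("barcodes_3p.txt") as b:
        write_barcode_file(merge_barcodes(a, b), "barcodes_all.txt")
```

The merged file could then be passed as, for example, flexiplex -f 4 -r "" -u 12 -k barcodes_all.txt reads.fastq.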
With these options I get roughly the proportion of reads demultiplexed that I would expect for 10x data (60-80%).
Good luck and hope it works!
Cheers, Nadia.
Hi, thanks for this tool! It works nicely and fast. I was wondering what you would recommend for demultiplexing 5'v1 ONT data. I have a mixture of 5' and 3' data, where 2/7 of the samples are 3'. When I test the suggested command:
flexiplex -l TTGGTGCTGATATT -k "GCTTT" -r TTTGGGG -u 22 -f 3 -e 1 reads.fastq
on a file of 4000 reads, I only get 2 demultiplexed reads, while searching for 3' reads (in the default mode) gives me 379. If I understand correctly, the right sequence should be the TSO and the left sequence R1? I tested changing -l to TTTCTTATAT, based on this reference, and got 167 reads.