DavidsonGroup / flexiplex

The Flexible Demultiplexer
https://davidsongroup.github.io/flexiplex/
MIT License

Sequences used for 5' ONT #19

Closed mcortes-lopez closed 11 months ago

mcortes-lopez commented 1 year ago

Hi, thanks for this tool! It works nicely and fast. I was wondering what the recommendations are for demultiplexing 5'v1 ONT data? I have a mixture of 5' and 3', where 2/7 of the samples are 3' data. When I test the suggested command:

flexiplex -l TTGGTGCTGATATT -k "GCTTT" -r TTTGGGG -u 22 -f 3 -e 1 reads.fastq

on a file of 4000 reads, I only get 2 reads, while the 3' reads (searched in the default mode) gave me 379. If I understand correctly, the right sequence should be the TSO and the left sequence the R1? I tested changing -l to TTTCTTATAT, based on this reference, and I got 167 reads.

nadiadavidson commented 1 year ago

Hi,

Is it single cell data? If you have some more information about the structure of 5'v1 ONT reads and/or are happy to send me some reads to 'look' at, we should be able to work it out. There seems to be a lot of variability between the different protocols. We've run it successfully on 5' 10x ONT data before using the TSO as the right sequence like you suggest.

Cheers, Nadia.

mcortes-lopez commented 1 year ago

Sent. Thanks!

nadiadavidson commented 1 year ago

Hi,

Thanks for sending the fastq. I believe this set of options will work for your data:

flexiplex -f 4 -r "" -u 12 -k <barcodes>

An explanation:

-r "": this turns off the check for the right flanking sequence, so that it can be either a polyT or the TSO. You should therefore be able to demultiplex the 3' and 5' reads together, since they share the same left flank (CTACACGACGCTCTTCCGATCT).

-f 4: this just lowers the allowed edit distance to the flanking sequence, since we've cut the amount of flanking sequence down by making the right flank empty.

-u 12: in your data it looks like the 5' protocol uses 10bp UMIs whereas the 3' protocol uses 12bp. Setting this to 12 covers the larger case and just means you'll have 2 extra bases for the 5' reads, which will be the first 2 from the TSO. Note that -u 12 is actually the default, so you could leave this option off if you wanted.

-k <barcodes>: do you know the barcodes already? If so, you should give it a list of both the 5' and 3' barcodes. If you don't know the barcodes, then run flexiplex in discovery mode to get the list (set -f 0 to reduce noise), then follow the filtering instructions on the documentation website:

flexiplex -f 0 -r ""
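To make the -u 12 point concrete, here is a small Python sketch of what happens when a fixed 12bp UMI window is applied to both read types. It is only a toy illustration, not how flexiplex is implemented: it uses exact string matching rather than edit distance, the 16bp barcode length is an assumption (typical for 10x), and the barcode, UMI and surrounding bases are invented. Only the left flank and the TTTCTTATAT TSO prefix come from this thread.

```python
# Toy sketch: exact matching only, whereas flexiplex tolerates mismatches.
LEFT_FLANK = "CTACACGACGCTCTTCCGATCT"  # shared left flank quoted in this thread
BARCODE_LEN = 16                       # assumption: 10x-style 16bp barcodes


def extract(read, umi_len=12):
    """Return (barcode, umi) found after the left flank, or None if no flank."""
    start = read.find(LEFT_FLANK)
    if start == -1:
        return None
    bc_start = start + len(LEFT_FLANK)
    umi_start = bc_start + BARCODE_LEN
    barcode = read[bc_start:umi_start]
    umi = read[umi_start:umi_start + umi_len]
    return barcode, umi


# Invented example sequences (hypothetical, for illustration only).
barcode = "ACGTACGTACGTACGT"
umi10 = "AACCGGTTAC"        # 10bp UMI, as in the 5' protocol
umi12 = "AACCGGTTACGA"      # 12bp UMI, as in the 3' protocol
tso_prefix = "TTTCTTATAT"   # TSO start, as tested earlier in the thread
poly_t = "T" * 20

read_5p = "GATTACA" + LEFT_FLANK + barcode + umi10 + tso_prefix + "ACGTGCA"
read_3p = "GATTACA" + LEFT_FLANK + barcode + umi12 + poly_t

result_5p = extract(read_5p)  # UMI window runs 2 bases into the TSO
result_3p = extract(read_3p)  # UMI window covers exactly the 12bp UMI

print("5' read:", result_5p)
print("3' read:", result_3p)
```

For the 5' read the last two bases of the reported UMI are the first two bases of the TSO, so they are the same for every read and should not change how UMIs group within that protocol.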

With these options I get roughly the fraction of reads demultiplexed that I would expect for 10x data (60-80%).

Good luck and hope it works!

Cheers, Nadia.