SorenKarst / longread_umi

GNU General Public License v3.0

Multiplexed reads structure #9

Closed: MaestSi closed this issue 4 years ago.

MaestSi commented 4 years ago

Hi, congrats on the very nice work. I noticed there is a script named demultiplex_nb.sh, so I would like to ask about the multiplexing options.

Based on the manuscript and the code, the structure of your reads is the following (or its reverse complement):

FW1-UMI-FW2-operon-RV2R-UMI-RV1R

From the demultiplex_nb.sh script, if I understand correctly, it looks like demultiplexing is performed as a final step: UMI consensus sequences are assigned to samples based on the demultiplexing of the raw reads. In that case, the end of a read should have the following structure (or its reverse complement):

AATGTACTTCGTTCAGTTACGTATTGCTAAGGTTAA-nanopore barcode-CAGCACCT

Since neither of the two sequences flanking the nanopore barcode is identical to any of the primer sequences described in the manuscript, could you describe what the structure of the multiplexed reads looks like? Does that sequence replace the FW1 primer sequence?

Thanks in advance,
Simone
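A quick way to see why "or its reverse complement" matters: a read from the opposite strand is the reverse complement of the layout above, so it starts with RV1 rather than FW1. The sketch below assumes that the "R" suffix (RV2R, RV1R) means reverse complement, and all element sequences are made-up placeholders, not the primers from the manuscript.

```python
# Sketch of the UMI amplicon layout FW1-UMI-FW2-operon-RV2R-UMI-RV1R.
# All element sequences are short made-up placeholders (hypothetical).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


FW1, FW2 = "AGTCAGTC", "GGATCCAA"                # hypothetical forward primer parts
RV1, RV2 = "TTGACCGA", "CATGCATG"                # hypothetical reverse primer parts
UMI_FW, UMI_RV = "ACGTTGCAACGT", "TGCAACGTTGCA"  # hypothetical UMIs
OPERON = "ATGC" * 10                             # stand-in for the operon

# One strand as written above; "R" is assumed to mean reverse complement.
read = FW1 + UMI_FW + FW2 + OPERON + revcomp(RV2) + UMI_RV + revcomp(RV1)

# A read from the other strand is the reverse complement of the whole thing:
# it starts with RV1 and ends with the reverse complement of FW1.
other_strand = revcomp(read)
print(read.startswith(FW1), other_strand.startswith(RV1))  # True True
print(other_strand.endswith(revcomp(FW1)))                 # True
```

This is why primer or barcode searches have to check both orientations of every read.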

SorenKarst commented 4 years ago

Hi MaestSi,

Thank you, and nice job finding demultiplex_nb.sh. It is a temporary demultiplexing script designed to work with Nanopore adaptors ligated onto the UMI amplicons (as in the PCR barcoding amplicons (SQK-LSK109) protocol). So the terminal structure would look something like:

AATGTACTTCGTTCAGTTACGTATTGCTAAGGTTAA-nanopore barcode-CAGCACCT...?-FW1-UMI-FW2-operon

I can't remember whether the CAGCACCT sequence is truncated or complete, but it doesn't matter for the script to work. Be aware that R10 flow cells truncate the read ends, which means some reads might lack barcodes.
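For illustration, here is a hypothetical sketch of pulling the barcode out from between the two flanks near either read end, with truncated reads falling into an "unassigned" bin. It uses exact string matching only, which real nanopore reads would need to replace with error-tolerant matching, and it is not the logic used in demultiplex_nb.sh. The function names, the 150-base search window and the test barcode are all assumptions.

```python
from typing import Optional

# Hypothetical sketch assuming the terminal layout
# LEFT_FLANK-barcode-RIGHT_FLANK described above (exact matching only).
LEFT_FLANK = "AATGTACTTCGTTCAGTTACGTATTGCTAAGGTTAA"
RIGHT_FLANK = "CAGCACCT"
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def extract_barcode(read: str, window: int = 150) -> Optional[str]:
    """Return the sequence between the flanks at either read end, or None.

    Scanning the first `window` bases of the read and of its reverse
    complement covers both ends of the molecule.
    """
    for candidate in (read.upper(), revcomp(read)):
        head = candidate[:window]
        start = head.find(LEFT_FLANK)
        if start == -1:
            continue
        bc_start = start + len(LEFT_FLANK)
        bc_end = head.find(RIGHT_FLANK, bc_start)
        if bc_end == -1:
            continue
        return candidate[bc_start:bc_end]
    return None


def assign_sample(read: str, barcode_to_sample: dict[str, str]) -> str:
    """Map a read to a sample; truncated or unknown ends become 'unassigned'."""
    barcode = extract_barcode(read)
    if barcode is None:
        return "unassigned"
    return barcode_to_sample.get(barcode, "unassigned")


barcode = "AAGAAAGTTGTCGGTGTCTTTGTG"  # arbitrary 24-nt test sequence
read = LEFT_FLANK + barcode + RIGHT_FLANK + "ACGT" * 30
print(assign_sample(read, {barcode: "sample1"}))           # sample1
print(assign_sample(revcomp(read), {barcode: "sample1"}))  # sample1
print(assign_sample(read[40:], {barcode: "sample1"}))      # unassigned
```

The last call mimics a read whose end was clipped (as can happen on R10 flow cells): the left flank is gone, so no barcode is found and the read is left unassigned.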

Hope this helps.

MaestSi commented 4 years ago

Thank you very much for the kind information. Simone

MaestSi commented 4 years ago

Hi, just a follow-up question after reading the online protocols. Is it correct that barcoding is performed during the second PCR? If so, could you share the exact structure of the primer used in the second PCR? Based on our understanding, it should look like this:

AAGGTTAA-nanopore barcode-CAGCACCT-FW1...

If that's the case, you would need to synthesise as many second-PCR primers as the number of samples (i.e. barcodes) you intend to pool. Is that correct?
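The primer count in the question follows directly from the proposed structure: each sample needs its own tailed primer. The sketch below only encodes that proposed layout (AAGGTTAA-barcode-CAGCACCT-FW1), which is the asker's understanding rather than a confirmed design; FW1, the barcodes and the function name are hypothetical placeholders.

```python
# Hypothetical sketch: one tailed second-PCR primer per sample, following the
# structure proposed in the question (AAGGTTAA-barcode-CAGCACCT-FW1).
# FW1 and the barcodes are placeholders, not sequences from the protocol.
TAIL_5P = "AAGGTTAA"
LINKER = "CAGCACCT"


def tailed_primers(barcodes: dict[str, str], fw1: str) -> dict[str, str]:
    """Return {sample: primer}, one uniquely barcoded primer per sample."""
    if len(set(barcodes.values())) != len(barcodes):
        raise ValueError("each sample needs a distinct barcode")
    return {s: TAIL_5P + bc + LINKER + fw1 for s, bc in barcodes.items()}


samples = {"sample1": "ACGTACGTACGT", "sample2": "TGCATGCATGCA"}  # hypothetical
primers = tailed_primers(samples, fw1="NNNNNNNNNN")  # placeholder FW1
print(len(primers))  # 2
```

Under this design the number of primers to synthesise grows linearly with the number of pooled samples, which is the trade-off the question raises against using the ONT-provided primer tails.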

Is there a specific reason why you did not choose to use the primer tails provided by ONT in place of FW1? Thanks again, Simone

MaestSi commented 4 years ago

Hi, I think I just spotted a small mistake in version 3 of the manuscript. At line 149, the reported sequences are -G $RV1R and -G $FW1R, whereas I think the correct ones are those in the umi_binning.sh script at line 225: -G $RV2R -G $FW2R. Is that correct?

SorenKarst commented 4 years ago

Hi MaestSi,

The demultiplexing scripts have changed a lot since this issue was submitted. Do you still experience problems?

MaestSi commented 4 years ago

No, thanks. If I do, I'll let you know. Simone