dereneaton / ipyrad

Interactive assembly and analysis of RAD-seq data sets
http://ipyrad.readthedocs.io
GNU General Public License v3.0

combinatorial demultiplexing problem #546

Closed TomaszSuchan closed 4 months ago

TomaszSuchan commented 4 months ago

I'm having an issue with demultiplexing libraries using combinatorial inline barcodes (ipyrad 0.9.93). The barcode on R1 is removed with no issues, but the barcode on R2 stays in the reads and ends up in the assembly and final output files.

Here is an example:

Barcode file:

Carper_1-1  ACCTG   ACCTG
Carper_1-2  CTCAG   CTCAG
Carper_1-3  CGCTA   CGCTA
...

Example read after demultiplexing from the 1st sample, R2:

@LH00235:121:22GW3NLT3:2:1101:27462:1224 2:N:0:ATTCAGAA+CTTAGCCT
ACCTGTAAAAGCGTTATTATGTTACATTTTAAGATATTCAAGGACAATATGGAATTCATAGAATCATTCAACAGTGGTGAAAATCATCCTTACAAAGTAGGTGCTAACGCATTTGCCGACCAAACAAATGAAGAGTTCAAAGCGGCTCGT

And this causes artificial SNPs in the outfiles as shown in the .loci file (the last 8 sites):

Carper_1-1       AGTGGAAGAATTTACAAGCCAAGTAAGGTGTCGAGTTCCCCACTGCTGATTTGATCAGAATGACTGACTCAATACGCACCAACTAATTCACCTAAAGACTGCCCTGACCGAATACCTATCCAGGTGTTTTTATATGAGTCNNNNGCTGGAAAACCCATTCGTGTTGCACATAGCATGACAGAAAATTGAANGCCAACTCGTGCAGAAGCAACAGACGTTGCTAATGCTGAATTATACTCCACATCTATTGTTGGTAAAATAATTTCCTTTATTTGAATTTATCTTTTTACAGGT---
Carper_1-2       AGTGGAAGAATTTACAAGCCAAGTAAGGTGTCGAGTTCCCCACTGCTGATTTGATCAGAATGACTGACTCAATACGCACCAACTAATTCACCTAAAGACTGCCCTGACCGAATACCTATCCAGGTGTTTTTATATGAGTCNNNNGCTGGAAAACCCATTCGTGTTGCACATAGCATGACAGAAAATTGAAGGCCAACTCGTGCAGAAGCAACAGACGTTGCTAATGCTGAATTATACTCCACATCTATTGTTGGTAAAATAATTTCCTTTATTTGAATTTATCTTTTTACTGAG---
[...]
//
                                                                                       *                      -
                                                                  ********|0|

Strangely, setting trim_loci to 0, 15, 15, 0 in the params file does not solve the issue.

isaacovercast commented 4 months ago

Hm, interesting. So the barcodes for R1/R2 are identical? What is the second barcode doing?

Combinatorial barcode demultiplexing is only available for pair3rad datatype. What datatype are you using? You can run step 1 as pair3rad and it should remove the R2 barcode in this case.

The trim_loci setting that you want would be 0, 15, 0, 15, because for trimming loci we assume that R2 has been reverse complemented to be on the same strand as R1.
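The strand convention can be checked against the barcodes and the .loci excerpt posted above. A minimal Python sketch, where `reverse_complement` is an illustrative helper and not an ipyrad function: the R2 barcode, once R2 is reverse complemented onto the R1 strand, should appear reverse complemented at the right-hand edge of the locus.

```python
# Illustrative helper, not part of ipyrad.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

# R2 barcodes copied from the barcode file in this thread.
r2_barcodes = {"Carper_1-1": "ACCTG", "Carper_1-2": "CTCAG"}

# Last bases of each sample's locus in the .loci excerpt,
# taken just before the trailing '---' padding.
locus_tails = {"Carper_1-1": "TTTTTACAGGT", "Carper_1-2": "TTTTTACTGAG"}

for sample, barcode in r2_barcodes.items():
    rc = reverse_complement(barcode)
    # True means the leftover R2 barcode sits at the right edge of the locus.
    print(sample, barcode, rc, locus_tails[sample].endswith(rc))
```

For both samples the reverse-complemented barcode matches the end of the locus, which is why the leftover R2 barcode bases sit at the right-hand end of the alignment.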

TomaszSuchan commented 4 months ago

OK, it works with pair3rad datatype in the 1st step. My data is pairddrad.

I'm using double barcoding for better control of chimeras on patterned flowcells. I guess it could also be pretty beneficial for RAD protocols where individuals are pooled before the PCR. Would it be difficult to allow double barcoding for all the other datatypes?

isaacovercast commented 4 months ago

That makes sense. It would not be difficult to allow double barcoding for other datatypes, but it would be a non-trivial amount of work, and it's not a very common use-case so it's not super high priority. For those folks who need double barcoding it is easy enough to run step 1 as pair3rad (and has no other downstream consequences), and then switch datatypes for step 2 and beyond. I know this is kind of annoying, but it really just doesn't come up that often so it feels hard to justify putting a bunch of effort into it. I can put it on the list but can't promise I'll get to it soon.
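The workaround comes down to changing one line of the params file between steps. A rough Python sketch of that edit, assuming the usual ipyrad 0.9 params layout in which each line reads `value ## [index] [name]: description`; `set_datatype` is a hypothetical helper, not an ipyrad function, and the example line is a shortened stand-in for a real params line.

```python
import re

def set_datatype(params_text: str, new_datatype: str) -> str:
    """Return params_text with the value on the [datatype] line replaced."""
    out = []
    for line in params_text.splitlines():
        # Assumed layout: "<value>   ## [7] [datatype]: ..."
        if re.search(r"##\s*\[\d+\]\s*\[datatype\]", line):
            comment = line[line.index("##"):]
            # Keep the comment, swap only the value in front of it.
            line = f"{new_datatype:<30} {comment}"
        out.append(line)
    return "\n".join(out) + "\n"

# Shortened stand-in for the datatype line of a params file.
example = "pairddrad                      ## [7] [datatype]: Datatype (see docs)\n"
print(set_datatype(example, "pair3rad"), end="")
```

In practice this would mean setting the datatype to pair3rad, running step 1, then setting it back to pairddrad before running step 2 and beyond.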

TomaszSuchan commented 4 months ago

Sure, this sounds reasonable. I can do the changes in the documentation when I find some time.