DReichLab / adna

Processing WGS aDNA data using the ReichLab protocol

Merge fails unexpectedly for duplicate (after barcode trimming) sequences #2

Open MatthewMah opened 7 years ago

MatthewMah commented 7 years ago

I think the following merge should be unambiguous. Read 1 is a 69-base-pair sequence with the first barcode from the first set (TGACGCA) prepended. Read 2 is the reverse complement of that same 69-base-pair sequence with the first barcode from the second set (GTCTCAA) prepended.

@NS500217:348:HTW2FBGXY:1:11101:22352:1064 1:N:0:0
TGACGCACTAGCATTACTTATATGATATGTCTCCATACCAATTACAATCTCCAAGTGAACGAGATCGGAAGAGCAC
+
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
@NS500217:348:HTW2FBGXY:1:11101:22352:1064 2:N:0:0
GTCTCAAGTGCTCTTCCGATCTCGTTCACTTGGAGATTGTAATTGGTATGGAGACATATCATATAAGTAATGCTAG
+
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE

Barcodes are (one colon-separated set per line, first set on the first line):

TGACGCA:ATCGTGC:CAGTATG:GCTACAT
GTCTCAA:TAGAGCC:ACTCTGG:CGAGATT

I think the problem is that the same merge is counted in both the forward and reverse analysis steps.
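As a sanity check on the example reads, here is a minimal Python sketch that trims the barcodes and confirms the two inserts are exact reverse complements of each other. It does not use this repository's merge code. The helper names `reverse_complement` and `strip_barcode` are hypothetical, and it assumes exact barcode matching at the start of each read with no mismatches allowed.

```python
# Reads and barcode sets copied from this issue.
READ1 = "TGACGCACTAGCATTACTTATATGATATGTCTCCATACCAATTACAATCTCCAAGTGAACGAGATCGGAAGAGCAC"
READ2 = "GTCTCAAGTGCTCTTCCGATCTCGTTCACTTGGAGATTGTAATTGGTATGGAGACATATCATATAAGTAATGCTAG"
SET1 = "TGACGCA:ATCGTGC:CAGTATG:GCTACAT".split(":")
SET2 = "GTCTCAA:TAGAGCC:ACTCTGG:CGAGATT".split(":")

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def strip_barcode(read, barcode_set):
    """Hypothetical helper: return (barcode, insert) for an exact prefix match."""
    for barcode in barcode_set:
        if read.startswith(barcode):
            return barcode, read[len(barcode):]
    raise ValueError("read does not start with any barcode in the set")


bc1, insert1 = strip_barcode(READ1, SET1)
bc2, insert2 = strip_barcode(READ2, SET2)

# True when the inserts overlap completely, i.e. the merge is unambiguous.
full_overlap = insert1 == reverse_complement(insert2)

print(bc1, bc2, len(insert1), len(insert2), full_overlap)
# prints: TGACGCA GTCTCAA 69 69 True
```

Both reads yield a 69-base insert after trimming and the inserts overlap along their full length, so the input itself looks well-formed.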