calacademy-research / minibar

Dual barcode and primer demultiplexing for MinION sequenced reads
BSD 2-Clause "Simplified" License

Barcode orientation via primers #1

Closed: AShaw1802 closed this issue 5 years ago

AShaw1802 commented 5 years ago

Hi, thanks for creating this tool! It's exactly what we've been hoping for for some time now.

I'm analyzing a dataset with Illumina-style dual indexing: one sample has barcode 92 at the forward priming site and barcode 94 at the reverse, and another sample has barcode 94 at the forward site and barcode 92 at the reverse. Even with the method set to option 1, I am getting a mixture of the two samples in the two output files. I was hoping that minibar would use the primer sequences I entered to determine the orientation of the barcodes and demultiplex accordingly. As far as I can tell, the primers are recognized in the majority of cases. The run output I get is:

MappingFile.txt SequencingReads.fasta Index edit dist 7, Primer edit dist 7, Search Len 80, Search Method 1, Output Type C

87508 seqs: H 69472 HH 53512 Hh 5910 IDs 58595 Mult_IDs 6 (44.4259s)

The barcodes consist of 24 bp and the primers of 26 bp. The reverse primer is shorter than the forward primer, so I've included some of the linker sequence to balance the lengths. I've set the primer edit distance to be quite stringent so that the two ends can't be confused.

Could you clarify whether minibar works the way I imagine, using the primers to determine the orientation?

Thanks, Alex
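The idea of a "stringent" primer edit distance can be made concrete with the triangle inequality: if the two primers are d edits apart, no read segment can be within k edits of both whenever 2k < d. The sketch below computes a plain Levenshtein distance and the largest k satisfying that bound. It is an illustration only, not minibar's matching code; the function names `levenshtein` and `max_safe_primer_dist` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def max_safe_primer_dist(fw_primer: str, rv_primer: str) -> int:
    """Largest k with 2k < d(fw, rv), so no segment can match both primers.

    Returns -1 when the primers are identical (no safe threshold exists).
    """
    d = levenshtein(fw_primer.upper(), rv_primer.upper())
    return (d - 1) // 2


# Hypothetical usage with the primers from the dual-barcode example later in this thread.
fw = "AAGCAGTGGTATCAACGCAGAGT"
rv = "TCAGACGTGTGCTCTTCCGATC"
print(levenshtein(fw, rv), max_safe_primer_dist(fw, rv))
```

Because minibar searches for each primer inside a window of the read rather than comparing the primers to each other, treat this bound as a rough guide when choosing the primer edit distance, not a guarantee.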

jbh-cas commented 5 years ago

minibar.py was designed primarily to work with error-prone MinION reads, whose error rate is much higher than that of Illumina reads. That's why it offers 3 different Methods, with varying sensitivity, for ascertaining sample identity. The README.md discusses these -M [1|2|3] Method options.

Version 0.21 handles the reuse of an index in both the forward and reverse sets differently for each Method. Method 2 treats this as an error. Method 1 allows it and uses the appropriate set to call the sample. Method 3 (the minibar.py default) issues a warning but continues, calling the sample as in Method 1.
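The per-Method policy for index reuse can be sketched as a pre-flight check on the sample list. This is a minimal illustration, not minibar's code: the dictionary layout (sample ID mapped to a forward/reverse index pair) and the names `find_reused_indexes` and `check_index_reuse` are hypothetical.

```python
import warnings


def find_reused_indexes(samples: dict) -> set:
    """Return index sequences that occur in both the forward and the reverse set.

    `samples` maps a sample ID to a (fw_index, rv_index) tuple (hypothetical layout).
    """
    fw = {fw_idx for fw_idx, _ in samples.values()}
    rv = {rv_idx for _, rv_idx in samples.values()}
    return fw & rv


def check_index_reuse(samples: dict, method: int) -> set:
    """Apply the reuse policy described for each Method (illustrative only)."""
    reused = find_reused_indexes(samples)
    if not reused:
        return reused
    msg = f"indexes reused in forward and reverse sets: {sorted(reused)}"
    if method == 2:
        raise ValueError(msg)  # Method 2: reuse is an error
    if method == 3:
        warnings.warn(msg)     # Method 3: warn, then continue as Method 1
    # Method 1 (and Method 3 after warning) continue; each read must then be
    # checked against the forward or reverse set, e.g. by which primer matched.
    return reused


# Alex's layout: barcodes 92 and 94 swap ends between two samples.
samples = {"sampleA": ("BC92", "BC94"), "sampleB": ("BC94", "BC92")}
print(sorted(check_index_reuse(samples, method=1)))
```

Here `BC92` and `BC94` are placeholder names standing in for the real 24 bp barcode sequences.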

jbh-cas commented 5 years ago

Note on Method 3 and reuse of indexes in the forward and reverse sets:

Method 3 can produce hit strengths HH, Hh, or hh, where H means both primer and index were found and h means only the index was found. This information is in the FASTA record comment, as discussed in the README. For HH and Hh hits, the forward or reverse index is checked depending on which primer was found. For hh hits, the forward index is always chosen. This is why Method 3 issues a warning, even though this method, unlike Method 1, can still discover sample IDs when the primer at one end of the sequence cannot be found but the index can.
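The Method 3 orientation rule can be summarised in a few lines. This is a simplified sketch under stated assumptions: `end_strength` and `index_set_to_check` are hypothetical names, and `primer_seen` stands in for however minibar records which primer matched at the H end.

```python
from typing import Optional


def end_strength(primer_found: bool, index_found: bool) -> str:
    """Per-end hit code: 'H' = primer and index found, 'h' = index only, '' = no index."""
    if not index_found:
        return ""
    return "H" if primer_found else "h"


def index_set_to_check(strength: str, primer_seen: Optional[str]) -> str:
    """Choose which index set ('forward' or 'reverse') is used to call the sample.

    strength:    combined hit string such as 'HH', 'Hh' or 'hh'.
    primer_seen: 'forward' or 'reverse' for the primer identified at an H end,
                 or None when no primer was found (hypothetical representation).
    """
    if "H" in strength:
        if primer_seen not in ("forward", "reverse"):
            raise ValueError("an H hit implies a primer was identified")
        # HH / Hh: the primer orients the read, so use the matching index set.
        return primer_seen
    # hh: nothing orients the read, so the forward set is always used. With an
    # index present in both sets this can pick the wrong sample, hence the warning.
    return "forward"


print(index_set_to_check("Hh", "reverse"))  # orientation taken from the primer
print(index_set_to_check("hh", None))       # falls back to the forward set
```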

WRui commented 3 years ago

Hi, we used dual barcodes. When I ran minibar and compared the results with nanoplexer, I found that reads with both 3' and 5' barcodes were assigned to unk.fastq. For example, with this sample entry:

| SampleID | ForwardPrimer | FwIndex | FwIndex(revcomp) | FwPrimer | FwPrimer(revcomp) | ReversePrimer | RvIndex | RvIndex(revcomp) | RvPrimer | RvPrimer(revcomp) |
|---|---|---|---|---|---|---|---|---|---|---|
| test13 | bar48 | CATCTGGAACGTGGTACACCTGTA | TACAGGTGTACCACGTTCCAGATG | AAGCAGTGGTATCAACGCAGAGT | ACTCTGCGTTGATACCACTGCTT | bar5 | CTTGTCCAGGGTTTGTGTAACCTT | AAGGTTACACAAACCCTGGACAAG | TCAGACGTGTGCTCTTCCGATC | GATCGGAAGAGCACACGTCTGA |

the following read (annotated elements in parentheses) was identified as an unk sequence:

AATCATGTACTTCGTTCAGTTACGTATTGCTGTGACTGCGGAGT(TCAGACGTGTGCTCTTCCGATC-3'anchor)(GTGTTACCGTGcaGAATGAATCCTT-bar7 with 2 mismatches)AATTGGTGTTTTTTTTTTTTTGTTAGTAAATAAAATCATTTTAATATATGGCTTTCAAAAGACAGCCAGGTGAAAACTTGAGCAATACAATAAGTCATATTTATGAGTACGTTCAAAAATTCACAAAAAAGGGTACAATTCTGGCTTCTCTTTAATCATTAAATTTCAGTTTTACAAATAATTCAGGTTCAGGTTTTGAGGGGGAAACAGTTCTTGTATTATTACATCATCATTTTCTTCTGTAAATGACTCTATTAGCTAGGTTACAAACATTGTCACAGAAACAAATTTTTAAGCCATAGATCACTGCCGCATTTATATTTACAAAAGCCATAAACATGCATTTCTCAACTAGTAGGACTTAAATAGATGCTTGAATATTAAGGCAGTGATGATTCTAAAACATAATGAAATTCTAAGTTAAGGCTTTATGTTTCTTTTGAAACCCACACTCATAGGCAACTGTGATAAACAAACTCTTACCTACTAGGTTGAAGCTCCATCTGCGGGATATGTTATTTATCCATTCAACACTTCTTTGTGTCAAATGTATGGGATAGGAATTAGTAGCAAAACCATCAATTTACTTTAATGAATCATTGAGTCCTTACTAGGTTTGAGGACTTTCAGTAAGCACAGGCATATTGCCACAGAAAGGAGCATGTCGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTCCAATGTGGAGAACTTACAGGCTGCACTGATTCCTGCTTGACTGGAACTTAAGCCAACAATTAAGAATCAGGATCTACTAAATACAAGAAAATCCCCAAAGCATCCATGT(ACTCTGCGTTGATACCACTGCTT-5'anchor) (TACAGGTGTACCACGTTCCAGATG-bar48 revcomp)GCATAGCAATACATG

How can I fix this? Is my indexCombination set up incorrectly?
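One way to sanity-check an entry like this is to confirm that each revcomp column really is the reverse complement of its partner, and then to look for each element in the read while allowing a few substitutions. The sketch below does both in plain Python. `revcomp` and `best_hamming_hit` are hypothetical helpers; the search counts only mismatches and ignores insertions and deletions, so it is a rough stand-in for minibar's or nanoplexer's own matching, not a reproduction of either.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T, either case)."""
    return seq.translate(COMPLEMENT)[::-1]


def best_hamming_hit(read: str, pattern: str) -> Tuple[int, int]:
    """Return (start, mismatches) of the best ungapped match of pattern in read.

    Only substitutions are counted; a barcode carrying a MinION indel may
    therefore score worse here than in an edit-distance search.
    Returns (-1, len(pattern) + 1) if the pattern is longer than the read.
    """
    read, pattern = read.upper(), pattern.upper()
    best = (-1, len(pattern) + 1)
    for start in range(len(read) - len(pattern) + 1):
        window = read[start:start + len(pattern)]
        mismatches = sum(a != b for a, b in zip(window, pattern))
        if mismatches < best[1]:  # keep the leftmost best hit
            best = (start, mismatches)
    return best


# Columns from the test13 row above.
fw_index = "CATCTGGAACGTGGTACACCTGTA"
fw_primer = "AAGCAGTGGTATCAACGCAGAGT"
print(revcomp(fw_index) == "TACAGGTGTACCACGTTCCAGATG")

# Tail of the read above, with the annotation parentheses removed.
tail = "CATCCATGT" + revcomp(fw_primer) + revcomp(fw_index) + "GCATAGCAATACATG"
print(best_hamming_hit(tail, revcomp(fw_primer)), best_hamming_hit(tail, revcomp(fw_index)))
```

On that tail, the reverse-complemented FwPrimer and FwIndex are found back to back with zero mismatches; the same call on the head of the read can be used to see how far the annotated bar7 segment is from each RvIndex in the index file.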