DavidsonGroup / flexiplex

The Flexible Demultiplexer
https://davidsongroup.github.io/flexiplex/
MIT License

Limiting search space or reporting distance from 5' / 3' ends #21

Closed: yxsee closed this issue 1 year ago

yxsee commented 1 year ago

Hi @nadiadavidson, for single cell protocols with multiple barcodes (e.g. Split-seq, BD Rhapsody), the barcode sequences are adjacent to each other. So after the first barcode is extracted and the read is trimmed, the next barcode and its flanking sequences should sit at the 5' or 3' end of the read. However, the short flanking sequences can lead to spurious barcode identification in the middle of the read. It would be great to have an option that limits the search space for the barcode and flanking sequences (e.g. to within 10 bp of the 5' and 3' ends). Alternatively, the distance from the 5' and 3' ends could be reported in the barcode table as a sanity check.
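As a rough stand-in for the requested "distance from the ends" report, here is a minimal sketch outside flexiplex. The function name `flank_offsets` is hypothetical and the sketch assumes plain 4-line FASTQ records. For each read it prints the number of bases before the first exact match of a flank (distance from the 5' end) and the number of bases after it (distance from the 3' end). Flexiplex itself tolerates mismatches in the flank, so this exact-match check is only an approximation.

```shell
# Hypothetical helper (not part of flexiplex).
# Usage: flank_offsets FLANK < reads.fastq
# Prints: read_id <TAB> bases before flank (5' distance) <TAB> bases after flank (3' distance)
# Assumes unwrapped 4-line FASTQ; reads with no exact match of FLANK are skipped.
flank_offsets() {
  awk -v flank="$1" '
    NR % 4 == 1 { id = substr($1, 2) }   # header line: drop the leading "@"
    NR % 4 == 2 {
      pos = index($0, flank)             # 1-based start of first exact match, 0 if absent
      if (pos > 0) {
        from5 = pos - 1
        from3 = length($0) - (pos - 1 + length(flank))
        print id "\t" from5 "\t" from3
      }
    }'
}
```

For example, `flank_offsets ACGTACGTAC < file.fastq | awk '$2 <= 10'` (with a placeholder flank) would keep only reads whose flank starts within 10 bp of the 5' end.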

yxsee commented 1 year ago

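I just found a way to do this in the documentation: prefix each sequence line with a fixed string (here "START") and pass that string as the left flank, so the barcode is searched for right at the 5' end:

cat file.fastq | sed "/[@,+]/! s/^/START/g" | flexiplex -l "START" -r "" -u 0 -f 0 -k my_barcode_list.txt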
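The sed filter above picks lines by content (any line containing none of "@", "," or "+"), so a quality line may or may not receive the prefix depending on which quality characters it happens to contain. A position-based variant is sketched below, assuming unwrapped 4-line FASTQ records. The function name `add_sentinel` and the padding quality character "I" are hypothetical choices; the sentinel is prepended to every sequence line, and the quality line is padded by the same length so each record stays well-formed.

```shell
# Hypothetical helper: prefix line 2 (sequence) of every 4-line FASTQ record
# with a sentinel (default "START") and pad line 4 (quality) to the same length.
# The padding character "I" is an arbitrary assumption.
add_sentinel() {
  awk -v s="${1:-START}" '
    BEGIN { pad = s; gsub(/./, "I", pad) }   # pad has the same length as the sentinel
    NR % 4 == 2 { $0 = s $0 }                # sequence line
    NR % 4 == 0 { $0 = pad $0 }              # quality line
    { print }'
}
```

It would slot into the same pipeline: `add_sentinel START < file.fastq | flexiplex -l "START" -r "" -u 0 -f 0 -k my_barcode_list.txt`.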