fulcrumgenomics / fqtk

Fast FASTQ sample demultiplexing in Rust.
MIT License

Demultiplexing "N"-barcode as no-op #47

Closed: mschubert closed this issue 2 weeks ago

mschubert commented 1 month ago

With https://github.com/fulcrumgenomics/fqtk/pull/30 (and release v0.3.0), fqtk allows Ns in barcodes.

I tried to demultiplex with a sample whose barcode contains only Ns, so that any sequence should be accepted for that sample, but all reads are written to unmatched.R1.fq.gz instead of the sample's FASTQ.

Is this intended?

nh13 commented 1 month ago

@mschubert, would you be willing to share some of your data, or one FASTQ record that should have been matched to a sample along with the expected barcode?

mschubert commented 2 weeks ago

My apologies for the late reply. The issue seems to be with FASTQ records that themselves contain Ns (here, within the barcode bases):

# my.fq.gz
@M00872:1070:000000000-GLPWM:1:1101:15776:1330 1:Y:0:1
AAGANNATNGNNGNNANNNTNNNAACGTAGTGCGCCAGCCTATTTCAGTGCTCAATCTTGCAGAGAATACTCTTGAGAGCG
+
AA1A##>>#>##A##A###A###ABBFFFHGGHEGGGGGGHHFHHHHHHHHHGFHHHHHHHHHHHCGHHFHHHGHHHHHHE
@M00872:1070:000000000-GLPWM:1:1101:15866:1331 1:Y:0:1
AAGANNATNGNNGNNANNNTNNNAACGTAGTGCGCATAAGCCGTTCAAGAGGAGCCATTGTGGGGAGGCCCTGGGGACTGG
+
AAAA##>>#>##A##A###A###BABFFHHGGHEEEEGHFFHGEEGHGFHEHHEHGFHHHFGFC>FCGGCEHHHGGAEFG/
# meta.tsv
sample_id  barcode
test       NNNNNNN
fqtk demux --inputs my.fq.gz --max-mismatches 0 --read-structures 7B+T --sample-metadata meta.tsv --output out
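
To make the expected behaviour concrete, here is a minimal Rust sketch of the matching rule the report assumes. It is not fqtk's code. It hard-codes the read structure `7B+T` (the first seven bases are the sample barcode, the remainder is template) and assumes that an N in the *expected* barcode is a wildcard matching any observed base, including an observed N, while an observed N against a concrete expected base counts as a mismatch. How fqtk actually treats no-calls is not stated in this thread. The function names `split_7b_plus_t`, `count_mismatches`, and `matches_sample` are hypothetical and are not part of fqtk's API.

```rust
/// Hypothetical helper (not fqtk's API): split a read according to the fixed
/// read structure `7B+T`, i.e. 7 sample-barcode bases followed by template.
fn split_7b_plus_t(seq: &[u8]) -> Option<(&[u8], &[u8])> {
    const BARCODE_LEN: usize = 7;
    if seq.len() < BARCODE_LEN {
        return None; // too short to contain the barcode segment
    }
    Some(seq.split_at(BARCODE_LEN))
}

/// Hypothetical mismatch count under the assumed rule: an expected `N` matches
/// anything (including an observed `N`); otherwise bases must be identical
/// (case-insensitive). Compares position by position, so callers should
/// ensure both slices have the same length.
fn count_mismatches(observed: &[u8], expected: &[u8]) -> usize {
    observed
        .iter()
        .zip(expected.iter())
        .filter(|&(&o, &e)| {
            let o = o.to_ascii_uppercase();
            let e = e.to_ascii_uppercase();
            e != b'N' && o != e
        })
        .count()
}

/// Hypothetical sample assignment: lengths must agree and the mismatch count
/// must not exceed the allowed maximum (cf. `--max-mismatches`).
fn matches_sample(observed: &[u8], expected: &[u8], max_mismatches: usize) -> bool {
    observed.len() == expected.len() && count_mismatches(observed, expected) <= max_mismatches
}

fn main() {
    // Sequences of the two records from my.fq.gz above.
    let reads: [&[u8]; 2] = [
        b"AAGANNATNGNNGNNANNNTNNNAACGTAGTGCGCCAGCCTATTTCAGTGCTCAATCTTGCAGAGAATACTCTTGAGAGCG",
        b"AAGANNATNGNNGNNANNNTNNNAACGTAGTGCGCATAAGCCGTTCAAGAGGAGCCATTGTGGGGAGGCCCTGGGGACTGG",
    ];
    let expected = b"NNNNNNN"; // barcode of sample `test` in meta.tsv
    let max_mismatches = 0; // mirrors --max-mismatches 0

    for &read in reads.iter() {
        let (barcode, template) = split_7b_plus_t(read).expect("read shorter than 7 bases");
        let matched = matches_sample(barcode, expected, max_mismatches);
        println!(
            "barcode={} template_len={} -> {}",
            String::from_utf8_lossy(barcode),
            template.len(),
            if matched { "test" } else { "unmatched" }
        );
    }
}
```

Under this assumed rule, both records (barcode bases `AAGANNA`) are assigned to `test` even with zero allowed mismatches, because the wildcard check on the expected base happens before the observed base is inspected, so observed Ns never count against an all-N barcode.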

nh13 commented 2 weeks ago

Thank you, @mschubert, for the clear report!