@mschubert would you be willing to share some of your data, or one FASTQ record that should have been matched to a sample along with the expected barcode?
My apologies for the late reply: The issue seems to be with fastq records that contain Ns themselves:
# my.fq.gz
@M00872:1070:000000000-GLPWM:1:1101:15776:1330 1:Y:0:1
AAGANNATNGNNGNNANNNTNNNAACGTAGTGCGCCAGCCTATTTCAGTGCTCAATCTTGCAGAGAATACTCTTGAGAGCG
+
AA1A##>>#>##A##A###A###ABBFFFHGGHEGGGGGGHHFHHHHHHHHHGFHHHHHHHHHHHCGHHFHHHGHHHHHHE
@M00872:1070:000000000-GLPWM:1:1101:15866:1331 1:Y:0:1
AAGANNATNGNNGNNANNNTNNNAACGTAGTGCGCATAAGCCGTTCAAGAGGAGCCATTGTGGGGAGGCCCTGGGGACTGG
+
AAAA##>>#>##A##A###A###BABFFHHGGHEEEEGHFFHGEEGHGFHEHHEHGFHHHFGFC>FCGGCEHHHGGAEFG/
# meta.tsv
sample_id barcode
test NNNNNNN
fqtk demux --inputs my.fq.gz --max-mismatches 0 --read-structures 7B+T --sample-metadata meta.tsv --output out
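To make the reproduction concrete, here is a minimal Python sketch of what the read structure 7B+T implies for the first record above: the first 7 bases are taken as the sample barcode and the rest as template. It then counts mismatches against the all-N barcode from meta.tsv under two possible conventions. This is not fqtk's code, the helper names `split_read` and `count_mismatches` are hypothetical, and which convention fqtk actually applies is an assumption left open here.

```python
# Illustrative sketch only; NOT fqtk's actual matching logic.

def split_read(bases, barcode_len=7):
    """Split a read per the structure 7B+T: 7 barcode bases, then template."""
    return bases[:barcode_len], bases[barcode_len:]


def count_mismatches(observed, expected, n_is_wildcard):
    """Count positions where the observed barcode differs from the expected one.

    If n_is_wildcard is True, an N in the *expected* barcode matches any base
    (a hypothetical convention). Otherwise bases are compared literally.
    """
    if len(observed) != len(expected):
        raise ValueError("barcodes must have equal length")
    mismatches = 0
    for obs, exp in zip(observed, expected):
        if n_is_wildcard and exp == "N":
            continue  # wildcard position: never counts as a mismatch
        if obs != exp:
            mismatches += 1
    return mismatches


# First record from my.fq.gz above.
read = (
    "AAGANNATNGNNGNNANNNTNNNAACGTAGTGCGCCAGCCTATTTCAGTGCTCAATC"
    "TTGCAGAGAATACTCTTGAGAGCG"
)
barcode, template = split_read(read)

strict = count_mismatches(barcode, "NNNNNNN", n_is_wildcard=False)
wildcard = count_mismatches(barcode, "NNNNNNN", n_is_wildcard=True)

print(barcode, strict, wildcard)  # prints: AAGANNA 5 0
```

Under literal comparison the read has 5 mismatches against NNNNNNN, so a tool using that convention with --max-mismatches 0 would leave it unmatched. Under the wildcard convention it has 0 mismatches and would be assigned to the sample.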
Thank you @mschubert for the clear report!
With https://github.com/fulcrumgenomics/fqtk/pull/30 (and release v0.3.0), fqtk allows Ns in barcodes.
I tried to run demultiplexing accepting any sequence for a sample (with a barcode containing only Ns), but all reads are written to unmatched.R1.fq.gz instead of the sample fq. Is this intended?