ARUP-NGS / BMFtools

Barcoded Molecular Families
MIT License
22 stars 8 forks source link

secondary collapse fails on down/sub-sampled fastq #105

Open sand60 opened 7 years ago

sand60 commented 7 years ago

Hi Daniel, I am unable to run collapse on subset of downsampled fastq, it fails without any useful message. I used seqtk to downsample. All other tools can use those fastq without any errors,. Thanks Sukhinder

dnbaker commented 7 years ago

seqtk probably mangled the fastq and/or metadata somehow. Can you post a small sample of your seqtk-processed fastq?

sand60 commented 7 years ago

attached is sample fastq with 5000 reads,.thanks seqtk_sample_5k_R1.fastq.gz

dnbaker commented 7 years ago
  1. Your reads are different lengths. BMFtools requires uniform read length.
  2. seqtk has mangled the names such that there is no comment field, which might cause some problems.

The first is probably your real problem.

sand60 commented 7 years ago

1 has usually worked for me,.and since the only difference during failing was seqtk. I think it is 2nd. Is that comment filed is mandatory for bmftools? I can try adding it back,.because I can see that is the only difference from my original fastq which worked. Thanks

sand60 commented 7 years ago

and that "Description" field is optional for fastq format,.so not sure why bmf must want it,.

dnbaker commented 7 years ago

What release are you using? Something related to this issue was fixed recently. https://github.com/ARUP-NGS/BMFtools/issues/98#issuecomment-301180549

sand60 commented 7 years ago

I have this, I guess I should re-pull? commit 7279788a21d31b302672b0145852698a9fb8d65e Merge: b40d9d6 cca362d Author: Brett Kennedy brett.jacob.kennedy@gmail.com Date: Wed Feb 22 15:33:22 2017 -0700 Merge pull request #95 from dnbh/master Bug fixes, portability.

dnbaker commented 7 years ago

If you update to master, it should handle problem 2. And you shouldn't be working with varying read lengths as that assumption is made throughout the code base, but if you think it's working, then you can do what you want.

sand60 commented 7 years ago

Should this fix it? Do I need to do anything different to implement the change? Thanks

commit f9483ea5999c9c23d81ebf47449d5c544f538db1 Merge: 7279788 7a6a3e4 Author: Brett Kennedy brett.jacob.kennedy@gmail.com Date: Mon May 15 11:33:59 2017 -0600 Merge pull request #99 from dnbh/master Fix issue 98

dnbaker commented 7 years ago

Yes, that commit does have the fix. (See https://github.com/ARUP-NGS/BMFtools/blob/f9483ea5999c9c23d81ebf47449d5c544f538db1/lib/mseq.cpp#L33).