fanglab / mbin

mBin: a methylation-based binning framework for metagenomic SMRT sequencing reads

buildcontrols fails while reading the bam file #16

Closed: volkansevim closed this issue 3 years ago

volkansevim commented 3 years ago

I'm using the current version of mbin with Python 2.7.17 on Linux (SUSE Linux, version 4.12.14-150.63-default).

I want to use mbin on a metagenome to assign plasmids to genomes and to improve binning.

I installed mbin as described in the documentation. I obtained a whole-genome-amplified (WGA) dataset from PacBio to use as the IPD control. This is a mock metagenome containing bacteria as well as two yeast species. I mapped the reads to a concatenated reference of only the bacterial species using PacBio's pbmm2 aligner (v1.2.0).

I ran buildcontrols on the aligned BAM file generated by pbmm2:

buildcontrols --procs=10 --ref=bacterial_refs_concat.fa aligned.bam

buildcontrols fails with this output:

```
2021-02-26 14:20:43 [INFO] Initiating dictionary of all possible motifs...
2021-02-26 14:20:43 [INFO] - Adding 256 4-mer motifs...
2021-02-26 14:20:43 [INFO] Done: 256 possible contiguous motifs
2021-02-26 14:20:43 [INFO] - Adding 1024 5-mer motifs...
2021-02-26 14:20:43 [INFO] Done: 1536 possible contiguous motifs
2021-02-26 14:20:43 [INFO] - Adding 4096 6-mer motifs...
2021-02-26 14:20:43 [INFO] Done: 7680 possible contiguous motifs
2021-02-26 14:20:43 [INFO] - Adding bipartite motifs to search space...
2021-02-26 14:20:44 [INFO] Done: 194560 possible bipartite motifs
2021-02-26 14:20:44 [INFO]
2021-02-26 14:20:44 [INFO] Preparing to create new control data in ctrl_tmp
Traceback (most recent call last):
  File "/global/cscratch1/sd/vsevim/software/my_p27/bin/buildcontrols", line 8, in <module>
    sys.exit(launch())
  File "/global/cscratch1/sd/vsevim/software/my_p27/lib/python2.7/site-packages/mbin/controls.py", line 20, in launch
    extract_controls(opts, control_aln_fn)
  File "/global/cscratch1/sd/vsevim/software/my_p27/lib/python2.7/site-packages/mbin/controls.py", line 40, in extract_controls
    opts = controls.scan_WGA_aligns()
  File "/global/cscratch1/sd/vsevim/software/my_p27/lib/python2.7/site-packages/mbin/controls.py", line 352, in scan_WGA_aligns
    reader = openIndexedAlignmentFile(self.control_aln_fn)
  File "/global/cscratch1/sd/vsevim/software/my_p27/lib/python2.7/site-packages/pbcore/io/opener.py", line 54, in openIndexedAlignmentFile
    return IndexedBamReader(fname, referenceFastaFname=referenceFastaFname, sharedIndex=sharedIndex)
  File "/global/cscratch1/sd/vsevim/software/my_p27/lib/python2.7/site-packages/pbcore/io/align/BamIO.py", line 385, in __init__
    super(IndexedBamReader, self).__init__(fname, referenceFastaFname)
  File "/global/cscratch1/sd/vsevim/software/my_p27/lib/python2.7/site-packages/pbcore/io/align/BamIO.py", line 198, in __init__
    self._loadReferenceInfo()
  File "/global/cscratch1/sd/vsevim/software/my_p27/lib/python2.7/site-packages/pbcore/io/align/BamIO.py", line 73, in _loadReferenceInfo
    refMD5s = [r["M5"] for r in refRecords]
KeyError: 'M5'
```

It seems that the BAM reader expects an `M5` field in the reference (`@SQ`) records of the header, but I can confirm that there is no such field in the BAM header.
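The failing line in pbcore, `refMD5s = [r["M5"] for r in refRecords]`, indexes every reference record with `"M5"`, so a single `@SQ` header line without an `M5` tag is enough to raise this `KeyError`. The sketch below is a hypothetical diagnostic helper, not part of mbin or pbcore: it parses header text (for example, the output of `samtools view -H aligned.bam`) and lists the references that lack `M5`. It also computes the digest an `M5` tag is meant to hold, following the SAM specification's rule (MD5 of the sequence with characters outside ASCII 33 to 126 removed and lowercase converted to uppercase). All function names here are illustrative assumptions. Filling in `M5` would at most get past this particular `KeyError`; it does not by itself make the data compatible with mbin.

```python
import hashlib


def parse_sq_records(header_text):
    """Parse the @SQ lines of a SAM-format header into dicts of TAG -> value."""
    records = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Fields after the record type look like TAG:VALUE; split on the first colon only.
        fields = line.split("\t")[1:]
        records.append(dict(f.split(":", 1) for f in fields if ":" in f))
    return records


def missing_m5(header_text):
    """Return the SN (sequence name) of every @SQ record that has no M5 tag."""
    return [r.get("SN", "?") for r in parse_sq_records(header_text) if "M5" not in r]


def m5_digest(sequence):
    """MD5 of a reference sequence as the SAM spec defines it for the M5 tag:
    drop characters outside ASCII 33..126 (e.g. whitespace, newlines), then uppercase."""
    cleaned = "".join(c for c in sequence if 33 <= ord(c) <= 126).upper()
    return hashlib.md5(cleaned.encode("ascii")).hexdigest()
```

For a header containing `@SQ\tSN:chr1\tLN:8` and `@SQ\tSN:chr2\tLN:4\tM5:abc`, `missing_m5` returns `['chr1']`, which is the same record on which the pbcore list comprehension would fail.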

Do you have any suggestions on how to solve this issue?

Thanks!

fanggang commented 3 years ago

Thank you for your interest in our work. mbin was initially developed for PacBio RS II data, and it does not yet support BAM files from the Sequel system. We do plan to support Sequel data down the road, and will make sure to update this page as soon as an update is available. Thank you.

volkansevim commented 3 years ago

Thanks for the quick reply! It would be helpful to mention in the documentation that mbin currently doesn't support Sequel data.

fanggang commented 3 years ago


Thank you for your suggestion. We have added this clarification to the front page, and will update it as soon as a new version supporting Sequel data is available.