donovan-h-parks / RefineM

A toolbox for improving metagenome-assembled genomes.
GNU General Public License v3.0

refinem ssu_erroneous error #20

Closed: michoug closed this issue 6 years ago

michoug commented 6 years ago

Hi. Very nice pipeline; however, I have an issue with the ssu_erroneous command. Here is the log:

[2018-03-05 11:36:11] INFO: RefineM v0.0.23
[2018-03-05 11:36:11] INFO: refinem ssu_erroneous N4F2_filtered N4F2_filtered_taxons /data1/RefineMdb/gtdb_r80_ssu_db.2018-01-18.fna /data1/RefineMdb/gtdb_r80_taxonomy.2017-12-15.tsv N4F2_filtered_ssu
[2018-03-05 11:36:11] INFO: Identifying SSU rRNA genes.
[2018-03-05 11:36:20] INFO: Extracting SSU rRNA genes.
[2018-03-05 11:36:20] INFO: Classifying SSU rRNA genes.
[2018-03-05 11:36:22] INFO: Identifying scaffolds with 16S rRNA genes with divergent taxonomic classification.

Unexpected error: <type 'exceptions.KeyError'>
Traceback (most recent call last):
  File "/home/michoug/miniconda2/bin/refinem", line 396, in <module>
    parser.parse_options(args)
  File "/home/michoug/miniconda2/lib/python2.7/site-packages/refinem/main.py", line 689, in parse_options
    self.ssu_erroneous(options)
  File "/home/michoug/miniconda2/lib/python2.7/site-packages/refinem/main.py", line 335, in ssu_erroneous
    options.output_dir)
  File "/home/michoug/miniconda2/lib/python2.7/site-packages/refinem/ssu.py", line 537, in erroneous
    if r not in common_taxa[gid]:
KeyError: 'N4F2_MBin.3'

Any ideas? I tried other files and got the same error.

donovan-h-parks commented 6 years ago

Best guess is that the MAGs/bins used during the "taxon_profile" command, which produced the results in your "N4F2_filtered_taxons" directory, have changed. Any chance a MAG/bin was added to or removed from the "N4F2_filtered" directory? Specifically, the bin named "N4F2_MBin.3"?
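One way to test this hypothesis is to compare the bin IDs in the bin directory against the files written by taxon_profile. The following is a minimal sketch, not part of RefineM: the function names `bin_ids` and `bins_without_reports` are hypothetical, and it assumes that every bin file shares one extension and that per-bin report files are named `<bin_id>.<something>` inside a single report directory.

```python
from pathlib import Path


def bin_ids(bin_dir, extension="fa"):
    """Return the set of bin IDs: file names with '.<extension>' removed."""
    suffix = "." + extension
    return {
        p.name[: -len(suffix)]
        for p in Path(bin_dir).iterdir()
        if p.is_file() and p.name.endswith(suffix)
    }


def bins_without_reports(bin_dir, report_dir, extension="fa"):
    """Return sorted bin IDs with no file named '<bin_id>.*' in report_dir.

    Assumption: report files start with the bin ID followed by a dot.
    """
    report_names = [p.name for p in Path(report_dir).iterdir() if p.is_file()]
    missing = [
        b
        for b in bin_ids(bin_dir, extension)
        if not any(name.startswith(b + ".") for name in report_names)
    ]
    return sorted(missing)
```

Calling `bins_without_reports("N4F2_filtered", "<report directory>")` with the extension used for the bins would list any bin that has no matching report file. An empty list does not prove that the reports are complete, only that every bin has at least one.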

michoug commented 6 years ago

I tried rerunning taxon_profile and it didn't solve the issue.

donovan-h-parks commented 6 years ago

If you can send me the bins (or ideally a subset of bins) along with the commands that result in the issue, I can look into it on my end. I would need all the RefineM commands you ran, not just the "ssu_erroneous" command, so that I can replicate each of your steps.

bsiranosian commented 6 years ago

Hi, I'm getting an error similar to michoug's. Did you figure out that problem?

$ refinem ssu_erroneous bins metabat_refinem/taxon_profile $SSUDB $REFTAXONOMY metabat_refinem/ssu -x fa
[2018-05-31 09:54:19] INFO: RefineM v0.0.23
[2018-05-31 09:54:19] INFO: refinem ssu_erroneous bins metabat_refinem/taxon_profile /labs/asbhatt/bsiranos/refinem_db/gtdb_r80_ssu_db.2018-01-18.fna /labs/asbhatt/bsiranos/refinem_db/gtdb_r80_taxonomy.2017-12-15.tsv metabat_refinem/ssu -x fa
[2018-05-31 09:54:19] INFO: Identifying SSU rRNA genes.
[2018-05-31 09:56:16] INFO: Extracting SSU rRNA genes.
[2018-05-31 09:56:17] INFO: Classifying SSU rRNA genes.
[2018-05-31 09:56:38] INFO: Identifying scaffolds with 16S rRNA genes with divergent taxonomic classification.

Unexpected error: <type 'exceptions.KeyError'>
Traceback (most recent call last):
  File "/home/bsiranos/miniconda3/envs/mgwf/bin/refinem", line 396, in <module>
    parser.parse_options(args)
  File "/home/bsiranos/miniconda3/envs/mgwf/lib/python2.7/site-packages/refinem/main.py", line 689, in parse_options
    self.ssu_erroneous(options)
  File "/home/bsiranos/miniconda3/envs/mgwf/lib/python2.7/site-packages/refinem/main.py", line 335, in ssu_erroneous
    options.output_dir)
  File "/home/bsiranos/miniconda3/envs/mgwf/lib/python2.7/site-packages/refinem/ssu.py", line 537, in erroneous
    if r not in common_taxa[gid]:
KeyError: 'bin.2'

There is a 'bin.2' folder in the ssu output directory:

$ ls metabat_refinem/ssu/bin.2
ssu.blastn.tsv ssu.fna ssu.hmm_archaea.txt ssu.hmm_bacteria.txt ssu.hmm_euk.txt ssu.hmm_summary.tsv ssu.taxonomy.tsv

And there are bin.2 files in the taxon profile output:

$ ls metabat_refinem/taxon_profile/bin_reports/bin.2.*
metabat_refinem/taxon_profile/bin_reports/bin.2.filtered_genes.gene.tsv
metabat_refinem/taxon_profile/bin_reports/bin.2.filtered_genes.profile.tsv
metabat_refinem/taxon_profile/bin_reports/bin.2.filtered_genes.scaffolds.tsv

Let me know if you want more information or to send you some files.

donovan-h-parks commented 6 years ago

What is the full name of "bin.2"? Is it "bin.2.fna" or something similar? I'm wondering if this is a parsing error by RefineM due to having a "." in the bin filename.
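To illustrate the kind of parsing problem being hypothesised here, the sketch below shows two common ways of deriving an ID from a file name that disagree when the name contains an extra ".". This is not RefineM's code, and nothing in this thread confirms that RefineM parses names either way; both function names are hypothetical.

```python
import os


def id_strip_extension(filename):
    """Drop only the final extension: 'bin.2.fa' -> 'bin.2'."""
    return os.path.splitext(os.path.basename(filename))[0]


def id_first_field(filename):
    """Keep the text before the first '.': 'bin.2.fa' -> 'bin'."""
    return os.path.basename(filename).split(".", 1)[0]


# The two conventions agree only when the stem contains no '.'.
for name in ("bin.2.fa", "bin_2.fa"):
    print(name, id_strip_extension(name), id_first_field(name))
```

If one step of a pipeline stored results under IDs produced by the first convention and a later step looked them up using the second, the lookup would fail with a KeyError. That would be the same class of error as in the traceback, but it remains a hypothesis.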

bsiranosian commented 6 years ago

Yes, the bins are named bin.[#].fa; they are the output of MetaBAT.

donovan-h-parks commented 6 years ago

Can you try renaming the bin to "bin_2.fa" and see if this resolves the problem?

bsiranosian commented 6 years ago

I renamed all the bins to follow that convention and ran all the steps in the README without problems. Previously all the other steps were fine; only the ssu_erroneous step was breaking. Do you think this is an easy fix, or should I just be content with renaming the bins and then renaming the results back?
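For anyone who wants to try the same renaming workaround, it can be scripted. The following is a minimal sketch, assuming all bins sit in one directory and share one extension; `rename_bins` is a hypothetical helper, not a RefineM command. It returns the old-to-new mapping so that results can be renamed back afterwards.

```python
from pathlib import Path


def rename_bins(bin_dir, extension="fa", dry_run=True):
    """Replace '.' with '_' in each bin's stem, e.g. 'bin.2.fa' -> 'bin_2.fa'.

    Returns {old_name: new_name}. With dry_run=True nothing is renamed.
    """
    suffix = "." + extension
    mapping = {}
    for path in sorted(Path(bin_dir).iterdir()):
        if not (path.is_file() and path.name.endswith(suffix)):
            continue
        stem = path.name[: -len(suffix)]
        new_name = stem.replace(".", "_") + suffix
        if new_name == path.name:
            continue  # stem already has no '.'
        target = path.with_name(new_name)
        # Refuse to clobber an existing file or a name already planned.
        if target.exists() or new_name in mapping.values():
            raise FileExistsError(f"refusing to overwrite {target}")
        mapping[path.name] = new_name
        if not dry_run:
            path.rename(target)
    return mapping
```

Running it first with the default `dry_run=True` shows the planned renames without touching any files. Any RefineM output produced before the renaming would still carry the old IDs, so the earlier steps would need to be rerun on the renamed bins.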

donovan-h-parks commented 6 years ago

Thanks for exploring the issue. It should be an easy fix. I'll aim to release a new version of RefineM tomorrow.

donovan-h-parks commented 6 years ago

Hello. I am unable to reproduce the issue on my end. I was able to run a bin named bin.0.fna through RefineM v0.0.23 without issue. Any chance you changed the name of your bin files at any point?

I did go through the code and can't see why the name of the bin file would be an issue.

bsiranosian commented 6 years ago

Hi, after trying this again, it seems the error was caused by some output missing from the taxon_profile command. After making sure everything leading up to the ssu_erroneous step ran correctly, it's not a problem any more. Thanks for looking into it though!

donovan-h-parks commented 6 years ago

Excellent. Glad it is working.