bxlab / metaWRAP

MetaWRAP - a flexible pipeline for genome-resolved metagenomic data analysis
MIT License

Failed to run taxator #93

Open JiaZhong28 opened 5 years ago

JiaZhong28 commented 5 years ago

Hi, I ran into a problem when I ran classify_bins:

`Sender: jhscheduler System <jhadmin@node55>
Subject: Job 6609417: <clean_C3> Done

Job <clean_C3> was submitted from host <node119> by user <2013130172>.
Job was executed on host(s) <20*node55>, in queue <jynodequeue>, as user <2013130172>.
</stor9000/apps/users/NWSUAF/2013130172> was used as the home directory.
</stor9000/apps/users/NWSUAF/2013130172/rumen_microbes/bacteria/metawrap> was used as the working directory.
Started at Fri Jan 11 10:59:06 2019
Results reported at Fri Jan 11 12:47:02 2019

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
bash clean_bin.sh C3
------------------------------------------------------------

Successfully completed.

Resource usage summary:

    CPU time   :  54664.59 sec.
    Max Memory :      5062 MB
    Max Swap   :     15299 MB

    Max Processes  :         6

The output (if any) follows:

metawrap classify_bins -b C3_BIN_REASSEMBLY/reassembled_bins_C75.10 -o C3_BIN_CLASSIFICATION -t 20

########################################################################################################################
#####                                   ALIGN CONTIGS TO DATABASE WITH MEGABLAST                                   #####
########################################################################################################################

------------------------------------------------------------------------------------------------------------------------
-----              setting up ouput folder C3_BIN_CLASSIFICATION and merging contigs from all bins...              -----
------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------------------------
-----                               aligning C3_BIN_CLASSIFICATION/all_contigs.fa to                               -----
-----           /stor9000/apps/users/NWSUAF/2013130172/database/NCBI_nt database with MEGABLAST. This is           -----
-----             the longest step - please be patient. You may look at the classification progress in             -----
-----                                 C3_BIN_CLASSIFICATION/megablast_out.raw.tab                                  -----
------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------------------------
-----                  removing unnecessary lines that lead to bad tax IDs (without a proper rank)                 -----
------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------------------------
-----                                              making mapping file                                             -----
------------------------------------------------------------------------------------------------------------------------

########################################################################################################################
#####                              GET TAXONOMY FROM MEGABLAST OUTPUT WITH TAXATOR-TK                              #####
########################################################################################################################

------------------------------------------------------------------------------------------------------------------------
-----                                   pulling out classifications with taxator                                   -----
------------------------------------------------------------------------------------------------------------------------

************************************************************************************************************************
*****                                       Failed to run taxator. Exiting...                                      *****
************************************************************************************************************************

PS:

Read file <C3.err> for stderr output of this job.`

I have ten samples. Seven of them ran successfully and bin_taxonomy.tab was in my output files, but three of them failed. Since most samples work, it seems that metaWRAP itself does not have problems.

I use metaWRAP v=1.0.2.

Many thanks!

ursky commented 5 years ago

Before doing anything else, I would retry with metaWRAP v=1.1 - there are a LOT of bug fixes that I incorporated over the past couple of months. I can't recall seeing this error before, however.

Is there anything different about the three failed samples? It's strange that no error codes were thrown during the run... The line that terminated metaWRAP was:

cat ${out}/megablast_out.tab | taxator -a megan-lca -t 0.3 -e 0.01 -g ${out}/mapping.tax > ${out}/predictions.gff3

Does the C3_BIN_CLASSIFICATION/megablast_out.tab file look OK? Is there anything in C3_BIN_CLASSIFICATION/predictions.gff3? Does running taxator -h return the help message?
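
For readers who want to automate the "does megablast_out.tab look OK" check, here is a minimal sketch. It assumes the file is tab-separated with one alignment per line; the helper name `find_malformed_rows` is made up for illustration and is not part of metaWRAP or taxator-tk. It only reports rows whose column count differs from the first row, so it can catch truncated or corrupted lines but not a structurally valid row that points at a problematic taxon.

```python
def find_malformed_rows(lines):
    """Return (line_number, field_count) for rows whose column count
    differs from that of the first non-blank row.

    Assumption: fields are separated by tabs (not confirmed in this thread).
    """
    expected = None
    bad = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        count = len(line.rstrip("\n").split("\t"))
        if expected is None:
            expected = count  # the first row defines the reference width
        elif count != expected:
            bad.append((number, count))
    return bad
```

It could be used as `find_malformed_rows(open("C3_BIN_CLASSIFICATION/megablast_out.tab"))`; an empty list means every row has the same number of fields.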

JiaZhong28 commented 5 years ago

Hi, I ran the bins of these samples one by one and found that just one bin is "bad"! Any bin that ran together with this "bad" bin also failed to produce the contig_taxonomy.tab file. That's really strange!

Now I have a new output file with the error messages from the run: `An unrecoverable error occurred: std::exception

Here is some debugging information to locate the problem:
/home/johdro/projects/taxator-tk_default.git/src/fileparser.hh(52): Throw in function FileParser::RecordType FileParser::n
Dynamic exception type: boost::exception_detail::clone_impl
std::exception::what: std::exception
[exception_tag_line] = 14889
[exception_tag_taxid] = 85620
[exception_tag_general] = bad alignment reference taxon

real 51m37.700s
user 16m9.612s
sys 1m47.117s`
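
The error message carries a line number and a taxon ID, which can be used to find the offending alignment. The sketch below pulls the `[exception_tag_*]` fields out of the message and fetches a given line from a file. The helper names are hypothetical, and it is only an assumption that `exception_tag_line` refers to a 1-based line of the file taxator was reading (megablast_out.tab); this thread does not confirm that.

```python
import re
from itertools import islice

# Matches "[exception_tag_NAME] = VALUE", where VALUE ends at the next tag
# or at the end of the line. Works for one-line and multi-line messages.
TAG_PATTERN = re.compile(
    r"\[exception_tag_(\w+)\]\s*=\s*(.*?)(?=\s*\[exception_tag_|\s*$)",
    re.MULTILINE,
)

def parse_taxator_tags(message):
    """Collect the [exception_tag_*] fields of an error message into a dict of strings."""
    return {name: value for name, value in TAG_PATTERN.findall(message)}

def nth_line(lines, number):
    """Return line `number` (1-based) from an iterable of lines, or None if it is too short."""
    return next(islice(lines, number - 1, number), None)
```

With the message above, `parse_taxator_tags` yields the line and taxid as strings; `nth_line(open(path), int(tags["line"]))` would then show the row to inspect, if the line-number assumption holds.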

The megablast_out.tab of this bin looks normal:

NODE_1_length_149151_cov_5.189537 99281 99358 149151 gi|1436168267|gb|CP028149.1| 2840276 2840199 117 1e-19 73 78
NODE_1_length_149151_cov_5.189537 99281 99358 149151 gi|1436168267|gb|CP028149.1| 3084677 3084600 117 1e-19 73 78
NODE_1_length_149151_cov_5.189537 99281 99358 149151 gi|1403575081|emb|LS483461.1| 2812424 2812347 117 1e-19 73 78
NODE_1_length_149151_cov_5.189537 99281 99358 149151 gi|1403575081|emb|LS483461.1| 3045517 3045440 117 1e-19 73 78
NODE_1_length_149151_cov_5.189537 99281 99358 149151 gi|1403413836|emb|LS483393.1| 2819615 2819538 117 1e-19 73 78
NODE_1_length_149151_cov_5.189537 99281 99358 149151 gi|1403413836|emb|LS483393.1| 3066908 3066831 117 1e-19 73 78
NODE_1_length_149151_cov_5.189537 99281 99358 149151 gi|1398288123|gb|CP025501.1| 17254 17177 117 1e-19 73 78
NODE_1_length_149151_cov_5.189537 99281 99358 149151 gi|1398288123|gb|CP025501.1| 270059 269982 117 1e-19 73 78
NODE_1_length_149151_cov_5.189537 99281 99358 149151 gi|1243893249|gb|CP023410.1| 235873 235950 117 1e-19 73 78
NODE_1_length_149151_cov_5.189537 99281 99358 149151 gi|1243893249|gb|CP023410.1| 469847 469924 117 1e-19 73 78

And the predictions.gff3 file looks normal, too:

##gff-version 3
NODE_1_length_149151_cov_5.189537 taxator-tk sequence_feature 2456 2679 0 . . seqlen=149151;tax=1898474:224;rtax=1898474
NODE_1_length_149151_cov_5.189537 taxator-tk sequence_feature 99274 99364 0 . . seqlen=149151;tax=1:91;rtax=1
NODE_1_length_149151_cov_5.189537 taxator-tk sequence_feature 109241 109383 0 . . seqlen=149151;tax=1536773:143;rtax=1536773
NODE_1_length_149151_cov_5.189537 taxator-tk sequence_feature 110203 110357 0 . . seqlen=149151;tax=2:155;rtax=35623
NODE_2_length_100653_cov_5.908119 taxator-tk sequence_feature 36926 37617 0 . . seqlen=100653;tax=2:692;rtax=1334
NODE_2_length_100653_cov_5.908119 taxator-tk sequence_feature 37896 38294 0 . . seqlen=100653;tax=2:399;rtax=91347
NODE_2_length_100653_cov_5.908119 taxator-tk sequence_feature 93167 93288 0 . . seqlen=100653;tax=1236:122;rtax=1904944
NODE_2_length_100653_cov_5.908119 taxator-tk sequence_feature 95405 95914 0 . . seqlen=100653;tax=2:510;rtax=2048283
NODE_3_length_95394_cov_5.404677 taxator-tk sequence_feature 1 95394 0 . . seqlen=95394;tax=1;rtax=1

Running taxator -h returns the help message.

Many thanks!

JiaZhong28 commented 5 years ago

These are all output files of the "bad bin". metawrap.zip

ursky commented 5 years ago

Thank you for providing this detailed report! After some poking around with the data, I was finally able to find the culprit. Looks like some of the entries (taxonomies) in the newest NCBI database (which I did not have) did not work with this pipeline because they did not have the complete taxonomy. I added the appropriate exceptions and now it seems to work - see taxonomy below.
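
To illustrate what an incomplete taxonomy means in practice, here is a minimal sketch that walks a taxon's lineage through the parent pointers of an NCBI `nodes.dmp` file and reports when the chain is broken. This is not the code in prune_blast_hits.py and not the actual patch; the function names are hypothetical, and the only format assumption is the standard `nodes.dmp` layout (`tax_id | parent tax_id | rank | ...`, with the root node listing itself as parent).

```python
def load_nodes(lines):
    """Parse NCBI nodes.dmp lines into {taxid: (parent_taxid, rank)}."""
    nodes = {}
    for line in lines:
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 3 or not fields[0]:
            continue  # skip blank or malformed lines
        nodes[fields[0]] = (fields[1], fields[2])
    return nodes

def lineage(taxid, nodes):
    """Return [(taxid, rank), ...] from `taxid` up to the root,
    or None if a node is missing or the chain loops."""
    path = []
    seen = set()
    while True:
        if taxid not in nodes or taxid in seen:
            return None  # broken or cyclic lineage
        seen.add(taxid)
        parent, rank = nodes[taxid]
        path.append((taxid, rank))
        if parent == taxid:  # the root node is its own parent
            return path
        taxid = parent
```

A taxon ID reported in a taxator error could be passed to `lineage` to see whether its chain reaches the root and which ranks it passes through.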

I will include this patch in metawrap v1.1.1, but I have a couple of other fixes I want to make before I release it. To update your system for now, please replace your current ~/miniconda2/bin/metawrap-scripts/prune_blast_hits.py (the exact location will depend on your system) with the newest version found on GitHub: https://github.com/bxlab/metaWRAP/blob/master/bin/metawrap-scripts/prune_blast_hits.py. You will notice I added extra entries to the exclude dictionary. Make sure that when you replace the file (or just edit it), the script remains executable. Run chmod +x ~/miniconda2/bin/metawrap-scripts/prune_blast_hits.py to make sure.
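
If the replacement is done from a script rather than by hand, the executable bit can be restored programmatically. The sketch below is roughly what `chmod +x` does, using only the Python standard library; `ensure_executable` is a hypothetical helper name, and the path you pass in depends on your installation.

```python
import os
import stat

def ensure_executable(path):
    """Add execute permission for user, group and others.

    Returns True if the mode was changed, False if it was already executable for all.
    """
    mode = os.stat(path).st_mode
    wanted = mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    if wanted != mode:
        os.chmod(path, wanted)
        return True
    return False
```

For example, `ensure_executable(os.path.expanduser("~/miniconda2/bin/metawrap-scripts/prune_blast_hits.py"))` after copying the new file into place.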

NODE_100_length_5757_cov_3.983627       Bacteria;uncultured bacterium Contig33
NODE_103_length_5448_cov_4.529324       Bacteria;Proteobacteria;Alphaproteobacteria;Rhodospirillales;Rhodospirillaceae
NODE_105_length_4966_cov_4.104111       Bacteria;Firmicutes;Clostridia;Thermoanaerobacterales;Thermoanaerobacteraceae
NODE_10_length_43399_cov_7.322677
NODE_113_length_3823_cov_5.211959       Bacteria;Proteobacteria;Alphaproteobacteria;Rhizobiales
NODE_114_length_3775_cov_4.681720
NODE_11_length_41850_cov_4.452445       Bacteria;Firmicutes;Bacilli;Bacillales
NODE_121_length_2723_cov_3.471277       Bacteria;Cyanobacteria;Nostocales;Nostocaceae
NODE_12_length_37355_cov_5.778931
NODE_133_length_929_cov_3.680751        Bacteria;Chloroflexi;Anaerolineae;Anaerolineales;Anaerolineaceae
NODE_134_length_919_cov_10.416865
NODE_136_length_878_cov_3.680400
NODE_138_length_804_cov_5.618132        Bacteria
NODE_139_length_803_cov_4.673554
NODE_13_length_34020_cov_4.565536
NODE_140_length_801_cov_5.066298        Bacteria;Firmicutes
NODE_14_length_31383_cov_5.920942
NODE_15_length_31180_cov_4.727711
NODE_16_length_30943_cov_4.263064       Bacteria
NODE_17_length_30722_cov_4.748018
NODE_18_length_29004_cov_6.003457       Bacteria
NODE_19_length_28347_cov_4.971914
NODE_1_length_149151_cov_5.189537       Bacteria
NODE_20_length_27740_cov_7.018255
NODE_21_length_27157_cov_4.858124
NODE_22_length_24024_cov_5.199106
NODE_23_length_23486_cov_4.651032       Bacteria;Proteobacteria
NODE_24_length_23461_cov_3.797126
NODE_25_length_23269_cov_4.102622
NODE_26_length_22237_cov_4.713718
NODE_28_length_20713_cov_5.219471
NODE_29_length_20149_cov_4.449283
NODE_2_length_100653_cov_5.908119       Bacteria
NODE_30_length_20016_cov_4.232559       Bacteria
NODE_31_length_19261_cov_4.771737       Bacteria;Firmicutes;Clostridia;Clostridiales;Lachnospiraceae
NODE_33_length_18815_cov_4.129736       Bacteria
NODE_34_length_17525_cov_5.218936
NODE_35_length_17292_cov_4.889980       Bacteria;Firmicutes;Clostridia;Clostridiales
NODE_36_length_17172_cov_5.075344
NODE_38_length_16677_cov_5.348735
NODE_39_length_16452_cov_5.053679       Bacteria
NODE_3_length_95394_cov_5.404677
NODE_41_length_15458_cov_5.102984
NODE_43_length_15345_cov_5.363571
NODE_45_length_15179_cov_4.744206
NODE_4_length_56867_cov_5.942543
NODE_50_length_13934_cov_3.977773
NODE_52_length_13545_cov_3.959979
NODE_54_length_13108_cov_4.396746
NODE_55_length_13047_cov_5.225289       Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae
NODE_56_length_12976_cov_4.036825
NODE_57_length_12893_cov_4.845128       Bacteria;Cyanobacteria;Synechococcales;Synechococcaceae
NODE_58_length_12656_cov_3.927498       Bacteria
NODE_5_length_51335_cov_6.064692        Bacteria
NODE_60_length_12511_cov_4.682644
NODE_61_length_12343_cov_4.487934
NODE_62_length_12068_cov_5.633142       Bacteria;Proteobacteria;Gammaproteobacteria;Pseudomonadales;Pseudomonadaceae
NODE_63_length_11807_cov_4.261893
NODE_64_length_11551_cov_4.863779       Bacteria
NODE_65_length_11396_cov_5.241187
NODE_66_length_11296_cov_3.786077
NODE_67_length_11177_cov_4.111591
NODE_69_length_10784_cov_4.848510       Bacteria
NODE_6_length_51212_cov_4.969669
NODE_70_length_10732_cov_4.216237
NODE_72_length_10488_cov_3.954279       Bacteria
NODE_76_length_9549_cov_4.730046        Bacteria;Firmicutes;Bacilli;Bacillales;Paenibacillaceae
NODE_77_length_9343_cov_3.761817
NODE_7_length_50461_cov_4.799123
NODE_80_length_8286_cov_4.258862        Bacteria;Firmicutes;Clostridia;Clostridiales
NODE_81_length_8282_cov_5.247898        Bacteria;Actinobacteria;Coriobacteriia;Coriobacteriales;Coriobacteriaceae
NODE_83_length_8019_cov_6.565727        Bacteria
NODE_89_length_7153_cov_5.167891        Bacteria;uncultured bacterium Contig17
NODE_8_length_45053_cov_6.086580
NODE_9_length_44928_cov_5.155604        Bacteria;Actinobacteria;Actinobacteria;Streptomycetales;Streptomycetaceae

ursky commented 5 years ago

Edit: the issue seems to be resolved in metawrap v1.1.1, which is now out.

MarcelaHer commented 4 years ago

Hi, I got the same error. I am using metawrap 1.2.1 and ncbi-blast 2.10.0+

I have several errors. Could you please take a look at the .err file? Thanks.

At the end of the doc I get:

Here is some debugging information to locate the problem:
/home/johdro/projects/taxator-tk_default.git/src/fileparser.hh(52): Throw in function FileParser::RecordType FileParser::next() [with FactoryType = AlignmentRecordFactory; FileParser::RecordType = AlignmentRecordTaxonomy]
Dynamic exception type: boost::exception_detail::clone_impl
std::exception::what: std::exception
[exception_tag_line] = 1
[exception_tag_taxid] = 446462
[exception_tag_general] = bad alignment reference taxon

derep_genomes.err.txt