bxlab / metaWRAP

MetaWRAP - a flexible pipeline for genome-resolved metagenomic data analysis
MIT License

issue with classification output #36

Open palomo11 opened 6 years ago

palomo11 commented 6 years ago

Hi,

I have run the Classify_bins module on over 100 bins. The output for some of them is correct (I double-checked with another tool), but for at least one bin where I was expecting a specific taxonomy, the output is different. I checked the contig_taxonomy.tab and bin_taxonomy.tab files, and this is what I found for that specific bin.

contig_taxonomy.tab file:

RSF33_CG26_100  Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_104  
RSF33_CG26_105  Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_106  Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_107  Bacteria
RSF33_CG26_108  
RSF33_CG26_109  Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_110  Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_111  Bacteria;Proteobacteria;Alphaproteobacteria;Rhodobacterales;Rhodobacteraceae
RSF33_CG26_12   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_13   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_17   Bacteria;Proteobacteria;Alphaproteobacteria;Rickettsiales;Rickettsiaceae
RSF33_CG26_18   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_19   
RSF33_CG26_21   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_22   Bacteria
RSF33_CG26_24   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_28   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_3    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_30   Bacteria;Proteobacteria;Betaproteobacteria
RSF33_CG26_33   Bacteria;Proteobacteria;Alphaproteobacteria;Sphingomonadales;Sphingomonadaceae
RSF33_CG26_34   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_36   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_37   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_4    
RSF33_CG26_44   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_45   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_47   Bacteria;Proteobacteria
RSF33_CG26_51   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_53   Bacteria;Proteobacteria
RSF33_CG26_57   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_58   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_59   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae;Nitrosomonas
RSF33_CG26_6    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_60   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_61   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_64   
RSF33_CG26_7    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_72   
RSF33_CG26_73   
RSF33_CG26_74   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_75   
RSF33_CG26_78   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae;Nitrosomonas
RSF33_CG26_8    Bacteria;Proteobacteria;Gammaproteobacteria;Methylococcales;Methylococcaceae
RSF33_CG26_83   Bacteria;Proteobacteria;Gammaproteobacteria;Alteromonadales;Alteromonadaceae
RSF33_CG26_84   
RSF33_CG26_86   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_87   Bacteria
RSF33_CG26_89   Bacteria;Proteobacteria;Gammaproteobacteria;Alteromonadales;Pseudoalteromonadaceae
RSF33_CG26_9    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_90   Bacteria;Proteobacteria;Gammaproteobacteria;Methylococcales;Methylococcaceae;Methylomonas
RSF33_CG26_91   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_94   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae;Nitrosomonas;Nitrosomonas sp. Is79A3
RSF33_CG26_95   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_97   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_99   Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Comamonadaceae

bin_taxonomy.tab file:

RSF33_CG26.fa   Bacteria;Proteobacteria;Alphaproteobacteria;Rhodobacterales;Rhodobacteraceae

I'm pretty sure this bin belongs to Nitrosomonadaceae (ANI of 89% with other Nitrosomonas spp.), and that also seems to be the classification for most of the contigs. However, the output reports a totally different taxonomy.

I have checked the contig_taxonomy.tab file and there are 319 contigs classified as Bacteria;Proteobacteria;Alphaproteobacteria;Rhodobacterales;Rhodobacteraceae, but none of them belong to bin RSF33_CG26.
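A check like this can be scripted. The sketch below is only an illustration, not part of metaWRAP: it assumes contig_taxonomy.tab is tab-separated (contig name, then lineage) and that a contig belongs to a bin when its name starts with the bin name plus an underscore, as the names in this thread suggest. The function name `contigs_with_taxon` is hypothetical.

```python
from typing import Iterable, List, Tuple


def contigs_with_taxon(lines: Iterable[str], bin_prefix: str, taxon: str) -> List[Tuple[str, str]]:
    """Return (contig, lineage) pairs for contigs of one bin whose lineage contains `taxon`."""
    hits = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        contig = fields[0].strip()
        # Unclassified contigs may have an empty or missing lineage column.
        lineage = fields[1].strip() if len(fields) > 1 else ""
        # Assumption: bin membership is encoded as a name prefix.
        if contig.startswith(bin_prefix) and taxon in lineage.split(";"):
            hits.append((contig, lineage))
    return hits


# Three lines copied from the listing above, plus one made-up contig from another bin.
sample = [
    "RSF33_CG26_110\tBacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae\n",
    "RSF33_CG26_111\tBacteria;Proteobacteria;Alphaproteobacteria;Rhodobacterales;Rhodobacteraceae\n",
    "RSF33_CG26_108\t\n",
    "OTHER_BIN_1\tBacteria;Proteobacteria;Alphaproteobacteria;Rhodobacterales;Rhodobacteraceae\n",
]

for contig, lineage in contigs_with_taxon(sample, "RSF33_CG26_", "Rhodobacteraceae"):
    print(contig, lineage)
```

With a real file, `open("contig_taxonomy.tab")` could be passed in place of `sample`. The trailing underscore in the prefix keeps a bin such as RSF33_CG260 from matching by accident.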

Any idea of what is going on? I'm not sure if this issue is also happening in other bins...

I'm using metaWRAP v=0.9.

ursky commented 6 years ago

Well, that's not good. Can you give me the lengths of those contigs? The algorithm weights each classification by the length of the contig. So maybe the majority of the length of the bin was classified as Rhodobacteraceae?
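To make the length weighting concrete, here is a minimal sketch of one way such a scheme could work. It is not the metaWRAP code: the function names, the nested `[length, children]` layout, and the rule of descending only while one child holds more than half of its parent's length are all assumptions made for illustration.

```python
from typing import Dict

# Each node maps a taxon name to [accumulated_length, children].
Tree = Dict[str, list]


def add_contig(tree: Tree, lineage: str, length: int) -> None:
    """Add one contig's length to every rank along its lineage."""
    node = tree
    for taxon in lineage.split(";"):
        if not taxon:
            continue  # unclassified contigs carry an empty lineage
        entry = node.setdefault(taxon, [0, {}])
        entry[0] += length
        node = entry[1]


def consensus(tree: Tree, min_fraction: float = 0.5) -> str:
    """Follow the heaviest child at each rank while it exceeds min_fraction (assumed cutoff)."""
    path = []
    node = tree
    # At the top level, compare against the summed weight of all roots.
    budget = sum(entry[0] for entry in node.values())
    while node:
        taxon, (weight, children) = max(node.items(), key=lambda kv: kv[1][0])
        if weight <= min_fraction * budget:
            break
        path.append(taxon)
        node, budget = children, weight
    return ";".join(path)


# Four contigs from the bin in question, with lengths as reported in this thread.
tree: Tree = {}
add_contig(tree, "Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae", 1685748)
add_contig(tree, "Bacteria", 263163)
add_contig(tree, "Bacteria;Proteobacteria;Alphaproteobacteria;Rhodobacterales;Rhodobacteraceae", 3749)
add_contig(tree, "", 382479)  # unclassified, contributes nothing

print(consensus(tree))
# Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
```

Under this scheme a single 3749 bp Rhodobacteraceae contig cannot outweigh a 1.6 Mbp Nitrosomonadaceae contig, so a Rhodobacteraceae call for the whole bin would point to something other than the weighting itself.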

palomo11 commented 6 years ago

Here is the length of each of those contigs:

RSF33_CG26_100 | 10170
RSF33_CG26_104 | 382479
RSF33_CG26_105 | 75622
RSF33_CG26_106 | 102118
RSF33_CG26_107 | 263163
RSF33_CG26_108 | 63391
RSF33_CG26_109 | 119540
RSF33_CG26_110 | 1685748
RSF33_CG26_111 | 3749
RSF33_CG26_12 | 3424
RSF33_CG26_13 | 1675
RSF33_CG26_17 | 1304
RSF33_CG26_18 | 2397
RSF33_CG26_19 | 6311
RSF33_CG26_21 | 4601
RSF33_CG26_22 | 8056
RSF33_CG26_24 | 1570
RSF33_CG26_28 | 1716
RSF33_CG26_3 | 2962
RSF33_CG26_30 | 9526
RSF33_CG26_33 | 5056
RSF33_CG26_34 | 3417
RSF33_CG26_36 | 2566
RSF33_CG26_37 | 2380
RSF33_CG26_4 | 4326
RSF33_CG26_44 | 3332
RSF33_CG26_45 | 4457
RSF33_CG26_47 | 2519
RSF33_CG26_51 | 2360
RSF33_CG26_53 | 11798
RSF33_CG26_57 | 1482
RSF33_CG26_58 | 2966
RSF33_CG26_59 | 18965
RSF33_CG26_6 | 2243
RSF33_CG26_60 | 2940
RSF33_CG26_61 | 8900
RSF33_CG26_64 | 3828
RSF33_CG26_7 | 8263
RSF33_CG26_72 | 3002
RSF33_CG26_73 | 4783
RSF33_CG26_74 | 3256
RSF33_CG26_75 | 4647
RSF33_CG26_78 | 1545
RSF33_CG26_8 | 2946
RSF33_CG26_83 | 2395
RSF33_CG26_84 | 18201
RSF33_CG26_86 | 1568
RSF33_CG26_87 | 2055
RSF33_CG26_89 | 6728
RSF33_CG26_9 | 1596
RSF33_CG26_90 | 2116
RSF33_CG26_91 | 1602
RSF33_CG26_94 | 1521
RSF33_CG26_95 | 1841
RSF33_CG26_97 | 4154
RSF33_CG26_99 | 4136

But none of the contigs in that bin seem to be classified as Rhodobacteraceae.

ursky commented 6 years ago

I didn’t catch that. That’s definitely a bug, but I can’t figure out how that happens just by eyeballing my code. Can you upload the whole contig taxonomy file and the bin membership of the contigs (any format) so I can try to replicate this?

ursky commented 6 years ago

Actually, hold off. I think I see the issue.

ursky commented 6 years ago

So I was able to find one issue, but I don't believe it could lead to such a grievous error. I am attaching a modified diagnostic script that will output more details about what is going on. You don't have to repeat the analysis. Just run python classify_bins_diagnostic.py contig_taxonomy.tab RSF33_CG26.fa, where contig_taxonomy.tab is the file in the metawrap output and RSF33_CG26.fa is the problematic bin in fasta format. It will try to classify the bin again, but also print its "thought process". Please report back what it says...

classify_bins_diagnostic.py.gz

palomo11 commented 6 years ago

This is the output:

python classify_bins_diagnostic.py contig_taxonomy.tab RSF33_CG26.fa

RSF33_CG26_3    2962    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_6    2243    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_7    8263    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_8    2946    Bacteria;Proteobacteria;Gammaproteobacteria;Methylococcales;Methylococcaceae
RSF33_CG26_9    1596    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_12   3424    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_13   1675    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_17   1304    Bacteria;Proteobacteria;Alphaproteobacteria;Rickettsiales;Rickettsiaceae
RSF33_CG26_18   2397    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_21   4601    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_22   8056    Bacteria
RSF33_CG26_24   1570    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_28   1716    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_30   9526    Bacteria;Proteobacteria;Betaproteobacteria
RSF33_CG26_33   5056    Bacteria;Proteobacteria;Alphaproteobacteria;Sphingomonadales;Sphingomonadaceae
RSF33_CG26_34   3417    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_36   2566    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_37   2380    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_44   3332    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_45   4457    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_47   2519    Bacteria;Proteobacteria
RSF33_CG26_51   2360    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_53   11798   Bacteria;Proteobacteria
RSF33_CG26_57   1482    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_58   2966    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_59   18965   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae;Nitrosomonas
RSF33_CG26_60   2940    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_61   8900    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_74   3256    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_78   1545    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae;Nitrosomonas
RSF33_CG26_83   2395    Bacteria;Proteobacteria;Gammaproteobacteria;Alteromonadales;Alteromonadaceae
RSF33_CG26_86   1568    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_87   2055    Bacteria
RSF33_CG26_89   6728    Bacteria;Proteobacteria;Gammaproteobacteria;Alteromonadales;Pseudoalteromonadaceae
RSF33_CG26_90   2116    Bacteria;Proteobacteria;Gammaproteobacteria;Methylococcales;Methylococcaceae;Methylomonas
RSF33_CG26_91   1602    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_94   1521    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae;Nitrosomonas;Nitrosomonas sp. Is79A3
RSF33_CG26_95   1841    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_97   4154    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_99   4136    Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Comamonadaceae
RSF33_CG26_100  10170   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_105  75622   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_106  102118  Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_107  263163  Bacteria
RSF33_CG26_109  119540  Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_110  1685748 Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF33_CG26_111  3749    Bacteria;Proteobacteria;Alphaproteobacteria;Rhodobacterales;Rhodobacteraceae
{'Bacteria': [2418444, {'Proteobacteria': [2145170, {'Betaproteobacteria': [2106559, {'Burkholderiales': [4136, {'Comamonadaceae': [4136, {}]}], 'Nitrosomonadales': [2092897, {'Nitrosomonadaceae': [2092897, {'Nitrosomonas': [22031, {'Nitrosomonas sp. Is79A3': [1521, {}]}]}]}]}], 'Alphaproteobacteria': [10109, {'Rickettsiales': [1304, {'Rickettsiaceae': [1304, {}]}], 'Rhodobacterales': [3749, {'Rhodobacteraceae': [3749, {}]}], 'Sphingomonadales': [5056, {'Sphingomonadaceae': [5056, {}]}]}], 'Gammaproteobacteria': [14185, {'Alteromonadales': [9123, {'Alteromonadaceae': [2395, {}], 'Pseudoalteromonadaceae': [6728, {}]}], 'Methylococcales': [5062, {'Methylococcaceae': [5062, {'Methylomonas': [2116, {}]}]}]}]}]}]}
RSF33_CG26.fa   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae

Now it looks as expected.

palomo11 commented 6 years ago

By the way, a similar case happens with RSF29. It was classified as: RSF29.fa Bacteria;Proteobacteria, but when I ran the script you sent me, it was also classified as Nitrosomonadaceae:

RSF29_1 4154    Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Oxalobacteraceae
RSF29_2 6722    Bacteria
RSF29_3 13050   Bacteria;Proteobacteria
RSF29_4 11200   Bacteria;Proteobacteria
RSF29_5 6603    uncultured microorganism
RSF29_6 5622    Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae
RSF29_7 1858    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF29_8 10621   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF29_10    10385   Bacteria;Proteobacteria;Gammaproteobacteria
RSF29_11    6035    Bacteria;Proteobacteria
RSF29_12    2254    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae;Nitrosomonas
RSF29_14    1747    Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Alcaligenaceae
RSF29_17    14685   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF29_18    12946   Bacteria;Proteobacteria
RSF29_19    28797   Bacteria;Proteobacteria
RSF29_21    12271   Bacteria;Proteobacteria
RSF29_22    5575    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF29_23    7351    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF29_24    7116    Bacteria;Proteobacteria;Gammaproteobacteria;Aeromonadales;Aeromonadaceae
RSF29_27    16017   Bacteria;Proteobacteria
RSF29_28    11092   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF29_29    5077    Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales
RSF29_30    3742    Bacteria
RSF29_31    9696    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF29_34    2647    Bacteria;Proteobacteria;Betaproteobacteria
RSF29_35    1434    Bacteria
RSF29_37    8045    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF29_38    9952    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF29_39    4679    Bacteria
RSF29_41    8043    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae;Nitrosomonas
RSF29_42    10052   Bacteria
RSF29_45    5161    Bacteria;Proteobacteria
RSF29_46    7532    Bacteria;Proteobacteria
RSF29_47    5663    Bacteria;Proteobacteria;Betaproteobacteria
RSF29_48    11747   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF29_49    24268   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF29_51    1609    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF29_52    9778    Bacteria;Proteobacteria
RSF29_53    3251    Bacteria;Bacteroidetes;Cytophagia;Cytophagales;Cytophagaceae
RSF29_56    11717   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF29_57    16156   Bacteria
RSF29_58    5853    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF29_59    9464    Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales
RSF29_60    5332    Bacteria
RSF29_61    10732   Bacteria;Proteobacteria
RSF29_63    17035   Bacteria;Proteobacteria
RSF29_64    10419   Bacteria;Proteobacteria;Betaproteobacteria
RSF29_65    10364   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales
RSF29_67    1383    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF29_69    15588   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales
RSF29_70    13017   Bacteria;Proteobacteria
RSF29_74    3547    Bacteria;Proteobacteria;Gammaproteobacteria;uncultured gamma proteobacterium
RSF29_75    5263    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF29_76    2962    Bacteria;Proteobacteria;Betaproteobacteria
RSF29_77    9110    Bacteria;Proteobacteria
RSF29_79    8696    Bacteria;Proteobacteria;Betaproteobacteria
RSF29_80    10168   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF29_81    9052    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF29_82    4206    Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales
RSF29_85    1558    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF29_87    26799   Bacteria;Proteobacteria;Betaproteobacteria
RSF29_89    13562   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF29_91    6308    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF29_92    2340    Bacteria;Proteobacteria
RSF29_93    6972    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF29_94    3540    Bacteria;Proteobacteria
RSF29_95    4709    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF29_99    7281    Bacteria;Proteobacteria
RSF29_100   2810    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae;Nitrosomonas
RSF29_101   4601    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF29_106   11648   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF29_107   3378    Bacteria;Proteobacteria;Gammaproteobacteria
RSF29_109   7467    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF29_110   9321    Bacteria;Proteobacteria;Betaproteobacteria
RSF29_112   1664    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae;Nitrosomonas
RSF29_113   2034    Bacteria;Proteobacteria;Betaproteobacteria;Rhodocyclales;Rhodocyclaceae;Rugosibacter;Rugosibacter aromaticivorans
RSF29_115   6837    Bacteria;Proteobacteria
RSF29_116   1382    Bacteria;Chlorobi;Chlorobia;Chlorobiales;Chlorobiaceae
RSF29_117   2091    Bacteria
RSF29_120   1658    Bacteria;uncultured bacterium
RSF29_121   3998    Bacteria;Proteobacteria
RSF29_123   10279   Bacteria;Proteobacteria
RSF29_124   7333    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF29_127   4534    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF29_129   1441    Bacteria;Proteobacteria;Gammaproteobacteria
RSF29_131   6195    Bacteria;Proteobacteria
RSF29_132   14157   Bacteria;Proteobacteria
RSF29_136   7223    Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales
RSF29_138   11739   Bacteria;Proteobacteria;Betaproteobacteria
RSF29_139   6721    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF29_140   7038    Bacteria;Proteobacteria
RSF29_142   6577    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae;Nitrosomonas
RSF29_143   1808    Bacteria;Proteobacteria
RSF29_144   28266   Bacteria;Proteobacteria
RSF29_145   7168    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales
RSF29_147   3791    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF29_148   3327    Bacteria;Proteobacteria
RSF29_154   14259   Bacteria;Proteobacteria;Betaproteobacteria
RSF29_155   6355    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF29_156   12748   Bacteria
RSF29_158   2088    Bacteria;Proteobacteria;Betaproteobacteria
RSF29_159   1282    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae;Nitrosomonas;Nitrosomonas sp. AL212
RSF29_160   15890   Bacteria;Proteobacteria;Betaproteobacteria
RSF29_161   8380    Bacteria;Proteobacteria
RSF29_162   25072   Bacteria;Proteobacteria;Betaproteobacteria
RSF29_166   2656    Bacteria
RSF29_167   1282    Bacteria;Proteobacteria;Betaproteobacteria
RSF29_169   7214    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF29_170   11502   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF29_172   24942   Bacteria;Proteobacteria
RSF29_173   3741    Bacteria;Proteobacteria
RSF29_174   1245    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae;Nitrosomonas
RSF29_176   2711    Bacteria;Proteobacteria
RSF29_180   2410    Bacteria;uncultured bacterium
RSF29_181   1499    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF29_184   4977    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF29_186   13168   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales
RSF29_187   7535    Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Oxalobacteraceae
RSF29_188   6682    Bacteria;Proteobacteria
RSF29_191   2098    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae;Nitrosomonas
RSF29_194   4131    Bacteria;Proteobacteria;Betaproteobacteria;Rhodocyclales;Rhodocyclaceae
RSF29_195   11562   Bacteria;Proteobacteria
RSF29_198   14574   Bacteria;Proteobacteria
RSF29_199   3035    Bacteria;Proteobacteria
RSF29_201   2243    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae;Nitrosomonas
RSF29_203   2079    Bacteria;Cyanobacteria;Synechococcales;Prochloraceae
RSF29_204   2648    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae;Nitrosomonas;Nitrosomonas sp. AL212
RSF29_206   2962    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF29_210   85860   Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
RSF29_213   15893   Bacteria;Proteobacteria
RSF29_215   3853    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae
{'uncultured microorganism': [6603, {}], 'Bacteria': [1090069, {'Chlorobi': [1382, {'Chlorobia': [1382, {'Chlorobiales': [1382, {'Chlorobiaceae': [1382, {}]}]}]}], 'uncultured bacterium': [4068, {}], 'Bacteroidetes': [3251, {'Cytophagia': [3251, {'Cytophagales': [3251, {'Cytophagaceae': [3251, {}]}]}]}], 'Proteobacteria': [1013677, {'Betaproteobacteria': [622921, {'Burkholderiales': [39406, {'Oxalobacteraceae': [11689, {}], 'Alcaligenaceae': [1747, {}]}], 'Nitrosomonadales': [440513, {'Nitrosomonadaceae': [394225, {'Nitrosomonas': [30864, {'Nitrosomonas sp. AL212': [3930, {}]}]}]}], 'Rhodocyclales': [6165, {'Rhodocyclaceae': [6165, {'Rugosibacter': [2034, {'Rugosibacter aromaticivorans': [2034, {}]}]}]}]}], 'Gammaproteobacteria': [31489, {'Aeromonadales': [7116, {'Aeromonadaceae': [7116, {}]}], 'uncultured gamma proteobacterium': [3547, {}], 'Enterobacterales': [5622, {'Enterobacteriaceae': [5622, {}]}]}]}], 'Cyanobacteria': [2079, {'Synechococcales': [2079, {'Prochloraceae': [2079, {}]}]}]}]}
RSF29.fa    Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae

ursky commented 6 years ago

Very strange. Maybe that fixed it then. Try running the full classification now (attaching the script): python classify_bins.py contig_taxonomy.tab FOLDER_WITH_BIN_FILES.

If that still works well and gives you the right results, can you please test one more thing to make sure that fully fixed it before I release a patch? Make sure this new script is executable (chmod +x classify_bins.py), and replace the original script in /miniconda2/bin/metawrap-scripts/classify_bins.py (might be a bit different for you) with the new one. Then re-run the entire metawrap module and see if you still get proper results.

classify_bins.py.gz

palomo11 commented 6 years ago

I have just re-run it and it has worked fine. Thanks!!!

ursky commented 6 years ago

Thank you! The change will be included in metaWRAP v=0.9.4