elizabethmcd / metabolisHMM

Tool for constructing phylogenies and summarizing metabolic characteristics based on curated and custom profile HMMs
GNU General Public License v3.0

The directory of curated metabolic markers could not be found. #41

Open sarah872 opened 4 years ago

sarah872 commented 4 years ago

Hi, I am running summarize-metabolism as

summarize-metabolism --input aquifer-genomes/ --output summary --metadata groups.csv

but I am getting the following error:

#############################################
metabolisHMM v1.4.0
     The directory of curated metabolic markers could not be found.
     Please either download the markers from https://github.com/elizabethmcd/metabolisHMM/releases/download/v2.0/metabolisHMM_v2.0_markers.tgz and decompress the tarball, or move the directory to where you are running the workflow from.

However, the models exist in curated_markers/metabolic_markers/*hmm. Also, where is make-heatmap.R?

elizabethmcd commented 4 years ago

Sorry, this is an error on my part. I recently changed the structure of the curated markers folder and the way the workflow checks whether the database was downloaded. I will try to fix this within the next couple of days and push a new version.

Additionally, you shouldn't need the make-heatmap.R script, as all plotting is now done within Python. Is there some part of the tutorial or help menu that still mentions it? This is also my fault; I only recently moved all plotting into Python.

Thank you for testing!

sarah872 commented 4 years ago

I was asking about make-heatmap.R because I am getting the following error when running search-custom-markers:

search-custom-markers --input aquifer-genomes/ --output outdir --markers_dir curated_markers/metabolic_markers/ --markers_list curated_markers/list_metabolic-markers --metadata groups.csv --aggregate ON

#############################################
metabolisHMM v1.4.0
Reformatting fasta files...
Running HMM searches using custom marker set...
Parsing all results...
/home/user/py3-venv/lib/python3.7/site-packages/metabolisHMM-2.0-py3.7.egg/EGG-INFO/scripts/search-custom-markers:204: DeprecationWarning: 'U' mode is deprecated
  with open(result, "rU") as input:
Plotting results...
Traceback (most recent call last):
  File "/home/user/py3-venv/bin/search-custom-markers", line 4, in <module>
    __import__('pkg_resources').run_script('metabolisHMM==2.0', 'search-custom-markers')
  File "/home/user/py3-venv/lib/python3.7/site-packages/pkg_resources/__init__.py", line 661, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/user/py3-venv/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1441, in run_script
    exec(code, namespace, namespace)
  File "/home/user/py3-venv/lib/python3.7/site-packages/metabolisHMM-2.0-py3.7.egg/EGG-INFO/scripts/search-custom-markers", line 307, in <module>
    plot=sns.heatmap(agg, cmap="viridis",xticklabels=xticks, square=True, linewidths=1, linecolor='black', cbar=True, cbar_kws={"shrink": .50})
  File "/home/user/py3-venv/lib/python3.7/site-packages/seaborn/matrix.py", line 517, in heatmap
    yticklabels, mask)
  File "/home/user/py3-venv/lib/python3.7/site-packages/seaborn/matrix.py", line 167, in __init__
    cmap, center, robust)
  File "/home/user/py3-venv/lib/python3.7/site-packages/seaborn/matrix.py", line 206, in _determine_cmap_params
    vmin = np.percentile(calc_data, 2) if robust else calc_data.min()
  File "/home/user/py3-venv/lib/python3.7/site-packages/numpy/core/_methods.py", line 32, in _amin
    return umr_minimum(a, axis, None, out, keepdims, initial)
ValueError: zero-size array to reduction operation minimum which has no identity

elizabethmcd commented 4 years ago

This isn't an R-related error. It comes from plotting with Python, which suggests an issue with the number of HMMs that were run, with your metadata file, or both. What do your markers_list and metadata files look like?
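For context, the ValueError at the bottom of the traceback is the message NumPy raises when asked for the minimum of an empty array, which suggests that the table handed to sns.heatmap contained no values. The sketch below only illustrates that behaviour; the function names are hypothetical and are not part of metabolisHMM.

```python
import numpy as np


def describe_empty_min():
    """Return the message NumPy raises for the minimum of an empty array."""
    try:
        np.array([]).min()
    except ValueError as err:
        return str(err)
    return None


def has_plottable_values(table):
    """Hypothetical pre-plot guard: True if the table holds at least one value."""
    return np.asarray(table).size > 0


print(describe_empty_min())
print(has_plottable_values([]))            # empty table, nothing to plot
print(has_plottable_values([[0.0, 1.0]]))  # one row with two values
```

A guard like this before the plotting call would turn the traceback into a clearer "nothing to plot" message, but whether the workflow should do that is the maintainer's decision.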

sarah872 commented 4 years ago

This is my groups.csv:

GCA_001766875.1_ASM176687v1,groupA
GCA_001766905.1_ASM176690v1,groupA
GCA_001766965.1_ASM176696v1,groupB
GCA_001766985.1_ASM176698v1,groupB
GCA_001767145.1_ASM176714v1,groupB

And this is my list_metabolic-markers list:

acetate_citrate_lyase_aclA.hmm
acetate_citrate_lyase_aclB.hmm
aprA_TIGR02061.hmm
carbon_monoxide_dehydrogenase_coxL_TIGR02416.hmm
carbon_monoxide_dehydrogenase_coxM.hmm
carbon_monoxide_dehydrogenase_coxS.hmm
ccoN_TIGR00780.hmm
ccoO_TIGR00781.hmm
ccoP_TIGR00782.hmm
codh_catalytic_TIGR01702.hmm
codhC_TIGR00316.hmm
codhD_TIGR00381.hmm
coxA_TIGR02891.hmm
coxB_TIGR02866.hmm
cydA_PF01654.hmm
cydB_TIGR00203.hmm
cyoA_TIGR01433.hmm
cyoD_TIGR02847.hmm
cyoE_TIGR01473.hmm
dsrA_TIGR02064.hmm
dsrB_TIGR02066.hmm
dsrD_PF08679.hmm
fae_TIGR03126.hmm
fccB_PF09242.hmm
fdhA_TIGR01591.hmm
fdhB_TIGR01582.hmm
fdhC_TIGR01583.hmm
fdh_thiol_id_TIGR02819.hmm
FeFeHydrogenase_TIGR02512.hmm
FeFeHydrogenase_TIGR04105.hmm
fmtf_TIGR03119.hmm
hydrazine_oxidase_hzoA.hmm
hydrazine_synthase_hzsA.hmm
Hydrogenase_Group_1.hmm
Hydrogenase_Group_2a.hmm
Hydrogenase_Group_2b.hmm
Hydrogenase_Group_3a.hmm
Hydrogenase_Group_3b.hmm
Hydrogenase_Group_3c.hmm
Hydrogenase_Group_3d.hmm
Hydrogenase_Group_4.hmm
madA_TIGR02659.hmm
madB_TIGR02658.hmm
mtmc_TIGR03120.hmm
napA_TIGR01706.hmm
napB_PF03892.hmm
narG_TIGR01580.hmm
narH_TIGR01660.hmm
ndma_methanol_dehydrogenase_TIGR04266.hmm
nifD_TIGR01282.hmm
nifH_TIGR01287.hmm
nifK_TIGR01286.hmm
nirB_TIGR02374.hmm
nirD_TIGR02378.hmm
nirK_TIGR02376.hmm
nitric_oxide_reductase_norB.hmm
nitric_oxide_reductase_norC.hmm
nitrite_oxidoreductase_nxrA.hmm
nitrite_oxidoreductase_nxrB.hmm
nitrite_reductase_nirS.hmm
nosD_TIGR04247.hmm
nosZ_TIGR04246.hmm
nrfA_TIGR03152.hmm
nrfH_TIGR03153.hmm
qoxA_TIGR01432.hmm
rubisco_form_I.hmm
rubisco_form_II.hmm
rubisco_form_III.hmm
rubisco_form_II_III.hmm
rubisco_form_IV.hmm
sat_TIGR00339.hmm
sfh_TIGR02821.hmm
sgdh_TIGR02818.hmm
smdh_TIGR03451.hmm
soxB_TIGR04486.hmm
soxC_TIGR04555.hmm
soxY_TIGR04488.hmm
sulfide_quinone_oxidoreductase_sqr.hmm
sulfur_dioxygenase_sdo.hmm
thiosulfate_reductase_phsA.hmm

It's just a list of all the HMMs that are in curated_markers/metabolic_markers/

elizabethmcd commented 4 years ago

Yes, these look fine. Are there any results in your outdir folder, such as the CSV of the HMM stats?

sarah872 commented 4 years ago

Here are the first two lines of the files in outdir/results

==> cleaned-matrix.csv <==
genome,acetate_citrate_lyase_aclA,acetate_citrate_lyase_aclB,aprA_TIGR02061,carbon_monoxide_dehydrogenase_coxL_TIGR02416,carbon_monoxide_dehydrogenase_coxM,carbon_monoxide_dehydrogenase_coxS,ccoN_TIGR00780,ccoO_TIGR00781,ccoP_TIGR00782,codh_catalytic_TIGR01702,codhC_TIGR00316,codhD_TIGR00381,coxA_TIGR02891,coxB_TIGR02866,cydA_PF01654,cydB_TIGR00203,cyoA_TIGR01433,cyoD_TIGR02847,cyoE_TIGR01473,dsrA_TIGR02064,dsrB_TIGR02066,dsrD_PF08679,fae_TIGR03126,fccB_PF09242,fdhA_TIGR01591,fdhB_TIGR01582,fdhC_TIGR01583,fdh_thiol_id_TIGR02819,FeFeHydrogenase_TIGR02512,FeFeHydrogenase_TIGR04105,fmtf_TIGR03119,hydrazine_oxidase_hzoA,hydrazine_synthase_hzsA,Hydrogenase_Group_1,Hydrogenase_Group_2a,Hydrogenase_Group_2b,Hydrogenase_Group_3a,Hydrogenase_Group_3b,Hydrogenase_Group_3c,Hydrogenase_Group_3d,Hydrogenase_Group_4,madA_TIGR02659,madB_TIGR02658,mtmc_TIGR03120,napA_TIGR01706,napB_PF03892,narG_TIGR01580,narH_TIGR01660,ndma_methanol_dehydrogenase_TIGR04266,nifD_TIGR01282,nifH_TIGR01287,nifK_TIGR01286,nirB_TIGR02374,nirD_TIGR02378,nirK_TIGR02376,nitric_oxide_reductase_norB,nitric_oxide_reductase_norC,nitrite_oxidoreductase_nxrA,nitrite_oxidoreductase_nxrB,nitrite_reductase_nirS,nosD_TIGR04247,nosZ_TIGR04246,nrfA_TIGR03152,nrfH_TIGR03153,qoxA_TIGR01432,rubisco_form_I,rubisco_form_II,rubisco_form_III,rubisco_form_II_III,rubisco_form_IV,sat_TIGR00339,sfh_TIGR02821,sgdh_TIGR02818,smdh_TIGR03451,soxB_TIGR04486,soxC_TIGR04555,soxY_TIGR04488,sulfide_quinone_oxidoreductase_sqr,sulfur_dioxygenase_sdo,thiosulfate_reductase_phsA
GCA_001766905.1_ASM176690v1_genomic,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0

==> custom-markers-results.csv <==
,acetate_citrate_lyase_aclA,acetate_citrate_lyase_aclB,aprA_TIGR02061,carbon_monoxide_dehydrogenase_coxL_TIGR02416,carbon_monoxide_dehydrogenase_coxM,carbon_monoxide_dehydrogenase_coxS,ccoN_TIGR00780,ccoO_TIGR00781,ccoP_TIGR00782,codh_catalytic_TIGR01702,codhC_TIGR00316,codhD_TIGR00381,coxA_TIGR02891,coxB_TIGR02866,cydA_PF01654,cydB_TIGR00203,cyoA_TIGR01433,cyoD_TIGR02847,cyoE_TIGR01473,dsrA_TIGR02064,dsrB_TIGR02066,dsrD_PF08679,fae_TIGR03126,fccB_PF09242,fdhA_TIGR01591,fdhB_TIGR01582,fdhC_TIGR01583,fdh_thiol_id_TIGR02819,FeFeHydrogenase_TIGR02512,FeFeHydrogenase_TIGR04105,fmtf_TIGR03119,hydrazine_oxidase_hzoA,hydrazine_synthase_hzsA,Hydrogenase_Group_1,Hydrogenase_Group_2a,Hydrogenase_Group_2b,Hydrogenase_Group_3a,Hydrogenase_Group_3b,Hydrogenase_Group_3c,Hydrogenase_Group_3d,Hydrogenase_Group_4,madA_TIGR02659,madB_TIGR02658,mtmc_TIGR03120,napA_TIGR01706,napB_PF03892,narG_TIGR01580,narH_TIGR01660,ndma_methanol_dehydrogenase_TIGR04266,nifD_TIGR01282,nifH_TIGR01287,nifK_TIGR01286,nirB_TIGR02374,nirD_TIGR02378,nirK_TIGR02376,nitric_oxide_reductase_norB,nitric_oxide_reductase_norC,nitrite_oxidoreductase_nxrA,nitrite_oxidoreductase_nxrB,nitrite_reductase_nirS,nosD_TIGR04247,nosZ_TIGR04246,nrfA_TIGR03152,nrfH_TIGR03153,qoxA_TIGR01432,rubisco_form_I,rubisco_form_II,rubisco_form_III,rubisco_form_II_III,rubisco_form_IV,sat_TIGR00339,sfh_TIGR02821,sgdh_TIGR02818,smdh_TIGR03451,soxB_TIGR04486,soxC_TIGR04555,soxY_TIGR04488,sulfide_quinone_oxidoreductase_sqr,sulfur_dioxygenase_sdo,thiosulfate_reductase_phsA
GCA_001766905.1_ASM176690v1_genomic,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0

elizabethmcd commented 4 years ago

OK, so it's running the HMMs correctly, at least for the search-custom-markers workflow. A lot of the errors above look like seaborn, numpy, or matplotlib package errors. Can you make sure all of those are installed and tell me which versions you have?
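One way to collect those versions in a single step is sketched below. It uses the standard-library importlib.metadata module, which requires Python 3.8 or newer; on the Python 3.7 environment shown in the tracebacks, pip freeze or the importlib_metadata backport would be needed instead. The helper name installed_versions is hypothetical.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


wanted = ["metabolisHMM", "numpy", "pandas", "seaborn", "matplotlib"]
for name, version in installed_versions(wanted).items():
    print(f"{name}=={version}" if version else f"{name}: not installed")
```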

sarah872 commented 4 years ago

seaborn==0.9.0 numpy==1.18.0 numpydoc==0.9.1 matplotlib==3.1.2

elizabethmcd commented 4 years ago

I just pushed a new version to PyPI, and you can upgrade with python3 -m pip install metabolisHMM --upgrade. I still have a feeling this might be something weird with seaborn. Do you also have pandas installed, and if so, which version?

sarah872 commented 4 years ago

I ran it with the upgraded version, but I am getting the same error:

#############################################
metabolisHMM v2.1
Reformatting fasta files...
Running HMM searches using custom marker set...

Parsing all results...
/home/user/py3-venv/bin/search-custom-markers:204: DeprecationWarning: 'U' mode is deprecated
  with open(result, "rU") as input:
Plotting results...
Traceback (most recent call last):
  File "/home/user/py3-venv/bin/search-custom-markers", line 307, in <module>
    plot=sns.heatmap(agg, cmap="viridis",xticklabels=xticks, square=True, linewidths=1, linecolor='black', cbar=True, cbar_kws={"shrink": .50})
  File "/home/user/py3-venv/lib/python3.7/site-packages/seaborn/matrix.py", line 517, in heatmap
    yticklabels, mask)
  File "/home/user/py3-venv/lib/python3.7/site-packages/seaborn/matrix.py", line 167, in __init__
    cmap, center, robust)
  File "/home/user/py3-venv/lib/python3.7/site-packages/seaborn/matrix.py", line 206, in _determine_cmap_params
    vmin = np.percentile(calc_data, 2) if robust else calc_data.min()
  File "/home/user/py3-venv/lib/python3.7/site-packages/numpy/core/_methods.py", line 34, in _amin
    return umr_minimum(a, axis, None, out, keepdims, initial, where)
ValueError: zero-size array to reduction operation minimum which has no identity

My pandas version is pandas==0.25.3.

morgvevans commented 4 years ago

I am having the same issue as sarah872 (the initial "The directory of curated metabolic markers could not be found." issue). I installed via conda. Looking forward to the update and using this awesome workflow!

elizabethmcd commented 4 years ago

@morgvevans Can you install version 2.1 with python3 -m pip install metabolisHMM --upgrade? This issue has been fixed in the new version.

elizabethmcd commented 4 years ago

@sarah872 if you turn the aggregate option OFF, what happens?

sarah872 commented 4 years ago

Plotting works when aggregate is turned off, although the labels are a little shifted: custom-markers-results-heatmap.pdf

cleaned-matrix.txt custom-markers-results.txt

morgvevans commented 4 years ago

I got this working -- thanks so much for the assistance!

elizabethmcd commented 4 years ago

@sarah872 I will try to have a fix for the aggregating option soon. For the shifting of labels, I think the plotting functions are a little manual when trying to format label sizes. I can see if there is a fix for this as well, but the formatting may only work when you have smaller numbers of HMMs to run.
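Since the failure only appears with --aggregate ON, one thing worth checking is whether the genome names in groups.csv match the names in the results table. This is not confirmed anywhere in the thread as the cause. The results posted above list GCA_001766905.1_ASM176690v1_genomic, while groups.csv lists GCA_001766905.1_ASM176690v1 without the _genomic suffix. If the aggregation step matches genomes to groups by name (an assumption about the workflow's internals), a mismatch like this could leave an empty table. The sketch below compares the two name sets; the helper names and the shortened marker columns are hypothetical.

```python
import csv
import io


def metadata_names(text):
    """Genome names from a headerless metadata CSV (name,group)."""
    return {row[0].strip() for row in csv.reader(io.StringIO(text)) if row}


def result_names(text):
    """Genome names from a results CSV whose first row is a header."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    return {row[0].strip() for row in rows[1:]}


def compare_names(metadata_text, results_text):
    """Report which genome names are shared and which appear in only one file."""
    meta = metadata_names(metadata_text)
    res = result_names(results_text)
    return {
        "shared": sorted(meta & res),
        "only_in_metadata": sorted(meta - res),
        "only_in_results": sorted(res - meta),
    }


# Names copied from the thread; marker columns shortened for illustration.
groups_csv = (
    "GCA_001766875.1_ASM176687v1,groupA\n"
    "GCA_001766905.1_ASM176690v1,groupA\n"
)
results_csv = (
    "genome,markerA,markerB\n"
    "GCA_001766905.1_ASM176690v1_genomic,0.0,1.0\n"
)

report = compare_names(groups_csv, results_csv)
for key, names in report.items():
    print(key, names)
```

If "shared" comes back empty on the real files, renaming the entries in groups.csv to match the results table (or the reverse) would be a cheap experiment before digging into package versions.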

elizabethmcd commented 4 years ago

@sarah872 I have been unable to reproduce your error yet. I ran this command:

search-custom-markers --input ../genomes/ --output TEST1 --metadata ../groups.csv --markers_dir ../test_markers/ --markers_list ../markers_list.txt --aggregate ON

Where my groups.csv file looks like:

bacteria00190,Actinobacteria
bacteria00193,Deltaproteobacteria
bacteria00203,Bacteroidetes
bacteria00229,Deltaproteobacteria
bacteria01060v2,Deltaproteobacteria
bacteria23257,Deltaproteobacteria
bacteria23258,Bacteroidetes
bacteria23259,Chloroflexi
bacteria23260,Deltaproteobacteria
bacteria23263,Deltaproteobacteria
bacteria23265,Deltaproteobacteria
bacteria23266,Firmicutes
bacteria23267,Deltaproteobacteria
bacteria23268,Deltaproteobacteria
bacteria23272,Deltaproteobacteria
bacteria23311,Deltaproteobacteria
bacteria23313,Other
bacteria23314,Other
bacteria23315,Other
bacteria23317,Other
bacteria30001,Actinobacteria
bacteria30002,Actinobacteria
bacteria30003,Bacteroidetes
bacteria30004,Bacteroidetes
bacteria30005,Bacteroidetes
bacteria30006,Bacteroidetes
bacteria30007,Chloroflexi
bacteria30008,Firmicutes
bacteria30010,Firmicutes
bacteria30011,Firmicutes
bacteria30012,Firmicutes
bacteria30013,Deltaproteobacteria
bacteria30014,Deltaproteobacteria
bacteria30015,Deltaproteobacteria
bacteria30016,Deltaproteobacteria
bacteria30017,Other
bacteria30018,Deltaproteobacteria
bacteria30019,Deltaproteobacteria
bacteria30020,Other
bacteria30021,Other
bacteria30023,PVC
bacteria30024,Deltaproteobacteria
bacteria30025,Deltaproteobacteria
bacteria30026,PVC
bacteria30027,PVC
bacteria30028,PVC
bacteria30029,PVC
bacteria30030,PVC
bacteria30031,PVC
bacteria30032,PVC
bacteria30033,PVC
bacteria30034,PVC
bacteria30035,PVC
bacteria30036,PVC

And my markers_list.txt file looks like:

napA_TIGR01706.hmm
napB_PF03892.hmm
narG_TIGR01580.hmm
narH_TIGR01660.hmm
ndma_methanol_dehydrogenase_TIGR04266.hmm
nifD_TIGR01282.hmm
nifH_TIGR01287.hmm
nifK_TIGR01286.hmm
nirB_TIGR02374.hmm
nirD_TIGR02378.hmm
nirK_TIGR02376.hmm
nitric_oxide_reductase_norB.hmm
nitric_oxide_reductase_norC.hmm
nitrite_oxidoreductase_nxrA.hmm
nitrite_oxidoreductase_nxrB.hmm
nitrite_reductase_nirS.hmm
nosD_TIGR04247.hmm
nosZ_TIGR04246.hmm
nrfA_TIGR03152.hmm
nrfH_TIGR03153.hmm

I did take a look at your package version numbers, and they could be causing the errors. I changed the package installation requirements to pin specific versions of the required dependencies. If you run python3 -m pip uninstall metabolisHMM and then reinstall it, this could solve the versioning issues. If you do not want to affect your preexisting versions of other packages, install it in a separate environment.

Let me know if you have any questions or if this still doesn't work.

sarah872 commented 4 years ago

I first tried reinstalling, but I encountered the following error:

ValueError: numpy.ufunc has the wrong size, try recompiling. Expected 192, got 216

So I updated numpy with pip install numpy --upgrade, but then I got the plotting error again (as above). I then installed metabolisHMM in a separate environment, but got the numpy error again:

Traceback (most recent call last):
  File "/scratch/metabolisHMM/installaion/bin/search-custom-markers", line 12, in <module>
    import pandas as pd
  File "/scratch/metabolisHMM/installaion/lib/python3.7/site-packages/pandas/__init__.py", line 26, in <module>
    from pandas._libs import (hashtable as _hashtable,
  File "/scratch/metabolisHMM/installaion/lib/python3.7/site-packages/pandas/_libs/__init__.py", line 4, in <module>
    from .tslib import iNaT, NaT, Timestamp, Timedelta, OutOfBoundsDatetime
  File "__init__.pxd", line 872, in init pandas._libs.tslib
ValueError: numpy.ufunc has the wrong size, try recompiling. Expected 192, got 216

There seems to be an error with the installation of numpy?

Also, could you provide all the example files you used here?

elizabethmcd commented 4 years ago

Can you provide the versions you had previously of the following packages:

I'll try to make an environment with your versions of the packages and see if I can reproduce the error that way.

The only other files I provided in the example above are protein fasta files of the genomes. The comment above contains the complete contents of my groups.csv metadata file and the markers list, which are the markers from the curated markers dataset.

elizabethmcd commented 4 years ago

Hi @sarah872 and @morgvevans. I'm planning some bug fixes for next week and wanted to check whether the above issue is still a problem. I'll also be trying to make a conda release and consolidating some workflows to make things simpler. Any suggestions about things to improve are welcome! Thanks.

srisvs33 commented 2 years ago

Hi guys, I am also getting a similar error with my MAG dataset. Any idea how to solve it? I have attached the software version and dependency versions along with the error message.

Many thanks in advance

Venkat

Error_report.txt

elizabethmcd commented 2 years ago

Hi @srisvs33 - it looks like a few of your packages, namely seaborn and matplotlib, are a few versions newer than the versions that work with the workflows. When running this command, do you still get results? If you turn the aggregate option OFF as suggested above, does the workflow still work? If so, this might be a numpy package version problem that I will need to resolve.