KwanLab / Autometa

Autometa: Automated Extraction of Genomes from Shotgun Metagenomes
https://autometa.readthedocs.io

Problem when running Autometa using bash script #297

Closed Valentin-Bio-zz closed 1 year ago

Valentin-Bio-zz commented 2 years ago

Hello developers, I'm trying to run the autometa.sh bash script on an HPC server.

I made a co-assembly of environmental paired-end read libraries, produced a BAM file for each sequencing set, merged those BAM files, and prepared each of the files needed to run autometa.sh as described here: https://autometa.readthedocs.io/en/latest/bash-workflow.html#ncbi-preparation
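
For context, the BAM merging step can be done with something like samtools (shown here only for illustration; the file names are placeholders):

# merge the per-library BAM files into one, then index the result
samtools merge coassembly.merged.bam library1.bam library2.bam library3.bam
samtools index coassembly.merged.bam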

Here I attach the configuration of my autometa.sh script (in txt format for uploading purposes): autometa.txt

and the stdout of autometa-config --print: config.txt

I installed Autometa via the conda recipe (conda create -n autometa autometa) with conda version 22.9.0.

In the output directory I have the following files:

SLF_CL1.coverages.bed.tsv, SLF_CL1.coverages.tsv, SLF_CL1.filtered.fna, SLF_CL1.gc_content.tsv (if these files are required, I can provide them)

I hope the provided info helps to solve this error.

Thanks,

Valentín.

evanroyrees commented 2 years ago

Hi @Valentin-Bio, what is the error you received? Would you please share the log files associated with your autometa run?

Valentin-Bio-zz commented 2 years ago

Hello Evan, here I attach the error: error.txt

evanroyrees commented 2 years ago

It looks like you may have three issues:

  1. Are the marker databases pressed?
[11/15/2022 03:48:41 PM DEBUG] autometa.common.external.hmmscan: hmmscan --seed 42 --cpu 4 --tblout /lustre/groups/cbi/Users/ecastron/valentin/sea_lions/filtered/no_host/cluster1/autometa_output2/SLF_CL1.bacteria.hmmscan.tsv /GWSPH/home/ecastron/miniconda3/envs/autometa/lib/python3.9/site-packages/autometa/databases/markers/bacteria.single_copy.hmm /lustre/groups/cbi/Users/ecastron/valentin/sea_lions/filtered/no_host/cluster1/autometa/metagenome.orfs.faa
[11/15/2022 03:48:41 PM WARNING] autometa.common.external.hmmscan: Make sure your hmm profiles are pressed! hmmpress -f /GWSPH/home/ecastron/miniconda3/envs/autometa/lib/python3.9/site-packages/autometa/databases/markers/bacteria.single_copy.hmm

Do you have the marker databases set up? Does the compute environment have access to the databases directory?

/GWSPH/home/ecastron/miniconda3/envs/autometa/lib/python3.9/site-packages/autometa/databases/markers/
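
If the profiles are present but simply not pressed, running hmmpress as the warning suggests should fix it. A minimal sketch using the path from your logs (the archaea file name is assumed to mirror the bacteria one):

# press the marker HMM profiles so hmmscan can use the binary indices
MARKERS=/GWSPH/home/ecastron/miniconda3/envs/autometa/lib/python3.9/site-packages/autometa/databases/markers
hmmpress -f "$MARKERS/bacteria.single_copy.hmm"
hmmpress -f "$MARKERS/archaea.single_copy.hmm"  # assumed companion file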

Otherwise, you can update the marker databases using the autometa-update-databases command. For example,

# Update markers and hmmpress at current config location
autometa-update-databases --update-markers

NOTE: This would install the databases to /GWSPH/home/ecastron/miniconda3/envs/autometa/lib/python3.9/site-packages/autometa/databases/markers/

If you want to install the marker databases to a different location, you can first run autometa-config and then the update command.

autometa-config \
    --section databases \
    --option markers \
    --value </path/to/your/marker/databases/directory>
autometa-update-databases --update-markers
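
Afterwards, a quick sanity check (just a sketch, not an official command sequence): confirm the configured path and that hmmpress produced its binary index files.

# print the active configuration to confirm the markers path
autometa-config --print
# hmmpress emits .h3f/.h3i/.h3m/.h3p indices next to each .hmm file
ls </path/to/your/marker/databases/directory>/*.h3{f,i,m,p}
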
  2. You may be missing some of the NCBI database files
FileNotFoundError: [Errno 2] No such file or directory: '/GWSPH/home/ecastron/Databases/autometa_db/delnodes.dmp'

Do you have merged.dmp and delnodes.dmp under /GWSPH/home/ecastron/Databases/autometa_db/?
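
A quick way to check (plain directory listing):

# both .dmp files should be present in the configured NCBI directory
ls -l /GWSPH/home/ecastron/Databases/autometa_db/{merged,delnodes}.dmp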

  3. You may have a dependency conflict.
Traceback (most recent call last):
  File "/GWSPH/home/ecastron/miniconda3/envs/autometa/bin/autometa-binning", line 7, in <module>
    from autometa.binning.recursive_dbscan import main
  File "/GWSPH/home/ecastron/miniconda3/envs/autometa/lib/python3.9/site-packages/autometa/binning/recursive_dbscan.py", line 20, in <module>
    from hdbscan import HDBSCAN
  File "/GWSPH/home/ecastron/miniconda3/envs/autometa/lib/python3.9/site-packages/hdbscan/__init__.py", line 1, in <module>
    from .hdbscan_ import HDBSCAN, hdbscan
  File "/GWSPH/home/ecastron/miniconda3/envs/autometa/lib/python3.9/site-packages/hdbscan/hdbscan_.py", line 509, in <module>
    memory=Memory(cachedir=None, verbose=0),
TypeError: __init__() got an unexpected keyword argument 'cachedir'

This was first noted at #285, resolved at #286, and noted again at #295. You should be able to resolve this error by pinning joblib and scipy.

Here's the command:

conda install -n autometa -c conda-forge joblib==1.1.0 scipy==1.8 -y
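
To verify the pins took effect, a quick check:

# confirm the pinned joblib and scipy versions are now installed
conda list -n autometa | grep -E 'joblib|scipy'
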
Valentin-Bio-zz commented 2 years ago

Hello Evan! Thanks for your answer,

I configured the markers database, installed it, and now the files are in the database path.

About the second point: merged.dmp is present but delnodes.dmp is not. How can I deal with this? I was thinking of just creating an empty file called delnodes.dmp, as this thread suggests: https://github.com/shenwei356/taxonkit/issues/27
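
i.e., something like:

# hypothetical workaround: create an empty delnodes.dmp at the path from the error
touch /GWSPH/home/ecastron/Databases/autometa_db/delnodes.dmp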

About the third point: I'm trying to install the suggested versions with that command on the cluster.

best regards,

Valentín

evanroyrees commented 2 years ago

Hi Valentín,

  1. Glad to hear you're all set up here

  2. merged.dmp and delnodes.dmp are also contained within NCBI's taxdump tarball. You can download and extract these files with the following commands:

# First navigate to your ncbi database directory
cd /path/to/your/autometa/configured/ncbi/databases/directory
# Download tarball to current directory
wget ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz
# extract files from tarball into current directory
tar -xvzf taxdump.tar.gz

NOTE: This will extract the following files from the tarball:

x citations.dmp
x delnodes.dmp
x division.dmp
x gencode.dmp
x merged.dmp
x names.dmp
x nodes.dmp
x gc.prt
x readme.txt

NCBI taxdump tarball: ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz
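
If you only want the two missing files, tar can also extract specific members from the tarball:

# extract only merged.dmp and delnodes.dmp
tar -xvzf taxdump.tar.gz merged.dmp delnodes.dmp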

  3. Were you able to update joblib and scipy with the pinned versions?

You should be able to test this by running autometa-binning -h. If you receive the help text without any errors, you should be good to go 👍
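
For a more direct check of the import that failed in your traceback:

# confirm the hdbscan import that previously raised TypeError now succeeds
python -c "from hdbscan import HDBSCAN; print('hdbscan OK')"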

Valentin-Bio-zz commented 1 year ago

Hello Evan! It worked; I finally got my bins after making the modifications you mentioned.

Thanks a lot.