KwanLab / Autometa

Autometa: Automated Extraction of Genomes from Shotgun Metagenomes
https://autometa.readthedocs.io

The Bash tutorial is missing a step #365

Open lyisrae1 opened 1 month ago

lyisrae1 commented 1 month ago
### User checklist

- [x] Are you using the latest release? Yes
- [x] Are you using python 3? Yes
- [x] Did you check previous issues to see if this has already been mentioned? Yes
- [x] Are you using a Mac or Linux machine? Linux machine

#### Description

Hello there, I am trying to learn how to use autometa from your tutorial posted on ReadTheDocs, but there is a piece missing in Step 4 - Single Copy Markers. There is not a step detailing how we create a hmmscan.tsv file. Can you provide this information to me please?

#### Expected Behavior

I checked the rest of the document, but there is no other mention of how the learners are supposed to make the hmmscan.tsv file.

#### System Environment

Tasks/Command(s)

Log/Error information generated by Autometa.

Hello, I appreciate you looking at my inquiry. I noticed that there is a step missing from the tutorial on your ReadTheDocs page. No step shows how to create the hmmscan.tsv file before it is needed in Step 4 - Single Copy Markers. I followed the tutorial exactly, but I keep getting an error telling me that the hmmscan.tsv file does not exist. I will paste the directions for Step 4 here:

# Create a markers directory to hold the marker genes
mkdir -p $HOME/Autometa/autometa/databases/markers

# Change the default download path to the directory created above
autometa-config \
    --section databases \
    --option markers \
    --value $HOME/Autometa/autometa/databases/markers

# Download single-copy marker genes
autometa-update-databases --update-markers

# hmmpress the marker genes
hmmpress -f $HOME/Autometa/autometa/databases/markers/bacteria.single_copy.hmm
hmmpress -f $HOME/Autometa/autometa/databases/markers/archaea.single_copy.hmm

autometa-markers \
    --orfs $HOME/tutorial/78mbp_metagenome.orfs.faa \
    --kingdom bacteria \
    --hmmscan $HOME/tutorial/78mbp_metagenome.hmmscan.tsv \
    --out $HOME/tutorial/78mbp_metagenome.markers.tsv \
    --parallel \
    --cpus 4 \
    --seed 42

When I follow this code, I get this error:

ERROR:
[10/23/2024 04:39:10 PM DEBUG] autometa.common.external.hmmscan: hmmscan --seed 42 --cpu 0 --tblout /vast/agnanad1/Leone/autometa_tutorial/78mbp_metagenome.hmmscan.tsv /vast/agnanad1/Leone/autometa_tutorial/markers/bacteria.single_copy.hmm /vast/agnanad1/Leone/autometa_tutorial/78mbp_metagenome.orfs.faa
[10/23/2024 04:39:10 PM WARNING] autometa.common.external.hmmscan: Make sure your hmm profiles are pressed! hmmpress -f /vast/agnanad1/Leone/autometa_tutorial/markers/bacteria.single_copy.hmm
Traceback (most recent call last):
  File "/home/lyisrae1/.conda/envs/autometa/bin/autometa-markers", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/lyisrae1/.conda/envs/autometa/lib/python3.12/site-packages/autometa/common/markers.py", line 266, in main
    get(
  File "/home/lyisrae1/.conda/envs/autometa/lib/python3.12/site-packages/autometa/common/markers.py", line 162, in get
    scans = hmmscan.run(
            ^^^^^^^^^^^^
  File "/home/lyisrae1/.conda/envs/autometa/lib/python3.12/site-packages/autometa/common/external/hmmscan.py", line 174, in run
    annotate_sequential(
  File "/home/lyisrae1/.conda/envs/autometa/lib/python3.12/site-packages/autometa/common/external/hmmscan.py", line 106, in annotate_sequential
    raise err
  File "/home/lyisrae1/.conda/envs/autometa/lib/python3.12/site-packages/autometa/common/external/hmmscan.py", line 101, in annotate_sequential
    subprocess.run(
  File "/home/lyisrae1/.conda/envs/autometa/lib/python3.12/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['hmmscan', '--seed', '42', '--cpu', '0', '--tblout', '/vast/agnanad1/Leone/autometa_tutorial/78mbp_metagenome.hmmscan.tsv',

Additionally, I have had a few syntax issues in Step 5 - Taxonomy, but those were very easy to fix, so that is not the issue. Can I please get some clarification to finish Step 4 on ReadTheDocs? I cannot finish the tutorial properly without that step.

[autometa_tutorial.txt](https://github.com/user-attachments/files/17498392/autometa_tutorial.txt)

Here is the command I ran without the markers:

autometa-binning \
    --kmers /vast/agnanad1/Leone/autometa_tutorial/78mbp_metagenome.bacteria.kmers.embedded.tsv \
    --coverages /vast/agnanad1/Leone/autometa_tutorial/78mbp_metagenome.coverages.tsv \
    --gc-content /vast/agnanad1/Leone/autometa_tutorial/78mbp_metagenome.gc_content.tsv \
    --output-binning /vast/agnanad1/Leone/autometa_tutorial/78mbp_metagenome.binning.tsv \
    --output-main /vast/agnanad1/Leone/autometa_tutorial/78mbp_metagenome.main.tsv \
    --clustering-method dbscan \
    --completeness 20 \
    --purity 90 \
    --cov-stddev-limit 25 \
    --gc-stddev-limit 5 \
    --taxonomy /vast/agnanad1/Leone/autometa_tutorial/78mbp_metagenome.taxonomy.tsv \
    --starting-rank superkingdom \
    --rank-filter superkingdom \
    --rank-name-filter bacteria

And here is the error message:

usage: autometa-binning [-h] --kmers filepath --coverages filepath --gc-content filepath --markers filepath --output-binning filepath
    [--output-main filepath] [--clustering-method {dbscan,hdbscan}] [--completeness 0 < float <= 100] [--purity 0 < float <= 100]
    [--cov-stddev-limit float] [--gc-stddev-limit float] [--taxonomy filepath]
    [--starting-rank {superkingdom,phylum,class,order,family,genus,species}] [--reverse-ranks]
    [--rank-filter {superkingdom,phylum,class,order,family,genus,species}] [--rank-name-filter RANK_NAME_FILTER] [--verbose] [--cpus int]
autometa-binning: error: the following arguments are required: --markers

https://autometa.readthedocs.io/en/latest/bash-step-by-step-tutorial.html#single-copy-markers

Thank you for your time,
Leone

chasemc commented 1 month ago

It looks like the documentation needs to be fixed, but this is mostly an issue with file paths.

1

At the start, the tutorial says to download metagenome.fna.gz to $HOME/tutorial/test_data/, but later steps refer to the file by a different name: $HOME/tutorial/test_data/78mbp_metagenome.fna

So, to start, you should download metagenome.fna.gz and save it as $HOME/tutorial/test_data/78mbp_metagenome.fna
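
One way to reconcile the two file names is sketched below. It assumes the download really is gzip-compressed (as the .gz extension suggests) and that later steps expect an uncompressed FASTA file; neither point is confirmed in this thread. The helper name `stage_metagenome` is hypothetical and not part of Autometa.

```python
import gzip
import shutil
from pathlib import Path


def stage_metagenome(src_gz: Path, dest: Path) -> Path:
    """Decompress the downloaded archive to the file name later steps use.

    Assumption: src_gz is a gzip-compressed FASTA file.
    """
    # Create the target directory if it does not exist yet.
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Stream the decompressed bytes so large assemblies are not held in memory.
    with gzip.open(src_gz, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest


# Example call, using the paths mentioned above:
# data_dir = Path.home() / "tutorial" / "test_data"
# stage_metagenome(data_dir / "metagenome.fna.gz", data_dir / "78mbp_metagenome.fna")
```

If the file is already uncompressed, renaming it is enough and this helper is unnecessary.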

2

There is a separate issue in the ORF creation step.

Current:

autometa-orfs \
    --assembly $HOME/tutorial/78mbp_metagenome.filtered.fna \
    --output-nucls $HOME/tutorial/78mbp_metagenome.orfs.fna \
    --output-prots $HOME/tutorial/a78mbp_metagenome.orfs.faa \
    --cpus 40

Should be:

autometa-orfs \
    --assembly $HOME/tutorial/78mbp_metagenome.filtered.fna \
    --output-nucls $HOME/tutorial/78mbp_metagenome.orfs.fna \
    --output-prots $HOME/tutorial/78mbp_metagenome.orfs.faa \
    --cpus 40

That should fix the error.
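
Since the failure came from a mistyped path rather than unpressed profiles, a quick existence check before running autometa-markers can surface the real cause. The following is a minimal sketch, not part of Autometa: `missing_inputs` is a hypothetical helper, and it assumes the pressed index files sit next to the profile with the .h3f, .h3i, .h3m and .h3p suffixes that hmmpress writes.

```python
from pathlib import Path

# Index files that `hmmpress` writes next to the profile database.
PRESSED_SUFFIXES = (".h3f", ".h3i", ".h3m", ".h3p")


def missing_inputs(orfs: Path, hmm: Path) -> list[Path]:
    """Return every required input file that does not exist on disk."""
    required = [orfs, hmm]
    # The index files keep the full profile name and append one suffix each.
    required += [Path(f"{hmm}{suffix}") for suffix in PRESSED_SUFFIXES]
    return [path for path in required if not path.is_file()]


# Example call, using the paths from the tutorial step:
# home = Path.home()
# problems = missing_inputs(
#     home / "tutorial" / "78mbp_metagenome.orfs.faa",
#     home / "Autometa" / "autometa" / "databases" / "markers" / "bacteria.single_copy.hmm",
# )
# for path in problems:
#     print(f"missing: {path}")
```

With the typo from the ORF step in place, this check would report the .orfs.faa file as missing while the pressed profiles are found.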


CC @shaneroesemann @jason-c-kwan: the documentation needs to be updated accordingly. Also, the error message generated by autometa-markers is not helpful. The subprocess stderr should be captured and printed, rather than just reporting an error with hmmpress and "Make sure your hmm profiles are pressed!", which wasn't the issue.
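
As a rough illustration of that suggestion, the sketch below captures stderr from a failing command and includes it in the raised error. It is not Autometa's actual implementation: `run_and_report` is a hypothetical wrapper, and how it would fit into autometa/common/external/hmmscan.py is an assumption.

```python
import subprocess


def run_and_report(cmd: list[str]) -> subprocess.CompletedProcess:
    """Run cmd and, on failure, raise an error that carries the tool's stderr."""
    # capture_output=True collects stdout and stderr; text=True decodes them to str.
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"{cmd[0]} exited with code {result.returncode}:\n{result.stderr.strip()}"
        )
    return result
```

With a wrapper along these lines, a missing input file would show up as the external tool's own message instead of a generic hint about pressing the profiles.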