KwanLab / Autometa

Autometa: Automated Extraction of Genomes from Shotgun Metagenomes
https://autometa.readthedocs.io

autometa-update-databases, error building GTDB diamond database #328

Closed Kohtzanth closed 10 months ago

Kohtzanth commented 1 year ago

Current Behavior

The GTDB gtdb_proteins_aa_reps.tar.gz and taxdump files download correctly, but an error occurs when trying to build the diamond database. After the error there is an empty gtdb.faa file, so I think the files are not being merged after the extraction step. Could you help me find a workaround for this? I'm unsure how the files are processed after extracting, prior to calling diamond makedb.

Steps to Reproduce

autometa-setup-gtdb --reps-faa mambaforge/envs/autometa/database/gtdb_proteins_aa_reps.tar.gz --dbdir mambaforge/envs/autometa/database --cpus 80
[06/14/2023 07:52:49 PM DEBUG] autometa.taxonomy.gtdb: Extracting tarball containing GTDB ref genome animo acid data sequences to: mambaforge/envs/autometa/database/protein_faa_reps
[06/14/2023 07:56:26 PM DEBUG] autometa.taxonomy.gtdb: Extraction done.
[06/14/2023 07:56:26 PM DEBUG] autometa.taxonomy.gtdb: Merging 0 faa files.
[06/14/2023 07:56:26 PM DEBUG] autometa.taxonomy.gtdb: Combined GTDB faa file written to mambaforge/envs/autometa/database/gtdb.faa
[06/14/2023 07:56:27 PM DEBUG] autometa.common.external.diamond: diamond makedb --in mambaforge/envs/autometa/database/gtdb.faa --db mambaforge/envs/autometa/database/gtdb.dmnd -p 80
Traceback (most recent call last):
  File "/home/anthonyk/mambaforge/envs/autometa/bin/autometa-setup-gtdb", line 10, in <module>
    sys.exit(main())
  File "/home/anthonyk/mambaforge/envs/autometa/lib/python3.9/site-packages/autometa/taxonomy/gtdb.py", line 356, in main
    diamond.makedatabase(
  File "/home/anthonyk/mambaforge/envs/autometa/lib/python3.9/site-packages/autometa/common/external/diamond.py", line 51, in makedatabase
    subprocess.run(
  File "/home/anthonyk/mambaforge/envs/autometa/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['diamond', 'makedb', '--in', 'mambaforge/envs/autometa/database/gtdb.faa', '--db', 'mambaforge/envs/autometa/database/gtdb.dmnd', '-p', '80']' returned non-zero exit status 1.
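
The `Merging 0 faa files.` line in the log above suggests that the file search after extraction matched nothing, which would leave `gtdb.faa` empty. One possible stopgap is to collect the protein files by hand. The sketch below assumes (this is not confirmed anywhere in the thread) that the extracted files are nested in subdirectories and may be gzip-compressed (`.faa.gz`). The function names `find_protein_files` and `merge_protein_files` are hypothetical and not part of Autometa. Autometa may also rewrite sequence headers during its own merge, so the result is not necessarily a drop-in replacement for the file it would produce.

```python
import gzip
from pathlib import Path


def find_protein_files(search_dir):
    """Return sorted paths under search_dir ending in .faa or .faa.gz."""
    root = Path(search_dir)
    return sorted(
        p
        for p in root.rglob("*")
        if p.is_file() and p.name.endswith((".faa", ".faa.gz"))
    )


def merge_protein_files(search_dir, out_path):
    """Concatenate every .faa / .faa.gz file under search_dir into out_path.

    Returns the number of files merged and raises if none were found, so an
    empty combined file is never passed on silently.
    """
    out_path = Path(out_path)
    # Skip the output file itself in case it lives inside search_dir.
    files = [
        p for p in find_protein_files(search_dir)
        if p.resolve() != out_path.resolve()
    ]
    if not files:
        raise FileNotFoundError(f"No .faa or .faa.gz files found under {search_dir}")
    with open(out_path, "wb") as out:
        for path in files:
            # Compressed inputs are decompressed on the fly.
            opener = gzip.open if path.name.endswith(".gz") else open
            with opener(path, "rb") as handle:
                data = handle.read()
            out.write(data)
            # Keep records from adjacent files on separate lines.
            if data and not data.endswith(b"\n"):
                out.write(b"\n")
    return len(files)
```

With the paths from the log, this could be called as `merge_protein_files("mambaforge/envs/autometa/database/protein_faa_reps", "mambaforge/envs/autometa/database/gtdb.faa")`. A return value of 0 is impossible by construction, so a `FileNotFoundError` here would point at the extraction step rather than at diamond.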

Expected Behavior

I expected gtdb_proteins_aa_reps.tar.gz to be extracted, merged into gtdb.faa, and formatted into a diamond database (gtdb.dmnd).
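
As a rough illustration of how the empty-input case could be caught before diamond fails, the sketch below counts FASTA records and only then assembles the same `diamond makedb` command that appears in the log. The helper names `count_fasta_records` and `build_makedb_command` are hypothetical and are not Autometa functions.

```python
def count_fasta_records(fasta_path):
    """Count header lines (those starting with '>') in a FASTA file."""
    with open(fasta_path, "rb") as handle:
        return sum(1 for line in handle if line.startswith(b">"))


def build_makedb_command(fasta_path, db_path, cpus=1):
    """Return the diamond makedb argument list, refusing empty input.

    The flags mirror the command shown in the log: --in, --db and -p.
    """
    if count_fasta_records(fasta_path) == 0:
        raise ValueError(f"{fasta_path} contains no sequences; not running diamond makedb")
    return [
        "diamond", "makedb",
        "--in", str(fasta_path),
        "--db", str(db_path),
        "-p", str(cpus),
    ]
```

The returned list can be passed to `subprocess.run(..., check=True)`. Failing early with a message that names the empty file is arguably easier to act on than the bare `CalledProcessError` in the traceback.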

Environment Information

autometa-config --print

```bash
[06/14/2023 08:00:56 PM DEBUG] root: environment dependencies satisifed: True
section option value
common home_dir /home/anthonyk/mambaforge/envs/autometa/lib/python3.9/site-packages
environ diamond /home/anthonyk/mambaforge/envs/autometa/bin/diamond
environ hmmsearch /home/anthonyk/mambaforge/envs/autometa/bin/hmmsearch
environ hmmpress /home/anthonyk/mambaforge/envs/autometa/bin/hmmpress
environ hmmscan /home/anthonyk/mambaforge/envs/autometa/bin/hmmscan
environ prodigal /home/anthonyk/mambaforge/envs/autometa/bin/prodigal
environ bowtie2 /home/anthonyk/mambaforge/envs/autometa/bin/bowtie2
environ samtools /home/anthonyk/mambaforge/envs/autometa/bin/samtools
environ bedtools /home/anthonyk/mambaforge/envs/autometa/bin/bedtools
versions diamond 2.1.7
versions hmmsearch 3.3.2
versions hmmpress 3.3.2
versions hmmscan 3.3.2
versions prodigal 2.6.3
versions bowtie2 2.5.1
versions samtools 1.17
versions bedtools 2.31.0
databases base /home/anthonyk/mambaforge/envs/autometa/lib/python3.9/site-packages/autometa/databases
databases ncbi /home/anthonyk/mambaforge/envs/autometa/lib/python3.9/site-packages/autometa/databases/ncbi
databases gtdb mambaforge/envs/autometa/database
databases markers mambaforge/envs/autometa/markers
database_urls taxdump ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz
database_urls accession2taxid ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz
database_urls nr ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz
database_urls bacteria_single_copy https://raw.githubusercontent.com/KwanLab/Autometa/main/autometa/databases/markers/bacteria.single_copy.hmm
database_urls bacteria_single_copy_cutoffs https://raw.githubusercontent.com/KwanLab/Autometa/main/autometa/databases/markers/bacteria.single_copy.cutoffs
database_urls archaea_single_copy https://raw.githubusercontent.com/KwanLab/Autometa/main/autometa/databases/markers/archaea.single_copy.hmm
database_urls archaea_single_copy_cutoffs https://raw.githubusercontent.com/KwanLab/Autometa/main/autometa/databases/markers/archaea.single_copy.cutoffs
database_urls proteins_aa_reps https://data.gtdb.ecogenomic.org/releases/latest/genomic_files_reps/gtdb_proteins_aa_reps.tar.gz
database_urls gtdb_taxdmp https://github.com/shenwei356/gtdb-taxdump/releases/latest/download/gtdb-taxdump.tar.gz
checksums taxdump ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz.md5
checksums accession2taxid ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz.md5
checksums nr ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz.md5
checksums bacteria_single_copy https://raw.githubusercontent.com/KwanLab/Autometa/main/autometa/databases/markers/bacteria.single_copy.hmm.md5
checksums bacteria_single_copy_cutoffs https://raw.githubusercontent.com/KwanLab/Autometa/main/autometa/databases/markers/bacteria.single_copy.cutoffs.md5
checksums archaea_single_copy https://raw.githubusercontent.com/KwanLab/Autometa/main/autometa/databases/markers/archaea.single_copy.hmm.md5
checksums archaea_single_copy_cutoffs https://raw.githubusercontent.com/KwanLab/Autometa/main/autometa/databases/markers/archaea.single_copy.cutoffs.md5
ncbi host ftp.ncbi.nlm.nih.gov
ncbi taxdump /home/anthonyk/mambaforge/envs/autometa/lib/python3.9/site-packages/autometa/databases/ncbi/taxdump.tar.gz
ncbi nodes /home/anthonyk/mambaforge/envs/autometa/lib/python3.9/site-packages/autometa/databases/ncbi/nodes.dmp
ncbi names /home/anthonyk/mambaforge/envs/autometa/lib/python3.9/site-packages/autometa/databases/ncbi/names.dmp
ncbi merged /home/anthonyk/mambaforge/envs/autometa/lib/python3.9/site-packages/autometa/databases/ncbi/merged.dmp
ncbi delnodes /home/anthonyk/mambaforge/envs/autometa/lib/python3.9/site-packages/autometa/databases/ncbi/delnodes.dmp
ncbi accession2taxid /home/anthonyk/mambaforge/envs/autometa/lib/python3.9/site-packages/autometa/databases/ncbi/prot.accession2taxid.gz
ncbi nr /home/anthonyk/mambaforge/envs/autometa/lib/python3.9/site-packages/autometa/databases/ncbi/nr.gz
gtdb host data.gtdb.ecogenomic.org
gtdb release latest
gtdb proteins_aa_reps mambaforge/envs/autometa/database/gtdb_proteins_aa_reps.tar.gz
gtdb gtdb_taxdmp mambaforge/envs/autometa/database/gtdb-taxdump.tar.gz
markers host raw.githubusercontent.com
markers bacteria_single_copy mambaforge/envs/autometa/markers/bacteria.single_copy.hmm
markers bacteria_single_copy_cutoffs mambaforge/envs/autometa/markers/bacteria.single_copy.cutoffs
markers archaea_single_copy mambaforge/envs/autometa/markers/archaea.single_copy.hmm
markers archaea_single_copy_cutoffs mambaforge/envs/autometa/markers/archaea.single_copy.cutoffs
files metagenome metagenome.fna
files fwd_reads fwd_reads.fastq
files rev_reads rev_reads.fastq
files se_reads se_reads.fastq
files sam alignments.sam
files bam alignments.bam
files lengths lengths.tsv
files bed alignments.bed
files length_filtered metagenome.filtered.fna
files coverages coverages.tsv
files kmer_counts kmers.tsv
files kmer_normalized kmers.normalized.tsv
files kmer_embedded kmers.embedded.tsv
files nucleotide_orfs metagenome.filtered.orfs.fna
files amino_acid_orfs metagenome.filtered.orfs.faa
files blastp blastp.tsv
files blastp_hits blastp.hits.pkl.gz
files lca lca.tsv
files blastx blastx.tsv
files taxonomy taxonomy.tsv
files bacteria_hmmscan bacteria.hmmscan.tsv
files bacteria_markers bacteria.markers.tsv
files archaea_hmmscan archaea.hmmscan.tsv
files archaea_markers archaea.markers.tsv
files bacteria_binning bacteria.binning.tsv
files archaea_binning archaea.binning.tsv
files checkpoints checkpoints.tsv
```

Sidduppal commented 1 year ago

Hey @Kohtzanth, thanks for using Autometa. The issue seems to stem from the recent update to the GTDB database. We are looking into it and have created a PR to fix it. I will update you when the PR is merged.

Kohtzanth commented 1 year ago

Sounds good, thank you @Sidduppal!