KwanLab / Autometa

Autometa: Automated Extraction of Genomes from Shotgun Metagenomes
https://autometa.readthedocs.io

FileNotFoundError: [Errno 2] No such file or directory: '/databases/autometa/ncbi/delnodes.dmp' #282

Closed JoshuaTCooper closed 2 years ago

JoshuaTCooper commented 2 years ago

Hi,

I'm looking forward to running this software on my metagenome contigs; however, while following along with the tutorial I came across an error that I'm not sure how to resolve. I also hope I submitted this issue correctly, so apologies if I messed it up!

Thank you in advance!

Josh

Current Behavior

Running the autometa-taxonomy-lca script fails because it looks for an NCBI dump file, delnodes.dmp, that does not exist in my database directory. Checking the git blame for ncbi.py, the requirement appears to come from an edit made 3 months ago that deals with unclassified sequences.

I set up the NCBI database using autometa-update-databases --update-ncbi, and there is no delnodes.dmp file after it finishes building the DIAMOND BLAST database.

Is there a step I'm missing to create that file?
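A quick pre-flight check can show which taxonomy dump files are missing from the --dbdir directory before rerunning autometa-taxonomy-lca. The Python sketch below is not part of Autometa: missing_dump_files is a hypothetical helper, and the list of required files is an assumption drawn from the ncbi entries in the autometa-config --print output and from the delnodes.dmp path in the traceback.

```python
from pathlib import Path

# Assumed set of files the LCA step reads: nodes/names/merged come from the `ncbi`
# config section, and delnodes.dmp is the file opened in the traceback.
REQUIRED_DMP_FILES = ("nodes.dmp", "names.dmp", "merged.dmp", "delnodes.dmp")


def missing_dump_files(dbdir):
    """Return the required .dmp file names that are absent from dbdir (hypothetical helper)."""
    dbdir = Path(dbdir)
    return [name for name in REQUIRED_DMP_FILES if not (dbdir / name).is_file()]
```

For example, missing_dump_files("databases/autometa/ncbi") returns an empty list only when all four files are present; any names it returns point to an incomplete extraction of the taxonomy archive.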

Steps to Reproduce

autometa-taxonomy-lca --blast autometa/blastp.tsv --dbdir databases/autometa/ncbi/ --lca-output autometa/lca.tsv --sseqid2taxid-output autometa/lca.sseqid2taxid.tsv --lca-error-taxids autometa/lca.errorTaxids.tsv

Traceback (most recent call last):
  File "/home/cooperjo/miniconda3/envs/autometa2/bin/autometa-taxonomy-lca", line 10, in <module>
    sys.exit(main())
  File "/home/cooperjo/miniconda3/envs/autometa2/lib/python3.9/site-packages/autometa/taxonomy/lca.py", line 777, in main
    lca = LCA(dbdir=args.dbdir, verbose=args.verbose, cache=args.cache)
  File "/home/cooperjo/miniconda3/envs/autometa2/lib/python3.9/site-packages/autometa/taxonomy/lca.py", line 78, in __init__
    super().__init__(dbdir, verbose=verbose)
  File "/home/cooperjo/miniconda3/envs/autometa2/lib/python3.9/site-packages/autometa/taxonomy/ncbi.py", line 109, in __init__
    self.delnodes = self.parse_delnodes()
  File "/home/cooperjo/miniconda3/envs/autometa2/lib/python3.9/site-packages/autometa/taxonomy/ncbi.py", line 399, in parse_delnodes
    fh = open(self.delnodes_fpath)
FileNotFoundError: [Errno 2] No such file or directory: '/home/cooperjo/databases/autometa/ncbi/delnodes.dmp'
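The traceback ends in parse_delnodes in ncbi.py, which simply opens delnodes.dmp from --dbdir. For illustration, here is a minimal sketch of reading that file, assuming the standard NCBI taxdump layout (one tax_id per line, terminated by a tab and a pipe). parse_delnodes_sketch and load_delnodes are hypothetical names, not Autometa's implementation; the wrapper only shows how a missing file could be reported with a hint about extracting the archive.

```python
import os


def parse_delnodes_sketch(fpath):
    """Return the set of deleted taxids listed in an NCBI delnodes.dmp file.

    Assumes each line holds a single tax_id field followed by the tab-pipe terminator.
    Like the call in the traceback, open() raises FileNotFoundError if the file is absent.
    """
    deleted = set()
    with open(fpath) as fh:
        for line in fh:
            # Keep only the field before the first tab-pipe delimiter.
            taxid = line.split("\t|", 1)[0].strip()
            if taxid:
                deleted.add(int(taxid))
    return deleted


def load_delnodes(dbdir):
    """Parse dbdir/delnodes.dmp, re-raising a missing file with a hint about the archive."""
    fpath = os.path.join(dbdir, "delnodes.dmp")
    try:
        return parse_delnodes_sketch(fpath)
    except FileNotFoundError:
        raise FileNotFoundError(
            f"{fpath} not found; has taxdump.tar.gz been extracted into {dbdir}?"
        ) from None
```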

Expected Behavior

It should have produced the following output files (blastp.tsv is the input passed via --blast): lca.tsv, lca.sseqid2taxid.tsv and lca.errorTaxids.tsv.

Environment Information

autometa-config --print

section        option                        value
common         home_dir                      /home/cooperjo/miniconda3/envs/autometa2/lib/python3.9/site-packages
environ        diamond                       /home/cooperjo/miniconda3/envs/autometa2/bin/diamond
environ        hmmsearch                     /home/cooperjo/miniconda3/envs/autometa2/bin/hmmsearch
environ        hmmpress                      /home/cooperjo/miniconda3/envs/autometa2/bin/hmmpress
environ        hmmscan                       /home/cooperjo/miniconda3/envs/autometa2/bin/hmmscan
environ        prodigal                      /home/cooperjo/miniconda3/envs/autometa2/bin/prodigal
environ        bowtie2                       /home/cooperjo/miniconda3/envs/autometa2/bin/bowtie2
environ        samtools                      /home/cooperjo/miniconda3/envs/autometa2/bin/samtools
environ        bedtools                      /home/cooperjo/miniconda3/envs/autometa2/bin/bedtools
versions       diamond                       2.0.15
versions       hmmsearch                     3.3.2
versions       hmmpress                      3.3.2
versions       hmmscan                       3.3.2
versions       prodigal                      2.6.3
versions       bowtie2                       2.2.5
versions       samtools                      1.11
versions       bedtools                      2.30.0
databases      base                          /home/cooperjo/miniconda3/envs/autometa2/lib/python3.9/site-packages/autometa/databases
databases      ncbi                          /home/cooperjo/databases/autometa/ncbi
databases      markers                       /home/cooperjo/databases/autometa/databases/markers
database_urls  taxdump                       ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz
database_urls  accession2taxid               ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz
database_urls  nr                            ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz
database_urls  bacteria_single_copy          https://raw.githubusercontent.com/KwanLab/Autometa/main/autometa/databases/markers/bacteria.single_copy.hmm
database_urls  bacteria_single_copy_cutoffs  https://raw.githubusercontent.com/KwanLab/Autometa/main/autometa/databases/markers/bacteria.single_copy.cutoffs
database_urls  archaea_single_copy           https://raw.githubusercontent.com/KwanLab/Autometa/main/autometa/databases/markers/archaea.single_copy.hmm
database_urls  archaea_single_copy_cutoffs   https://raw.githubusercontent.com/KwanLab/Autometa/main/autometa/databases/markers/archaea.single_copy.cutoffs
checksums      taxdump                       ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz.md5
checksums      accession2taxid               ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz.md5
checksums      nr                            ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz.md5
checksums      bacteria_single_copy          https://raw.githubusercontent.com/KwanLab/Autometa/main/autometa/databases/markers/bacteria.single_copy.hmm.md5
checksums      bacteria_single_copy_cutoffs  https://raw.githubusercontent.com/KwanLab/Autometa/main/autometa/databases/markers/bacteria.single_copy.cutoffs.md5
checksums      archaea_single_copy           https://raw.githubusercontent.com/KwanLab/Autometa/main/autometa/databases/markers/archaea.single_copy.hmm.md5
checksums      archaea_single_copy_cutoffs   https://raw.githubusercontent.com/KwanLab/Autometa/main/autometa/databases/markers/archaea.single_copy.cutoffs.md5
ncbi           host                          ftp.ncbi.nlm.nih.gov
ncbi           taxdump                       /home/cooperjo/databases/autometa/ncbi/taxdump.tar.gz
ncbi           nodes                         /home/cooperjo/databases/autometa/ncbi/nodes.dmp
ncbi           names                         /home/cooperjo/databases/autometa/ncbi/names.dmp
ncbi           merged                        /home/cooperjo/databases/autometa/ncbi/merged.dmp
ncbi           accession2taxid               /home/cooperjo/databases/autometa/ncbi/prot.accession2taxid.gz
ncbi           nr                            /home/cooperjo/databases/autometa/ncbi/nr.gz
markers        host                          raw.githubusercontent.com
markers        bacteria_single_copy          /home/cooperjo/databases/autometa/databases/markers/bacteria.single_copy.hmm
markers        bacteria_single_copy_cutoffs  /home/cooperjo/databases/autometa/databases/markers/bacteria.single_copy.cutoffs
markers        archaea_single_copy           /home/cooperjo/databases/autometa/databases/markers/archaea.single_copy.hmm
markers        archaea_single_copy_cutoffs   /home/cooperjo/databases/autometa/databases/markers/archaea.single_copy.cutoffs
files          metagenome                    metagenome.fna
files          fwd_reads                     fwd_reads.fastq
files          rev_reads                     rev_reads.fastq
files          se_reads                      se_reads.fastq
files          sam                           alignments.sam
files          bam                           alignments.bam
files          lengths                       lengths.tsv
files          bed                           alignments.bed
files          length_filtered               metagenome.filtered.fna
files          coverages                     coverages.tsv
files          kmer_counts                   kmers.tsv
files          kmer_normalized               kmers.normalized.tsv
files          kmer_embedded                 kmers.embedded.tsv
files          nucleotide_orfs               metagenome.filtered.orfs.fna
files          amino_acid_orfs               metagenome.filtered.orfs.faa
files          blastp                        blastp.tsv
files          blastp_hits                   blastp.hits.pkl.gz
files          lca                           lca.tsv
files          blastx                        blastx.tsv
files          taxonomy                      taxonomy.tsv
files          bacteria_hmmscan              bacteria.hmmscan.tsv
files          bacteria_markers              bacteria.markers.tsv
files          archaea_hmmscan               archaea.hmmscan.tsv
files          archaea_markers               archaea.markers.tsv
files          bacteria_binning              bacteria.binning.tsv
files          archaea_binning               archaea.binning.tsv
files          checkpoints                   checkpoints.tsv
JoshuaTCooper commented 2 years ago

Apologies! After submitting the issue, I figured out the problem: I had to extract the taxdump.tar.gz archive (tar -zxvf taxdump.tar.gz), which populated all of the .dmp files, including delnodes.dmp. Sorry for the confusion.
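The same fix can be scripted with Python's standard tarfile module, which may help if database setup is automated. This is a minimal sketch assuming taxdump.tar.gz sits inside the --dbdir directory (as in the ncbi taxdump entry of the config output); extract_taxdump is a hypothetical helper, not an Autometa command.

```python
import tarfile
from pathlib import Path


def extract_taxdump(dbdir):
    """Extract dbdir/taxdump.tar.gz into dbdir, like `tar -zxvf taxdump.tar.gz`.

    Returns the member names so the caller can confirm delnodes.dmp was included.
    """
    dbdir = Path(dbdir)
    with tarfile.open(dbdir / "taxdump.tar.gz", "r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: "data" rejects unsafe members.
            tar.extractall(path=dbdir, filter="data")
        else:
            tar.extractall(path=dbdir)
    return names
```

With the NCBI archive, "delnodes.dmp" in extract_taxdump("databases/autometa/ncbi") should be True, after which autometa-taxonomy-lca can be rerun.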