Ecogenomics / CheckM

Assess the quality of microbial genomes recovered from isolates, single cells, and metagenomes
https://ecogenomics.github.io/CheckM/
GNU General Public License v3.0

FileNotFoundError: [Errno 2] No such file or directory: 'checkm_folder/storage/tree/concatenated.tre' #372

Open spitfiredd opened 1 year ago

spitfiredd commented 1 year ago

I am running the following command:

checkm lineage_wf -f "mysample_downsample10pct_metabat.checkm.txt" -t 40 -x fa metabat_folder checkm_folder

Here is my traceback:

 Traceback (most recent call last):
    File "/opt/conda/bin/checkm", line 856, in <module>
      checkmParser.parseOptions(args)
    File "/opt/conda/lib/python3.7/site-packages/checkm/main.py", line 980, in parseOptions
      self.lineageSet(options)
    File "/opt/conda/lib/python3.7/site-packages/checkm/main.py", line 265, in lineageSet
      resultsParser, options.unique, options.multi)
    File "/opt/conda/lib/python3.7/site-packages/checkm/treeParser.py", line 485, in getBinMarkerSets
      tree = dendropy.Tree.get_from_path(treeFile, schema='newick', rooting="force-rooted", preserve_underscores=True)
    File "/opt/conda/lib/python3.7/site-packages/dendropy/datamodel/basemodel.py", line 217, in get_from_path
      with open(src, *open_args) as fsrc:
  FileNotFoundError: [Errno 2] No such file or directory: 'checkm_folder/storage/tree/concatenated.tre'

When I look in checkm_folder/storage/tree, I see a concatenated.fasta file but no concatenated.tre.

I am running this from:

docker base image: mambaorg/micromamba:0.27.0
python: 3.7.16
nextflow: version 22.10.0 build 5826
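
For anyone triaging this, the files present in storage/tree hint at which step stopped. Below is a minimal Python sketch, assuming the placement step writes concatenated.fasta, then concatenated.pplacer.json, then concatenated.tre, in that order. The file names are taken from the error messages in this thread; the function name diagnose_tree_dir is hypothetical and not part of CheckM.

```python
from pathlib import Path


def diagnose_tree_dir(tree_dir):
    """Guess which placement step did not finish, based on the files present.

    Assumes the order: concatenated.fasta -> concatenated.pplacer.json
    -> concatenated.tre (names taken from the errors reported in this issue).
    """
    tree_dir = Path(tree_dir)
    fasta = tree_dir / "concatenated.fasta"
    placements = tree_dir / "concatenated.pplacer.json"
    tree = tree_dir / "concatenated.tre"

    if not fasta.exists():
        return "alignment missing: the run stopped before pplacer was started"
    if not placements.exists():
        return "pplacer did not finish: concatenated.pplacer.json is missing"
    if not tree.exists():
        return "guppy did not finish: concatenated.tre is missing"
    return "all tree files present"


if __name__ == "__main__":
    # Output directory from the command at the top of this issue.
    print(diagnose_tree_dir("checkm_folder/storage/tree"))
```

With only concatenated.fasta present, as described above, this reports that pplacer did not finish.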

spitfiredd commented 1 year ago

Here are the contents of the pplacer output:

Running pplacer v1.1.alpha19-0-g807f6f3 analysis on checkm_folder/storage/tree/concatenated.fasta...
Didn't find any reference sequences in given alignment file. Using supplied reference alignment.
Pre-masking sequences... sequence length cut from 6988 to 6808.
Warning: pplacer results make the most sense when the given tree is multifurcating at the root. See manual for details.
Determining figs... figs disabled.
Allocating memory for internal nodes... done.
Caching likelihood information on reference tree... 

spitfiredd commented 1 year ago

I'm not sure if this is relevant:

https://github.com/Ecogenomics/GTDBTk/issues/170#issuecomment-514466145

spitfiredd commented 1 year ago

Update:

I am able to run pplacer independently of checkm after it fails.

Here are the steps I took:

pplacer command

docker run -v "$(pwd)":/scratch quay.io/biocontainers/checkm-genome:1.2.1--pyhdfd78af_0 pplacer -j 10 -c /usr/local/checkm_data/genome_tree/genome_tree_reduced.refpkg -o /scratch/concatenated.pplacer.json /scratch/concatenated.fasta

guppy command

docker run -v "$(pwd)":/scratch quay.io/biocontainers/checkm-genome:1.2.1--pyhdfd78af_0 guppy tog -o /scratch/concatenated.tre /scratch/concatenated.pplacer.json

I took these commands from lines 72-81 of https://github.com/Ecogenomics/CheckM/blob/3eac9581ed81fdeb4a13045b912d08d3326ffdc5/checkm/pplacer.py#L72
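
The two manual steps above can be wrapped in a small script. This is only a sketch, not CheckM's own code: the function names and the default thread count are made up for illustration, guppy is assumed to be invoked directly as guppy tog, and both binaries are assumed to be on PATH (for example inside the container used above).

```python
import subprocess


def build_placement_commands(fasta, refpkg, out_json, out_tree, threads=10):
    """Return the pplacer and guppy argument lists for the two manual steps."""
    pplacer_cmd = ["pplacer", "-j", str(threads), "-c", refpkg,
                   "-o", out_json, fasta]
    guppy_cmd = ["guppy", "tog", "-o", out_tree, out_json]
    return pplacer_cmd, guppy_cmd


def run_placement(fasta, refpkg, out_json, out_tree, threads=10):
    """Run pplacer, then guppy, stopping at the first step that fails."""
    commands = build_placement_commands(fasta, refpkg, out_json, out_tree, threads)
    for cmd in commands:
        # check=True raises CalledProcessError on any non-zero exit status,
        # including a process terminated by a signal.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Paths as used in the docker commands above.
    run_placement(
        fasta="/scratch/concatenated.fasta",
        refpkg="/usr/local/checkm_data/genome_tree/genome_tree_reduced.refpkg",
        out_json="/scratch/concatenated.pplacer.json",
        out_tree="/scratch/concatenated.tre",
    )
```

Because of check=True, a pplacer process that gets killed stops the script at that step, instead of surfacing later as a missing concatenated.tre.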

Louis-MG commented 1 year ago

I am having the same error with CheckM 1.2.2 (in a Singularity container). This is the Dockerfile I intended to use (and push here):

FROM ubuntu:latest
ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update \
    && apt install -y git python3-pip hmmer prodigal pplacer wget \
    && pip3 install numpy matplotlib pysam checkm-genome

RUN wget https://data.ace.uq.edu.au/public/CheckM_databases/checkm_data_2015_01_16.tar.gz && tar -zxvf checkm_data_2015_01_16.tar.gz 
RUN rm checkm_data_2015_01_16.tar.gz

ENV CHECKM_DATA_PATH=/
ENV PATH="$HOME/.local/bin:$PATH"
CMD checkm  

You can get it with docker pull 007ptar007/checkm:lastest

Command I ran: singularity run --no-home --bind '/home/user/' checkm.sif checkm test ~/testcheckm/

While looking for the file (in case it was misplaced), I noticed that the output is appended to the log file instead of overwriting it, which can be confusing. Also, only stdout is written to the log; stderr is missing. The first error that happens is actually:

[2023-06-21 15:54:41] INFO: CheckM v1.2.2
[2023-06-21 15:54:41] INFO: checkm test /home/user/testcheckm/
[2023-06-21 15:54:41] INFO: CheckM data: /
[2023-06-21 15:54:41] INFO: [CheckM - Test] Processing E.coli K12-W3310 to verify operation of CheckM.
[2023-06-21 15:54:41] INFO: [Step 1]: Verifying tree command.
[2023-06-21 15:54:41] INFO: [CheckM - tree] Placing bins in reference genome tree.
[2023-06-21 15:54:41] INFO: Identifying marker genes in 1 bins with 1 threads:
    Finished processing 1 of 1 (100.00%) bins.
[2023-06-21 15:54:47] INFO: Saving HMM info to file.
[2023-06-21 15:54:47] INFO: Calculating genome statistics for 1 bins with 1 threads:
    Finished processing 1 of 1 (100.00%) bins.
[2023-06-21 15:54:47] INFO: Extracting marker genes to align.
[2023-06-21 15:54:47] INFO: Parsing HMM hits to marker genes:
    Finished parsing hits for 1 of 1 (100.00%) bins.
[2023-06-21 15:54:47] INFO: Extracting 43 HMMs with 1 threads:
    Finished extracting 43 of 43 (100.00%) HMMs.
[2023-06-21 15:54:47] INFO: Aligning 43 marker genes with 1 threads:
    Finished aligning 43 of 43 (100.00%) marker genes.
[2023-06-21 15:54:47] INFO: Reading marker alignment files.
[2023-06-21 15:54:47] INFO: Concatenating alignments.
[2023-06-21 15:54:47] INFO: Placing 1 bins into the genome tree with pplacer (be patient).
Killed
Uncaught exception: Sys_error("/home/user/testcheckm/results/storage/tree/concatenated.pplacer.json: No such file or directory")
Fatal error: exception Sys_error("/home/user/testcheckm/results/storage/tree/concatenated.pplacer.json: No such file or directory")
[2023-06-21 15:55:14] INFO: { Current stage: 0:00:33.283 || Total: 0:00:33.283 }
[2023-06-21 15:55:14] INFO: [Passed]
[2023-06-21 15:55:14] INFO: [Step 2]: Verifying tree_qa command.
[2023-06-21 15:55:14] INFO: [CheckM - tree_qa] Assessing phylogenetic markers found in each bin.
[2023-06-21 15:55:14] INFO: Reading HMM info from file.
[2023-06-21 15:55:14] INFO: Parsing HMM hits to marker genes:
    Finished parsing hits for 1 of 1 (100.00%) bins.

Here the exception appears as a defined error rather than just Killed, and then the run continues:

[2023-06-21 15:55:14] INFO: [Passed]
[2023-06-21 15:55:14] INFO: [Step 2]: Verifying tree_qa command.
[2023-06-21 15:55:14] INFO: [CheckM - tree_qa] Assessing phylogenetic markers found in each bin.
[2023-06-21 15:55:14] INFO: Reading HMM info from file.
[2023-06-21 15:55:14] INFO: Parsing HMM hits to marker genes:
    Finished parsing hits for 1 of 1 (100.00%) bins.

Unexpected error: <class 'FileNotFoundError'>
Traceback (most recent call last):
  File "/usr/local/bin/checkm", line 856, in <module>
    checkmParser.parseOptions(args)
  File "/usr/local/lib/python3.10/dist-packages/checkm/main.py", line 1031, in parseOptions
    self.test(options)
  File "/usr/local/lib/python3.10/dist-packages/checkm/main.py", line 940, in test
    verifyEcoli.run(self, options.output_dir)
  File "/usr/local/lib/python3.10/dist-packages/checkm/test/test_ecoli.py", line 75, in run
    parser.treeQA(options)
  File "/usr/local/lib/python3.10/dist-packages/checkm/main.py", line 225, in treeQA
    treeParser.printSummary(
  File "/usr/local/lib/python3.10/dist-packages/checkm/treeParser.py", line 45, in printSummary
    self.reportBinTaxonomy(outDir, resultsParser, bTabTable, outFile, binStats, bLineageStatistics=False)
  File "/usr/local/lib/python3.10/dist-packages/checkm/treeParser.py", line 641, in reportBinTaxonomy
    binIdToTaxonomy = self.getBinTaxonomy(outDir, binIds)
  File "/usr/local/lib/python3.10/dist-packages/checkm/treeParser.py", line 191, in getBinTaxonomy
    tree = dendropy.Tree.get_from_path(treeFile, schema='newick', rooting="force-rooted", preserve_underscores=True)
  File "/usr/local/lib/python3.10/dist-packages/dendropy/datamodel/basemodel.py", line 217, in get_from_path
    with open(src, *open_args) as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: '/home/user/testcheckm/results/storage/tree/concatenated.tre'

I wonder whether @spitfiredd's error comes from the same killed pplacer job, or whether our Python error comes from a different, earlier bug. Maybe mine should be a separate issue.
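
Since the Python traceback is only the last symptom, it can help to look for the first failure line in the captured output. A rough Python sketch follows, assuming the failure lines start with the markers seen in the output above (Killed, Uncaught exception, Fatal error, Unexpected error, Traceback). The marker list and the function name first_failure are my own choices, not something CheckM provides.

```python
# Line prefixes that indicate a failure, taken from the output quoted above.
FAILURE_MARKERS = (
    "Killed",
    "Uncaught exception",
    "Fatal error",
    "Unexpected error",
    "Traceback",
)


def first_failure(log_lines):
    """Return (line_number, text) of the first failure line, or None."""
    for number, line in enumerate(log_lines, start=1):
        stripped = line.strip()
        # str.startswith accepts a tuple and matches any of its entries.
        if stripped.startswith(FAILURE_MARKERS):
            return number, stripped
    return None


if __name__ == "__main__":
    import sys

    # Usage: python first_failure.py captured_output.txt
    with open(sys.argv[1]) as handle:
        print(first_failure(handle))
```

Note that stderr has to be captured as well (for example by redirecting it into the same file), since the log file itself only contains stdout.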

MostafaYA commented 1 year ago

Same issue here: FileNotFoundError: [Errno 2] No such file or directory: 'temp_checkm_dir/storage/tree/concatenated.tre'

Command line:
checkm lineage_wf -t 8 -x .fasta temp_checkm_dir --tab_table --file checkm_genome_results.tsv

Versions: CheckM v1.2.2, Python 3.10.12

Executed on: WSL on a laptop, x86_64 (8 CPUs and 6.087029e+09 bytes of RAM)

JiangweiPan1230 commented 1 year ago

I have installed checkm-genome on three Ubuntu 22.04 systems numerous times, following the Installation wiki each time. On the first two systems, running the following command:

checkm test ./re

produced this error:

FileNotFoundError: [Errno 2] No such file or directory: './re/results/storage/tree/concatenated.tre'

However, the latest attempt on the third system succeeded.

I am new to bioinformatics. I think this issue may be caused by the source code, because it has not been updated since 2015.

jq1242560952 commented 1 year ago

@JiangweiPan1230 Hello, have you solved this problem?

pinocc12 commented 5 months ago

I'm experiencing the same problem. Do you have a solution?

donovan-h-parks commented 2 months ago

Missing file issues are generally due to the machine running out of memory. You can find the CheckM system requirements here: https://github.com/Ecogenomics/CheckM/wiki/Installation#system-requirements
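
A quick way to compare a machine against a memory requirement before starting a run is sketched below in Python. The REQUIRED_GB value is a placeholder, not CheckM's documented requirement; take the real figure from the linked page. The function names are hypothetical, and os.sysconf with these keys is assumed to be available, which is the case on Linux.

```python
import os


def total_memory_bytes():
    """Physical memory in bytes, from the page size and page count (Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def has_enough_memory(total_bytes, required_gb):
    """True if total_bytes covers required_gb, counted in GiB (1024**3 bytes)."""
    return total_bytes >= required_gb * 1024**3


if __name__ == "__main__":
    # Placeholder only: replace with the figure from the system requirements page.
    REQUIRED_GB = 16
    total = total_memory_bytes()
    verdict = "enough" if has_enough_memory(total, REQUIRED_GB) else "NOT enough"
    print(f"{total / 1024**3:.1f} GiB of RAM detected: {verdict} for {REQUIRED_GB} GiB")
```

Inside Docker, Singularity, or WSL, the memory available to the process may be lower than what the host reports, so the limit of the container or VM is the number that matters.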