cmkobel / CompareM2

🦠📇 Microbial genomes-to-report pipeline
https://CompareM2.readthedocs.io
GNU General Public License v3.0

installation issue #110

Closed flefler closed 2 months ago

flefler commented 2 months ago

Hello,

I am excited to use this tool as it seems to fit most of my needs. However, I am having difficulty with the installation.

I followed the installation guide.

I was able to run comparem2 --until fast with no issues, but running comparem2 --until downloads resulted in the error below. I can provide more information if needed.

comparem2 --until downloads

Using profile /blue/hlaughinghouse/flefler/miniforge3/envs/comparem2/share/comparem2-2.10.1-0/profile/apptainer/default for setting default command line arguments.
 ____                                                               ___      
/\  _`\                                               /\`- _/`\   /'___`\    
\ \ \/\_\   ___  _ ___ ___  __ __     __     _ __    _\ \      \ /\_\_ \ |   
 \ \ \/_/_ / __`\\' __` __`\\ '__`\ /'__`\  /\' __\/'__`\\ \__\ \\/_/ / / _  
  \ \ \/\ \\ \/\ \\ \/\ \/\ \\ \/\ \\ \/\.\_\ \ \//\  __/ \ \_/\ \  / /__\ \ 
   \ \____/ \____/ \_\ \_\ \_\\ , _/ \__/.\_\\ \_\\ \____\ \_\\ \_\/\_______\
    \/___/ \/___/ \/_/\/_/\/_/ \ \ \/__/\/_ / \/_/ \/____/\/_/ \/_/\________/
                              \ \_\                                          
                               \/_/                                           

                      Formerly known as Assemblycomparator2              
                       github.com/cmkobel/comparem2/issues               
                             comparem2.readthedocs.io                    
                                     v2.10.1

  Variables                                                              
  ---------                                                              
    title         : 'test_comparem2_install'
    base          : '/blue/hlaughinghouse/flefler/miniforge3/envs/comparem2/share/comparem2-2.10.1-0'
    databases     : '/blue/hlaughinghouse/flefler/miniforge3/envs/comparem2/share/comparem2-2.10.1-0/databases'

  Available rules                                                        
  ---------------                                                        
    Q.C.          : copy assembly_stats sequence_lengths checkm2 busco   
    Annotate      : prokka bakta                                         
    Adv. annotate : eggnog interproscan dbcan kegg_pathway abricate mlst 
    Core-pan      : panaroo                                              
    Phylogenetic  : mashtree fasttree gtdbtk iqtree treecluster snp_dists
    Pseudo        : meta isolate downloads fast report                   
  (Use ´--until <rule> [<rule2>..]´ to activate only one or more rules)  

  Sample overview
  ---------------
 1-index          input_file          sample
       1 116_2 duplicate.fna 116_2_duplicate
       2           116_2.fna           116_2
       3           E8202.fna           E8202
       4           SRR24.fna           SRR24
//

Warning: One or more file names contain space(s). These have been replaced with underscores " " -> "_"
passthrough parameters to busco: "--tar  --auto-lineage-prok "
passthrough parameters to prokka: "--compliant  --kingdom bacteria"
passthrough parameters to bakta: "--translation-table 11 --gram ?"
passthrough parameters to interproscan: "--applications TIGRFAM,Hamap,Pfam --goterms  --pathways "
passthrough parameters to eggnog: "--genepred prodigal --decorate_gff yes -m diamond"
passthrough parameters to gtdbtk: "--keep_intermediates "
passthrough parameters to mlst: ""
passthrough parameters to panaroo: "--clean-mode sensitive --core_threshold 0.95 --threshold 0.98 -a core -f 0.7"
passthrough parameters to mashtree: "--genomesize 5000000 --mindepth 5 --kmerlength 21 --sketch-size 10000"
passthrough parameters to treecluster: "--method max_clade --threshold 0.05"
passthrough parameters to fasttree: "-gtr "
passthrough parameters to iqtree: "--boot 100 -m GTR"
Building DAG of jobs...
WorkflowError:
Unable to find environment in container image. Maybe a conda environment was modified without containerizing again (see snakemake --containerize)?
Details:
Command 'SINGULARITYENV_CONDA_PKGS_DIRS=/tmp/conda/4d56e991-85d5-4ead-8ed0-4acfb652a4a8 singularity --quiet --silent exec --home '/blue/hlaughinghouse/flefler/test_comparem2_install' --bind "$COMPAREM2_BASE","$COMPAREM2_DATABASES","$(pwd)" /home/flefler/.comparem2/singularity-prefix/a42a981bd28108ad0db722109c353855.simg sh -c '[ -d '\''/conda-envs/349bbcba5bdfd7faab9097a8a3f7f6d2'\'' ]'' returned non-zero exit status 1.

Traceback (most recent call last):
  File "/blue/hlaughinghouse/flefler/miniforge3/envs/comparem2/lib/python3.11/weakref.py", line 666, in _exitfunc
    f()
  File "/blue/hlaughinghouse/flefler/miniforge3/envs/comparem2/lib/python3.11/weakref.py", line 590, in __call__
    return info.func(*info.args, **(info.kwargs or {}))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/blue/hlaughinghouse/flefler/miniforge3/envs/comparem2/lib/python3.11/tempfile.py", line 933, in _cleanup
    cls._rmtree(name, ignore_errors=ignore_errors)
  File "/blue/hlaughinghouse/flefler/miniforge3/envs/comparem2/lib/python3.11/tempfile.py", line 929, in _rmtree
    _shutil.rmtree(name, onerror=onerror)
  File "/blue/hlaughinghouse/flefler/miniforge3/envs/comparem2/lib/python3.11/shutil.py", line 752, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/blue/hlaughinghouse/flefler/miniforge3/envs/comparem2/lib/python3.11/shutil.py", line 672, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  File "/blue/hlaughinghouse/flefler/miniforge3/envs/comparem2/lib/python3.11/shutil.py", line 672, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  File "/blue/hlaughinghouse/flefler/miniforge3/envs/comparem2/lib/python3.11/shutil.py", line 672, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  [Previous line repeated 7 more times]
  File "/blue/hlaughinghouse/flefler/miniforge3/envs/comparem2/lib/python3.11/shutil.py", line 683, in _rmtree_safe_fd
    onerror(os.rmdir, fullname, sys.exc_info())
  File "/blue/hlaughinghouse/flefler/miniforge3/envs/comparem2/lib/python3.11/shutil.py", line 681, in _rmtree_safe_fd
    os.rmdir(entry.name, dir_fd=topfd)
OSError: [Errno 39] Directory not empty: 'envs'
cmkobel commented 2 months ago

The

OSError: [Errno 39] Directory not empty: 'envs'

error is a known bug in snakemake (https://github.com/snakemake/snakemake/issues/2728) that has recently been fixed. Once the new version of snakemake is integrated into comparem2, the issue will be resolved. It is merely a race condition when cleaning up temporary files, and it does no harm, so it can be ignored for now.

For the

Unable to find environment in container image. Maybe a conda environment was modified without containerizing again (see snakemake --containerize)?

I'm a bit more puzzled. Could I ask you to run apptainer pull --force docker://cmkobel/comparem2:v2.10 first and then rerun comparem2, to see if that fixes the issue?

Best

cmkobel commented 2 months ago

I just found the issue. The issue is that rule "downloads" requires bakta's database, but because the annotator was set to prokka when I built the Docker (apptainer compatible) image, bakta's environment is not included.
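For anyone who wants to confirm this diagnosis on their own system, the check that snakemake performs can be repeated by hand. The following is only a sketch: the image path and environment hash are copied from the error message in the first comment, `has_env` is a hypothetical helper name, and it assumes either apptainer or singularity is on the PATH.

```shell
# Values copied from the error message above; replace them with your own.
image="$HOME/.comparem2/singularity-prefix/a42a981bd28108ad0db722109c353855.simg"
env_hash="349bbcba5bdfd7faab9097a8a3f7f6d2"

# has_env RUNTIME IMAGE HASH
# Succeeds if /conda-envs/HASH exists inside IMAGE (hypothetical helper that
# mirrors the '[ -d ... ]' test shown in the error details).
has_env() {
    "$1" exec "$2" sh -c "[ -d '/conda-envs/$3' ]"
}

# Use whichever container runtime is on PATH.
runtime="$(command -v apptainer || command -v singularity || true)"

if [ -z "$runtime" ] || [ ! -f "$image" ]; then
    echo "Container runtime or image not found; nothing to inspect."
elif has_env "$runtime" "$image" "$env_hash"; then
    echo "Environment $env_hash is present in the image."
else
    echo "Environment $env_hash is missing. Environments in the image:"
    "$runtime" exec "$image" ls /conda-envs
fi
```

If the hash is reported as missing, the image was built without that environment, which matches the explanation above.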

cmkobel commented 2 months ago

The fix is to re-download the Docker image, which has now been modified to include bakta. You might have to delete the old one on your system first; it is probably stored in "~/.comparem2/singularity-prefix/".

If this doesn't fix the issue, try running apptainer cache clean as well.
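The two steps above could be scripted roughly as follows. This is a sketch under assumptions: the prefix directory is the "probably" location mentioned above, cached images are assumed to end in `.simg` (as in the error message), and `remove_cached_images` is a hypothetical helper name. The deletion is not reversible, so check the directory before running it.

```shell
# Assumed location of the cached images; verify it on your system first.
prefix="$HOME/.comparem2/singularity-prefix"

# remove_cached_images DIR
# Deletes every *.simg file directly inside DIR and prints what was removed.
remove_cached_images() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "No such directory: $dir"
        return 0
    fi
    find "$dir" -maxdepth 1 -type f -name '*.simg' -print -delete
}

remove_cached_images "$prefix"

# Optionally clear Apptainer's own download cache as well.
if command -v apptainer >/dev/null 2>&1; then
    apptainer cache clean
fi
```

After this, rerunning comparem2 should pull the image again.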

Let me know if this fixes your issue.

Thanks for pointing out the error, and let me know if you observe any other anomalies.

flefler commented 1 month ago

Hi Carl,

I really appreciate the fast response and fix! I was able to run comparem2 --until downloads with only one (well maybe two) issue(s).

Mainly, the BUSCO database did not download :( miniforge3/envs/comparem2/share/comparem2-2.10.1-0/databases/busco only contains comparem2_busco_database_representative.flag which is an empty file. I would also like to run antismash, and there is no database folder for that either.

I then ran comparem2 --until antismash_download busco_download which resulted in the below output.

Using profile /blue/hlaughinghouse/flefler/miniforge3/envs/comparem2/share/comparem2-2.10.1-0/profile/apptainer/default for setting default command line arguments.
 ____                                                               ___      
/\  _`\                                               /\`- _/`\   /'___`\    
\ \ \/\_\   ___  _ ___ ___  __ __     __     _ __    _\ \      \ /\_\_ \ |   
 \ \ \/_/_ / __`\\' __` __`\\ '__`\ /'__`\  /\' __\/'__`\\ \__\ \\/_/ / / _  
  \ \ \/\ \\ \/\ \\ \/\ \/\ \\ \/\ \\ \/\.\_\ \ \//\  __/ \ \_/\ \  / /__\ \ 
   \ \____/ \____/ \_\ \_\ \_\\ , _/ \__/.\_\\ \_\\ \____\ \_\\ \_\/\_______\
    \/___/ \/___/ \/_/\/_/\/_/ \ \ \/__/\/_ / \/_/ \/____/\/_/ \/_/\________/
                              \ \_\                                          
                               \/_/                                           

                      Formerly known as Assemblycomparator2              
                       github.com/cmkobel/comparem2/issues               
                             comparem2.readthedocs.io                    
                                     v2.10.1

  Variables                                                              
  ---------                                                              
    title         : 'test_comparem2_install'
    base          : '/blue/hlaughinghouse/flefler/miniforge3/envs/comparem2/share/comparem2-2.10.1-0'
    databases     : '/blue/hlaughinghouse/flefler/miniforge3/envs/comparem2/share/comparem2-2.10.1-0/databases'

  Available rules                                                        
  ---------------                                                        
    Q.C.          : copy assembly_stats sequence_lengths checkm2 busco   
    Annotate      : prokka bakta                                         
    Adv. annotate : eggnog interproscan dbcan kegg_pathway abricate mlst 
    Core-pan      : panaroo                                              
    Phylogenetic  : mashtree fasttree gtdbtk iqtree treecluster snp_dists
    Pseudo        : meta isolate downloads fast report                   
  (Use ´--until <rule> [<rule2>..]´ to activate only one or more rules)  

  Sample overview
  ---------------
 1-index          input_file          sample
       1 116_2 duplicate.fna 116_2_duplicate
       2           116_2.fna           116_2
       3           E8202.fna           E8202
       4           SRR24.fna           SRR24
//

Warning: One or more file names contain space(s). These have been replaced with underscores " " -> "_"
passthrough parameters to busco: "--tar  --auto-lineage-prok "
passthrough parameters to prokka: "--compliant  --kingdom bacteria"
passthrough parameters to bakta: "--translation-table 11 --gram ?"
passthrough parameters to interproscan: "--applications TIGRFAM,Hamap,Pfam --goterms  --pathways "
passthrough parameters to eggnog: "--genepred prodigal --decorate_gff yes -m diamond"
passthrough parameters to gtdbtk: "--keep_intermediates "
passthrough parameters to mlst: ""
passthrough parameters to panaroo: "--clean-mode sensitive --core_threshold 0.95 --threshold 0.98 -a core -f 0.7"
passthrough parameters to mashtree: "--genomesize 5000000 --mindepth 5 --kmerlength 21 --sketch-size 10000"
passthrough parameters to treecluster: "--method max_clade --threshold 0.05"
passthrough parameters to fasttree: "-gtr "
passthrough parameters to iqtree: "--boot 100 -m GTR"
Building DAG of jobs...
WorkflowError:
Unable to find environment in container image. Maybe a conda environment was modified without containerizing again (see snakemake --containerize)?
Details:
Command 'SINGULARITYENV_CONDA_PKGS_DIRS=/tmp/conda/ecb394b6-5e20-4027-b634-e76dbe831e99 singularity --quiet --silent exec --home '/blue/hlaughinghouse/flefler/test_comparem2_install' --bind "$COMPAREM2_BASE","$COMPAREM2_DATABASES","$(pwd)" /home/flefler/.comparem2/singularity-prefix/a42a981bd28108ad0db722109c353855.simg sh -c '[ -d '\''/conda-envs/2ae6d7c5468f4b0e569c09e067b92cfe'\'' ]'' returned non-zero exit status 1.

Traceback (most recent call last):
  File "/blue/hlaughinghouse/flefler/miniforge3/envs/comparem2/lib/python3.11/weakref.py", line 666, in _exitfunc
    f()
  File "/blue/hlaughinghouse/flefler/miniforge3/envs/comparem2/lib/python3.11/weakref.py", line 590, in __call__
    return info.func(*info.args, **(info.kwargs or {}))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/blue/hlaughinghouse/flefler/miniforge3/envs/comparem2/lib/python3.11/tempfile.py", line 933, in _cleanup
    cls._rmtree(name, ignore_errors=ignore_errors)
  File "/blue/hlaughinghouse/flefler/miniforge3/envs/comparem2/lib/python3.11/tempfile.py", line 929, in _rmtree
    _shutil.rmtree(name, onerror=onerror)
  File "/blue/hlaughinghouse/flefler/miniforge3/envs/comparem2/lib/python3.11/shutil.py", line 752, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/blue/hlaughinghouse/flefler/miniforge3/envs/comparem2/lib/python3.11/shutil.py", line 672, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  File "/blue/hlaughinghouse/flefler/miniforge3/envs/comparem2/lib/python3.11/shutil.py", line 672, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  File "/blue/hlaughinghouse/flefler/miniforge3/envs/comparem2/lib/python3.11/shutil.py", line 672, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  [Previous line repeated 7 more times]
  File "/blue/hlaughinghouse/flefler/miniforge3/envs/comparem2/lib/python3.11/shutil.py", line 683, in _rmtree_safe_fd
    onerror(os.rmdir, fullname, sys.exc_info())
  File "/blue/hlaughinghouse/flefler/miniforge3/envs/comparem2/lib/python3.11/shutil.py", line 681, in _rmtree_safe_fd
    os.rmdir(entry.name, dir_fd=topfd)
OSError: [Errno 39] Directory not empty: 'envs'

Good news is that checkm2 eggnog gtdbtk assembly_stats mashtree bakta sequence_lengths all run fine!

cmkobel commented 1 month ago

I'll check the BUSCO issue. The flag file is supposed to be empty, but it is odd that the database was not downloaded. What does the log look like?
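A quick way to tell "only the flag file exists" apart from "the database was downloaded" is to look for anything in the directory that is not a `.flag` file. This is a sketch: the fallback path is taken from the "Variables" section of the log above, `COMPAREM2_DATABASES` is the variable that appears in the error message, and `only_flags` is a hypothetical helper name.

```shell
# Databases directory; falls back to the path printed under "Variables".
db_root="${COMPAREM2_DATABASES:-${CONDA_PREFIX:-}/share/comparem2-2.10.1-0/databases}"
db_dir="$db_root/busco"

# only_flags DIR
# Succeeds if DIR contains nothing except *.flag files.
only_flags() {
    [ -z "$(find "$1" -mindepth 1 ! -name '*.flag' -print -quit 2>/dev/null)" ]
}

if [ ! -d "$db_dir" ]; then
    echo "No BUSCO directory at $db_dir"
elif only_flags "$db_dir"; then
    echo "Only flag files found: the database itself was not downloaded."
else
    echo "Database files are present:"
    ls -la "$db_dir"
fi
```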

I've been working a lot on integrating antismash, and while it works fine when running comparem2 via Conda, it fails when using the Docker image (Apptainer). This is because antismash tries to modify read-only files in the Docker image (https://github.com/antismash/antismash/issues/496) when a non-standard database location is used. So while it is technically possible to run antismash with comparem2, it is currently disabled until I've found a workaround.

cmkobel commented 1 month ago

I included antismash in the latest version, v2.11.1 (just published). If you install that and disable apptainer, you can run antismash:

conda activate comparem2
export COMPAREM2_PROFILE="$CONDA_PREFIX/share/comparem2-2.11.1-0/profile/conda/default"
comparem2 --until antismash