KThorellGroup / BACTpipe

BACTpipe: An assembly and annotation pipeline for bacterial genomics
https://bactpipe.readthedocs.org
MIT License

bioconda::prokka environment error #172

Closed shigdon closed 5 days ago

shigdon commented 2 years ago

Hello,

I was attempting BACTpipe runs on ctmr-gandalf with a test set of two isolates (paired-end FASTQ) when the pipeline unexpectedly failed in the Prokka module. The Git repo was freshly pulled yesterday, Feb 7, 2022. I initially submitted the run via the following sbatch script:

#!/bin/bash -login
#SBATCH -D /ceph/projects/201_MiUdda/analysis/smh_V300099797_BACTpipe
#SBATCH -p ctmr
#SBATCH -J Bpipe_0
#SBATCH -t 168:00:00
#SBATCH -N 1
#SBATCH -c 1
#SBATCH --output /ceph/projects/201_MiUdda/analysis/smh_V300099797_BACTpipe/slurm-log/bactpipe-%j.out
#SBATCH --error /ceph/projects/201_MiUdda/analysis/smh_V300099797_BACTpipe/slurm-log/bactpipe-%j.err
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=shawn.higdon@ki.se

# activate the conda environment that provides nextflow
conda activate bactpipe

# make things fail on errors
set -o nounset
set -o errexit
set -x

nextflow run ctmrbio/BACTpipe \
    -profile ctmr_gandalf \
    --kraken2_db /ceph/db/kraken2/gtdb_r89_54k \
    --kraken2_confidence 0.5 \
    --keep_shovill_output TRUE \
    --shovill_depth 100 \
    --shovill_minlen 500 \
    --reads '/ceph/projects/201_MiUdda/analysis/smh_V300099797_BACTpipe/test_fq/*_{1,2}.fq.gz' \
    -resume

This produced the following error in the Prokka module:

[22:33:53] There are still 1247 unannotated CDS left (started with 4883)
  [22:33:53] Will use hmmer3 to search against /ceph/projects/201_MiUdda/analysis/smh_V300099797_BACTpipe/work/conda/env-478453b56157f1ebe398640a7fd2ec99/db/hmm/HAMAP.hmm with 4 CPUs
  [22:33:53] Running: cat 117\-89\-c2_prokka\/117\-89\-c2\.HAMAP\.hmm\.tmp\.2161016\.faa | parallel --gnu --plain -j 4 --block 41360 --recstart '>' --pipe hmmscan --noali --notextw --acc -E 1e-09 --cpu 1 /ceph/projects/201_MiUdda/analysis/smh_V300099797_BACTpipe/work/conda/env-478453b56157f1ebe398640a7fd2ec99/db/hmm/HAMAP.hmm /dev/stdin > 117\-89\-c2_prokka\/117\-89\-c2\.HAMAP\.hmm\.tmp\.2161016\.hmmer3 2> /dev/null
  Bio::SearchIO: hmmer3 cannot be found
  Exception
  ------------- EXCEPTION -------------
  MSG: Failed to load module Bio::SearchIO::hmmer3. Can't locate Bio/SearchIO/hmmer3.pm in @INC (you may need to install the Bio::SearchIO::hmmer3 module) (@INC contains: /ceph/projects/201_MiUdda/analysis/smh_V300099797_BACTpipe/work/conda/env-478453b56157f1ebe398640a7fd2ec99/lib/perl5/5.32/site_perl /ceph/projects/201_MiUdda/analysis/smh_V300099797_BACTpipe/work/conda/env-478453b56157f1ebe398640a7fd2ec99/lib/perl5/site_perl /ceph/projects/201_MiUdda/analysis/smh_V300099797_BACTpipe/work/conda/env-478453b56157f1ebe398640a7fd2ec99/lib/perl5/5.32/vendor_perl /ceph/projects/201_MiUdda/analysis/smh_V300099797_BACTpipe/work/conda/env-478453b56157f1ebe398640a7fd2ec99/lib/perl5/vendor_perl /ceph/projects/201_MiUdda/analysis/smh_V300099797_BACTpipe/work/conda/env-478453b56157f1ebe398640a7fd2ec99/lib/perl5/5.32/core_perl /ceph/projects/201_MiUdda/analysis/smh_V300099797_BACTpipe/work/conda/env-478453b56157f1ebe398640a7fd2ec99/lib/perl5/core_perl .) at /ceph/projects/201_MiUdda/analysis/smh_V300099797_BACTpipe/work/conda/env-478453b56157f1ebe398640a7fd2ec99/lib/perl5/site_perl/Bio/Root/Root.pm line 520.

  STACK Bio::Root::Root::_load_module /ceph/projects/201_MiUdda/analysis/smh_V300099797_BACTpipe/work/conda/env-478453b56157f1ebe398640a7fd2ec99/lib/perl5/site_perl/Bio/Root/Root.pm:522
  STACK (eval) /ceph/projects/201_MiUdda/analysis/smh_V300099797_BACTpipe/work/conda/env-478453b56157f1ebe398640a7fd2ec99/lib/perl5/site_perl/Bio/SearchIO.pm:620
  STACK Bio::SearchIO::_load_format_module /ceph/projects/201_MiUdda/analysis/smh_V300099797_BACTpipe/work/conda/env-478453b56157f1ebe398640a7fd2ec99/lib/perl5/site_perl/Bio/SearchIO.pm:619
  STACK Bio::SearchIO::new /ceph/projects/201_MiUdda/analysis/smh_V300099797_BACTpipe/work/conda/env-478453b56157f1ebe398640a7fd2ec99/lib/perl5/site_perl/Bio/SearchIO.pm:217
  STACK toplevel /ceph/projects/201_MiUdda/analysis/smh_V300099797_BACTpipe/work/conda/env-478453b56157f1ebe398640a7fd2ec99/bin/prokka:1113
  -------------------------------------

  For more information about the SearchIO system please see the SearchIO docs.
  This includes ways of checking for formats at compile time, not run time
  Can't call method "next_result" on an undefined value at /ceph/projects/201_MiUdda/analysis/smh_V300099797_BACTpipe/work/conda/env-478453b56157f1ebe398640a7fd2ec99/bin/prokka line 1114.
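The line that matters is the "Failed to load module" message: the stack trace shows Prokka calling Bio::SearchIO::new for the HMMER3 search, and BioPerl cannot find its Bio::SearchIO::hmmer3 parser anywhere on @INC inside the Conda environment. The later "next_result on an undefined value" error is just a consequence of that. A minimal sketch for pulling the missing module names out of a saved log (for example the task's .command.log in the Nextflow work directory), assuming lines shaped like the one above; the function name missing_perl_modules is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Perl modules that BioPerl reported as
# missing in a saved Prokka log, one per line, without duplicates.
# Assumes lines shaped like:
#   MSG: Failed to load module Bio::SearchIO::hmmer3. Can't locate ...
missing_perl_modules() {
    local log_file=$1
    # grep -o prints only the matched text; the character class stops at the
    # trailing '.', so only the module name survives after sed drops the prefix.
    # '|| true' keeps the exit status 0 under 'set -o pipefail' when nothing matches.
    { grep -oE 'Failed to load module [A-Za-z0-9_:]+' "$log_file" || true; } \
        | sed 's/^Failed to load module //' \
        | sort -u
}

# Example (log file name is hypothetical):
#   missing_perl_modules prokka.log
```

An empty result means the log contains no BioPerl module-loading failures of this kind.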

I then reran after replacing the -profile flag in the nextflow command with:

-c ctmr_gandalf-custom.config

The only change I made to the config was to pin a specific Prokka version for Conda:

// vim: syntax=groovy expandtab
// BACTpipe Nextflow configuration file for use on CTMR Gandalf

params {
    project = 'bio'
    partition = 'ctmr'
}

process {
    errorStrategy = 'terminate'
    executor = 'slurm'
    clusterOptions = {
        " --partition ${params.partition} -A ${params.project}" + (params.clusterOptions ?: '')
    }
    scratch = false
    stageInMode = 'copy'
    stageOutMode = 'copy'

    withName: FASTP {
        cpus = 4
        time = 20.m
        conda = 'bioconda::fastp'
    }

    withName: SHOVILL {
        cpus = 10
        time = 2.h
        conda = 'bioconda::shovill bioconda::bwa=0.7.16 python=3'
    }

    withName: CLASSIFY_TAXONOMY {
        cpus = 10
        time = 30.m
        conda = 'bioconda::kraken2'
    }

    withName: ASSEMBLY_STATS {
        cpus = 1
        time = 20.m
        conda = 'bioconda::bbmap'
    }

    withName: PROKKA {
        cpus = 8
        time = 2.h
        conda = 'bioconda::prokka=1.14.6'
    }

    withName: MULTIQC {
        cpus = 1
        time = 10.m
        conda = 'bioconda::multiqc'
    }
}

Rerunning with this configuration did not solve the issue; it produced the same error.
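Since pinning the version did not change the outcome, it may be quicker to test a candidate environment directly than to rerun the whole pipeline. A minimal sketch, assuming the Conda environment ships its own perl at <env>/bin/perl (the @INC paths in the log point at lib/perl5/5.32 inside the env): perl -MModule -e 1 only tries to load the module and exits, so its exit status tells you whether the module is available. The function name check_perl_module and the ENV_DIR variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a given perl binary can load a module.
# Prints "OK: <module>" and returns 0, or prints "MISSING: <module>" and returns 1.
check_perl_module() {
    local perl_bin=$1 module=$2
    # -M<module> loads the module before running the trivial program '1';
    # a non-zero exit status means loading failed. stderr is silenced.
    if "$perl_bin" "-M$module" -e 1 2>/dev/null; then
        echo "OK: $module"
    else
        echo "MISSING: $module"
        return 1
    fi
}

# Example, run from the launch directory against the env-* directory
# named in the failing run's log:
#   ENV_DIR=work/conda/env-478453b56157f1ebe398640a7fd2ec99
#   check_perl_module "$ENV_DIR/bin/perl" Bio::SearchIO::hmmer3
```

A MISSING result for a pinned or downgraded environment would suggest the problem lies in the BioPerl packages Conda resolved for that environment rather than in the Prokka version itself.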

I am not sure how to fix this, but my next step would be to try downgrading Prokka. Any thoughts or suggestions are much appreciated.

Thanks!

boulund commented 2 years ago

Sorry to hear you're having issues @shigdon!

There should be no need to use an sbatch script to run the Nextflow pipeline; Nextflow automatically handles all the required job submissions when you run it with the appropriate profile (in this case ctmr_gandalf). You can just run Nextflow in a tmux session on the login node; that's perfectly OK!

I agree that it sounds as if the Prokka environment isn't working as intended. Perhaps a version change would help; have you had time to try that yet?

Another thing I've been thinking of is adding container directives to the config of all modules, so we can use the already available BioContainers images for these packages. That should make the pipeline more robust overall and easier to run in different compute environments.

thorellk commented 5 days ago

As the Gandalf HPC has been discontinued and we are moving toward replacing the Conda environments with containers, I am closing this issue.