PacificBiosciences / pb-human-wgs-workflow-snakemake

Workflow for the comprehensive detection and prioritization of variants in human genomes with PacBio HiFi reads
BSD 3-Clause Clear License

Error running Glnexus #169

Closed. priyambial123 closed this issue 1 year ago

priyambial123 commented 1 year ago

Hello

I previously had no problem running process_cohort.smk (the last step in the pipeline). Recently, though, the cohort_glnexus step fails. It stops with this error:

Error in rule glnexus:
    jobid: 90
    output: cohorts/0009/glnexus/0009.GRCh38.deepvariant.glnexus.bcf, cohorts/0009/glnexus/0009.GRCh38.GLnexus.DB
    log: cohorts/0009/logs/glnexus/0009.GRCh38.log (check log file(s) for error message)
    shell:

        (rm -rf cohorts/0009/glnexus/0009.GRCh38.GLnexus.DB &&         glnexus_cli --threads 24 --mem-gbytes 192             --dir cohorts/0009/glnexus/0009.GRCh38.GLnexus.DB             --config DeepVariant_unfiltered samples/0009-004/deepvariant/0009-004.GRCh38.deepvariant.g.vcf.gz samples/0009-007/deepvariant/0009-007.GRCh38.deepvariant.g.vcf.gz samples/0009-010/deepvariant/0009-010.GRCh38.deepvariant.g.vcf.gz > cohorts/0009/glnexus/0009.GRCh38.deepvariant.glnexus.bcf) 2> cohorts/0009/logs/glnexus/0009.GRCh38.log

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job glnexus since they might be corrupted:
cohorts/0009/glnexus/0009.GRCh38.deepvariant.glnexus.bcf
Shutting down, this might take some time.

I looked into the log file, which contains: /usr/bin/bash: line 1: glnexus_cli: command not found

I downloaded the resources folder again and reran process_cohort.smk, but I still get the same error. I also checked the rules file (cohort_glnexus.smk), and it was similar to the one on GitHub. Is there a file I should check in the resources folder?

Thank you

juniper-lake commented 1 year ago

If the log file only contains the command not found error, then I don't think it's an issue with your resources folder. It is an issue with calling the glnexus_cli command. This job runs using a singularity/docker container that references a GLNEXUS_VERSION variable in the config file.

File rules/cohort_glnexus.smk:

    container: f"docker://ghcr.io/dnanexus-rnd/glnexus:{config['GLNEXUS_VERSION']}"

File config.yaml:

    GLNEXUS_VERSION: 'v1.4.1'

If either of these lines is missing, OR if the snakemake arguments that enable singularity integration were accidentally disabled in the snakemake profiles (for example, these lines in the slurm profile), then the container won't be loaded and the commands inside it won't be available.

Please see the Snakemake documentation for more information about running jobs in containers.
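The two conditions above can be checked quickly before submitting jobs. This is only a minimal sketch, not part of the workflow: the helper names `resolve_container` and `uses_singularity` are hypothetical, the config is assumed to be already parsed into a dict, and the second check only inspects the command line (a profile could also supply the flag).

```python
import shlex


def resolve_container(config: dict) -> str:
    """Build the image URI the same way the rule's f-string does.

    Raises KeyError if GLNEXUS_VERSION is missing from the config.
    """
    return f"docker://ghcr.io/dnanexus-rnd/glnexus:{config['GLNEXUS_VERSION']}"


def uses_singularity(command: str) -> bool:
    """Return True if the snakemake command line passes --use-singularity."""
    return "--use-singularity" in shlex.split(command)


if __name__ == "__main__":
    config = {"GLNEXUS_VERSION": "v1.4.1"}  # value taken from config.yaml
    print(resolve_container(config))
    # Without the flag, the container directive is ignored and
    # glnexus_cli has to exist on the host PATH.
    print(uses_singularity("snakemake --use-conda --jobs 500"))
    print(uses_singularity("snakemake --use-conda --use-singularity --jobs 500"))
```

If `resolve_container` raises `KeyError`, the config entry is missing; if `uses_singularity` returns `False` and the profile does not set the flag either, the job runs on the host, where `glnexus_cli` is probably not installed.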

priyambial123 commented 1 year ago

I checked the resources folder because I had gotten a missing input file error in the pbsv annotation step (svpack):

Missing input files for rule svpack_filter_annotated:
resources/hprc/hprc.GRCh38.pbsv.v.2.6.0-20210417.vcf.gz

I fixed it by renaming the file hprc.GRCh38.pbsv.vcf.gz in the resources folder to hprc.GRCh38.pbsv.v.2.6.0-20210417.vcf.gz.
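An alternative to renaming would be a symlink with the expected name, which leaves the downloaded file untouched. This is a rough sketch under assumptions: `link_expected_name` is a hypothetical helper, and the demo runs in a temporary directory rather than the real resources/hprc folder.

```python
import tempfile
from pathlib import Path


def link_expected_name(resource_dir: Path, existing: str, expected: str) -> Path:
    """Create a symlink named `expected` that points at `existing`.

    Both names are relative to resource_dir; the original file is kept.
    """
    target = resource_dir / existing
    link = resource_dir / expected
    if not target.exists():
        raise FileNotFoundError(target)
    if not link.exists():
        # Relative link, so it still resolves if the folder is moved.
        link.symlink_to(target.name)
    return link


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        hprc = Path(tmp)
        (hprc / "hprc.GRCh38.pbsv.vcf.gz").touch()
        link = link_expected_name(
            hprc,
            "hprc.GRCh38.pbsv.vcf.gz",
            "hprc.GRCh38.pbsv.v.2.6.0-20210417.vcf.gz",
        )
        print(link.name, "->", link.resolve().name)
```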

Then I hit this error in the glnexus step, so I was not sure whether the recent download or something else was causing it.

Thank you for pointing out the singularity integration.

I fixed the issue by adding --use-singularity to the snakemake command below, and it is running now:

snakemake --rerun-incomplete --reason --config "cohort='$COHORT'" --nolock --local-cores 4 --use-singularity --jobs 500 --max-jobs-per-second 1 --use-conda --conda-frontend mamba --cluster-config workflow/profiles/slurm/config.yaml --snakefile workflow/process_cohort.smk

Thank you :-)