Closed Pranav-Garg closed 3 months ago
Hi @Pranav-Garg ,
That's right, the Snakefile has been split into seq_type-specific workflows to better represent real-world scenarios. The documentation has not been updated since then, but we are planning to bring it up to speed soon.
Thank you for reporting the error. Someone else also reported this issue and we are investigating. It appears that subworkflows are handled differently after a certain version of Snakemake. Can you please:
Thanks
Snakemake version: 7.32.4
Yes, there was such a warning. Full output below (I removed the full file paths):
Warning: The oncopipe package was imported outside of a snakefile. Most functions are designed to work within a snakefile. Some unexpected behaviour/errors might occur.
modules/slms_3/1.0/slms_3.smk:411: SyntaxWarning: invalid escape sequence '\s'
str(rules._slms_3_annotate_strelka_gnomad.output.vcf),
modules/slms_3/1.0/slms_3.smk:438: SyntaxWarning: invalid escape sequence '\s'
rules._starfish_all.input,
modules/slms_3/1.0/slms_3.smk:534: SyntaxWarning: invalid escape sequence '\#'
modules/slms_3/1.0/slms_3.smk:553: SyntaxWarning: invalid escape sequence '\#'
modules/slms_3/1.0/../../starfish/2.0/starfish.smk:236: SyntaxWarning: invalid escape sequence '\S'
# Perform some clean-up tasks, including storing the module-specific
modules/pathseq/1.0/pathseq.smk:119: SyntaxWarning: invalid escape sequence '\>'
R={input.genome_fa}
modules/pathseq/1.0/pathseq.smk:133: SyntaxWarning: invalid escape sequence '\>'
log:
Building DAG of jobs...
Executing subworkflow reference_files.
workflows/reference_files/2.4/reference_files.smk:1460: SyntaxWarning: invalid escape sequence '\/'
workflows/reference_files/2.4/reference_files.smk:1480: SyntaxWarning: invalid escape sequence '\/'
Building DAG of jobs...
InputFunctionException in rule hardlink_download in file workflows/reference_files/2.4/reference_files_header.smk, line 584:
Error:
Exception: Could not find rule to generate genomes/ grch37 / repeatmasker/repeatmasker.grch37.bed .
Wildcards:
genome_build=grch37
suffix=repeatmasker/repeatmasker.grch37.bed
Traceback:
File "workflows/reference_files/2.4/reference_files_header.smk", line 475, in hardlink_same_provider
Have you managed to solve the issue? I have encountered the same problem trying to run the demo data.
Hi all, sorry for the delay in resolving this issue. @focusonskills, the problem is caused by the way newer Snakemake versions handle subworkflows, which makes those versions incompatible with lcr-modules. A solution is to use a locked conda environment built from the following recipe: https://github.com/LCR-BCCRC/lcr-modules/blob/master/demo/env.yaml This is a copy of our production environment, and it has been tested to resolve this issue on several systems and OS versions.
Please let us know if you have any other questions.
I've tried to create a locked environment with conda-lock using env.yaml under the demo folder. However, I am still getting the same output as the OP, where it gets stuck at generating the reference files. Which Snakemake version is actually required for the modules? I see these in the env.yaml:
When we generate the environment, snakemake --version returns 7.15.2. This version should work.
What is the output of snakemake --version and pip show oncopipe when you activate your environment?
Can you post the output of conda env export from the activated environment?
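The two version checks requested above can also be gathered in one step from Python. This is a minimal sketch, not part of lcr-modules: the helper names `installed_version` and `report_versions` are made up for illustration, and the result reflects the interpreter that runs the script, so it should be run with the environment activated.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def report_versions(packages=("snakemake", "oncopipe")):
    """Map each package name to its installed version (None if missing)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    # Prints one line per package, as seen by the current Python interpreter.
    for name, found in report_versions().items():
        print(f"{name}: {found or 'not installed'}")
```

If the version printed here differs from what snakemake --version reports in the same shell, the two are probably resolving to different installations.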
snakemake --version gives 7.32.4
pip show oncopipe gives version 1.0.12
conda env export gives the following.
Do you know why the Snakemake version doesn't match the one in the conda environment?
name: opv12 channels:
This sounds like a problem with how your PATH environment variable is set. If you 'echo $PATH' with the conda environment activated, the path to the opv12 conda environment should be at the beginning of your path. If it's not, take a look at how you've modified your PATH in your .bashrc file.
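The PATH check described above can be sketched in Python as follows. This assumes a Linux or macOS layout where the environment's executables live in a bin directory under CONDA_PREFIX (the variable that conda activate sets). The function name `env_bin_is_first` is hypothetical.

```python
import os
import shutil


def env_bin_is_first(path_value, env_bin):
    """True if `env_bin` is the first non-empty entry of a PATH-style string."""
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    if not entries:
        return False
    return os.path.normpath(entries[0]) == os.path.normpath(env_bin)


if __name__ == "__main__":
    prefix = os.environ.get("CONDA_PREFIX")
    if prefix is None:
        print("No conda environment appears to be activated (CONDA_PREFIX is unset).")
    else:
        # Assumption: executables are in <prefix>/bin (Linux/macOS layout).
        env_bin = os.path.join(prefix, "bin")
        first = env_bin_is_first(os.environ.get("PATH", ""), env_bin)
        print(f"{env_bin} is first on PATH: {first}")
    # Shows which snakemake executable the shell would actually run, if any.
    print("snakemake resolves to:", shutil.which("snakemake"))
```

If the environment's bin directory is not first, another snakemake installation earlier on PATH can shadow the one in the environment.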
echo $PATH returns the opv12 environment at the beginning.
/home/bioinf/miniconda3/envs/opv12/bin:
snakemake --version returns 7.15.2, which now matches the environment.
However, running nice snakemake --dry-run --use-conda all -s capture_Snakefile.smk still gives the error below.
Building DAG of jobs...
Executing subworkflow reference_files.
Creating specified working directory /mnt/raid/Analysis/Ongoing/Haloplex/OldPipeline/lcr-modules/demo/reference.
Building DAG of jobs...
InputFunctionException in line 583 of /mnt/raid/Analysis/Ongoing/Haloplex/OldPipeline/lcr-modules/workflows/reference_files/2.4/reference_files_header.smk:
Error:
AssertionError: The download_oncodrive_hg19_regions download rule doesn't have a provider param.
Wildcards:
genome_build=grch37
suffix=gnomad/af-only-gnomad.grch37.vcf
Traceback:
File "/mnt/raid/Analysis/Ongoing/Haloplex/OldPipeline/lcr-modules/workflows/reference_files/2.4/reference_files_header.smk", line 453, in hardlink_same_provider
File "/mnt/raid/Analysis/Ongoing/Haloplex/OldPipeline/lcr-modules/workflows/reference_files/2.4/reference_files_header.smk", line 423, in get_matching_download_rules
Thanks for your patience @focusonskills. This was a known problem addressed in #310 and should be fixed if you pull from master again.
@lkhilton I've updated workflows/reference_files/2.4/reference_files.smk with the following.
rule download_oncodrive_refs:
    output:
        refs = "downloads/oncodrive/datasets/genomereference/{oncodrive_build}.master",
        stops = "downloads/oncodrive/datasets/genestops/{oncodrive_build}.master"
    params:
        outdir = "downloads/oncodrive/{version}/",
        provider = lambda w: config["genome_builds"][w.version]["provider"]
I've also checked that modules/oncodriveclustl/1.0/oncodriveclustl.smk matches the suggested correction below.
rule _oncodriveclustl_run:
    input:
        maf = str(rules._oncodriveclustl_format_input.output.maf),
        reference = lambda w: reference_files("downloads/oncodrive/{genome_build}/datasets/genomereference/" + ONCODRIVE_BUILD_DICT[w.genome_build] + ".master"),
        region = _get_region
    output:
        txt = CFG["dirs"]["oncodriveclustl"] + "{genome_build}/{sample_set}--{launch_date}/{md5sum}/{region}/elements_results.txt",
        tsv = CFG["dirs"]["oncodriveclustl"] + "{genome_build}/{sample_set}--{launch_date}/{md5sum}/{region}/clusters_results.tsv",
        png = CFG["dirs"]["oncodriveclustl"] + "{genome_build}/{sample_set}--{launch_date}/{md5sum}/{region}/quantile_quantile_plot.png"
    log:
        stdout = CFG["logs"]["oncodriveclustl"] + "{genome_build}/{sample_set}--{launch_date}/{md5sum}/{region}/oncodriveclustl.stdout.log",
        stderr = CFG["logs"]["oncodriveclustl"] + "{genome_build}/{sample_set}--{launch_date}/{md5sum}/{region}/oncodriveclustl.stderr.log"
    params:
        local_path = CFG["reference_files_directory"] + "{genome_build}/",
        build = lambda w: (w.genome_build).replace("grch37","hg19").replace("grch38","hg38"),
        command_line_options = CFG["options"]["clustl_options"] if CFG["options"]["clustl_options"] is not None else ""
However, nice snakemake --dry-run --use-conda all -s capture_Snakefile.smk still returns the AssertionError below.
Building DAG of jobs...
Executing subworkflow reference_files.
Creating specified working directory /mnt/raid/Analysis/Ongoing/Haloplex/OldPipeline/lcr-modules/demo/reference.
Building DAG of jobs...
InputFunctionException in line 583 of /mnt/raid/Analysis/Ongoing/Haloplex/OldPipeline/lcr-modules/workflows/reference_files/2.4/reference_files_header.smk:
Error:
AssertionError: The download_oncodrive_hg19_regions download rule doesn't have a provider param.
Wildcards:
genome_build=grch37
suffix=main_chromosomes/main_chromosomes.grch37.txt
Traceback:
File "/mnt/raid/Analysis/Ongoing/Haloplex/OldPipeline/lcr-modules/workflows/reference_files/2.4/reference_files_header.smk", line 453, in hardlink_same_provider
File "/mnt/raid/Analysis/Ongoing/Haloplex/OldPipeline/lcr-modules/workflows/reference_files/2.4/reference_files_header.smk", line 423, in get_matching_download_rules
Could you please pull from master one more time? The commit I made to fix the assertion error was overwritten in another branch. You should see these changes after pulling the latest changes.
Thank you! The reference files have been successfully generated with the update and there are no more assertion errors. However, I'm encountering some other errors downstream with the Battenberg/ASCAT installation. Could you shed some light on the issue?
Activating conda environment: .snakemake/conda/1ea4ad9da9e4539afd010c34139325fa_
Downloading GitHub repo Crick-CancerGenomics/ascat@master
Skipping 3 packages not available: GenomicRanges, IRanges, S4Vectors
Installing 8 packages: data.table, doParallel, foreach, GenomicRanges, IRanges, RColorBrewer, S4Vectors, iterators
Error: Failed to install 'ASCAT' from GitHub:
(converted from warning) packages ‘GenomicRanges’, ‘IRanges’, ‘S4Vectors’ are not available (for R version 3.6.3)
Execution halted
[Tue Jun 4 14:45:07 2024]
Error in rule _install_battenberg:
jobid: 137
output: results/battenberg-1.2/00-inputs/battenberg_dependencies_installed.success
log: results/battenberg-1.2/logs/launched-2024-06-04-at-14-44-48/00-inputs/input.log (check log file(s) for error message)
conda-env: /mnt/raid/Analysis/Ongoing/Haloplex/OldPipeline/lcr-modules/demo/.snakemake/conda/a85e0be326fb70ac2ccc6d95cb4ecce5
shell:
R -q --vanilla -e 'devtools::install_github("Crick-CancerGenomics/ascat/ASCAT")' >> results/battenberg-1.2/logs/launched-2024-06-04-at-14-44-48/00-inputs/input.log && ##move some of this to config?
R -q --vanilla -e 'devtools::install_github("morinlab/battenberg")' >> results/battenberg-1.2/logs/launched-2024-06-04-at-14-44-48/00-inputs/input.log && ##move some of this to config?
touch results/battenberg-1.2/00-inputs/battenberg_dependencies_installed.success
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Hi @focusonskills ,
I believe this issue is related to devtools running remotes under the hood, which recently changed how it finds non-CRAN packages. It now refuses to install anything from Bioconductor, complaining that the package was not found. I think adding the argument repos = BiocManager::repositories() to the devtools call should fix the problem, so the line
R -q --vanilla -e 'devtools::install_github("Crick-CancerGenomics/ascat/ASCAT")'
should become
R -q --vanilla -e 'devtools::install_github("Crick-CancerGenomics/ascat/ASCAT", repos = BiocManager::repositories())'
Can you please see if this fixes the error?
Thank you!
@Kdreval It is able to locate the packages now but it still failed to install some of the packages.
* installing *source* package ‘data.table’ ...
** package ‘data.table’ successfully unpacked and MD5 sums checked
** using staged installation
** libs
fread.c: In function 'freadMain':
fread.c:1301:7: warning: ignoring return value of 'strtod', declared with attribute warn_unused_result [-Wunused-result]
(void)strtod(ch, &end); // careful not to let "" get to here as strtod considers "" numeric
^~~~~~
/mnt/raid/Analysis/Ongoing/Haloplex/OldPipeline/lcr-pipeline/lcr-modules/demo/.snakemake/conda/2239fcac3430199448647436028de9d0/bin/../lib/gcc/x86_64-conda_cos6-linux-gnu/7.3.0/../../../../x86_64-conda_cos6-linux-gnu/bin/ld: cannot find -lgomp
collect2: error: ld returned 1 exit status
make: *** [/mnt/raid/Analysis/Ongoing/Haloplex/OldPipeline/lcr-pipeline/lcr-modules/demo/.snakemake/conda/2239fcac3430199448647436028de9d0/lib/R/share/make/shlib.mk:6: data.table.so] Error 1
ERROR: compilation failed for package ‘data.table’
* removing ‘/mnt/raid/Analysis/Ongoing/Haloplex/OldPipeline/lcr-pipeline/lcr-modules/demo/.snakemake/conda/2239fcac3430199448647436028de9d0_/lib/R/library/data.table’
Error: Failed to install 'ASCAT' from GitHub:
(converted from warning) installation of package ‘data.table’ had non-zero exit status
Execution halted
As of #327 we've updated the PyPI repository for Oncopipe and modified the demo/env.yaml file. You should now be able to install the correct version of Snakemake and all dependencies (including Oncopipe) with the command outlined in the README.
There is also a pending PR #326 that includes updates to the battenberg conda environment that should resolve the battenberg installation issues.
This should resolve these issues. Please let us know if there are any further stumbling blocks.
The command nice snakemake --dry-run --use-conda all fails because there is no Snakefile in the demo directory. I then tried snakemake --dry-run --use-conda all -s genome_Snakefile.smk, which also fails with:
(the exact rule that fails above is random)
Editing the file lcr-modules/workflows/reference_files/2.4/reference_files_header.smk, under function get_matching_download_rules, I changed
to
and added
to the same function, since whitespace was somehow being prepended to the paths. But this also fails. I think perhaps the developers would be better equipped to debug this issue than me.
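For illustration only, stripping stray whitespace from path components (as in the genomes/ grch37 / repeatmasker/... path in the first error of this thread) could look like the sketch below. The function name `strip_path_whitespace` is hypothetical. This is not the change attempted above, which is not shown, nor a fix for the underlying cause in reference_files_header.smk.

```python
def strip_path_whitespace(path):
    """Remove whitespace surrounding each '/'-separated component of `path`."""
    return "/".join(part.strip() for part in path.split("/"))


if __name__ == "__main__":
    # Path with stray spaces, as it appears in the error message.
    broken = "genomes/ grch37 / repeatmasker/repeatmasker.grch37.bed"
    print(strip_path_whitespace(broken))
```

A workaround like this only masks the symptom. Where the whitespace comes from would still need to be traced.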
Related to this, it would be helpful to see what directory structure and files the Reference Files Workflow generates, so that I can symlink existing downloads to it, and perhaps bypass the above issue.