NCI-RBL / iCLIP

RNA Biology Pipeline to Characterize protein-RNA Interactions
https://rbl-nci.github.io/iCLIP/
MIT License

issue with config parsing spaces #137

Closed: slsevilla closed this issue 1 year ago

slsevilla commented 1 year ago

The current pipeline (v2.2) did not pin default versions for the modules it loads. As a result, when snakemake was updated to v7.19.1, our pipeline picked up this new version.

Parsing of the config file has changed, and the following error was seen:

    ------------------------------------------------------------------------
    STARTING DryRun
    [+] Loading snakemake 7.19.1
    run_snakemake.sh: line 559: 1: command not found
    run_snakemake.sh: line 63: 1: command not found
    ERROR: Output dir provided: /data/RBL_NCI/Wolin/mov10_par_Y_r2_01052023 does not match snakemake_config: . Update and re-run.

This was caused by two lines in the snakemake_config.yaml file whose values contain spaces:

outSJfilterCountTotalMin: "3 1 1 1" #minimum total (multi-mapping+unique) read count per junction for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif
outSJfilterOverhangMin: "30 12 12 12" #minimum overhang length for splice junctions on both sides for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif
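The issue does not show how run_snakemake.sh reads snakemake_config.yaml. One mechanism consistent with the `1: command not found` messages, offered here only as an assumption, is that the wrapper turns each `key: "value"` line into an unquoted shell assignment. The sketch below uses a hypothetical converter, `yaml_line_to_shell_assignment`, and Python's `shlex.split` (POSIX word-splitting) to show what the shell would then see for the spaced and comma-separated values.

```python
import shlex


def yaml_line_to_shell_assignment(line):
    """Hypothetical naive converter: 'key: "value" #comment' -> 'key=value'.

    ASSUMPTION: the YAML quotes are stripped without re-quoting for the shell.
    This is an illustration, not the actual run_snakemake.sh logic.
    """
    # Drop the trailing YAML comment, then split on the first colon.
    body = line.split("#", 1)[0].strip()
    key, value = body.split(":", 1)
    # Remove surrounding whitespace and the YAML double quotes.
    value = value.strip().strip('"')
    return key.strip() + "=" + value


for line in (
    'outSJfilterCountTotalMin: "3 1 1 1"',
    'outSJfilterCountTotalMin: "3,1,1,1"',
):
    assignment = yaml_line_to_shell_assignment(line)
    # shlex.split applies POSIX shell word-splitting to the unquoted text.
    print(shlex.split(assignment))
# ['outSJfilterCountTotalMin=3', '1', '1', '1']
# ['outSJfilterCountTotalMin=3,1,1,1']
```

Under this assumption, bash would treat `outSJfilterCountTotalMin=3` as a temporary variable assignment and try to run `1` as a command, which would print `1: command not found`; with commas the value stays a single word.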

Solution:
  1. Fix the snakemake_config.yaml file to use commas rather than spaces:
    outSJfilterCountTotalMin: "3,1,1,1" #minimum total (multi-mapping+unique) read count per junction for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif
    outSJfilterOverhangMin: "30,12,12,12" #minimum overhang length for splice junctions on both sides for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif
  2. Update the Snakefile to convert the commas back to spaces:
    star_filt_sjmin = config['outSJfilterCountTotalMin'].replace(",", " ")
    star_filt_overhang = config['outSJfilterOverhangMin'].replace(",", " ")
  3. Create a function to control which version of each package is loaded within the pipeline:
    load_modules(){
      # Load a fixed version of each tool whose name appears in the first argument
      if [[ $1 =~ "python" ]]; then module load python/3.8; fi
      if [[ $1 =~ "snakemake" ]]; then module load snakemake/7.19.1; fi
      if [[ $1 =~ "graphviz" ]]; then module load graphviz/2.40; fi
    }
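The Snakefile change above keeps the YAML values comma-separated and restores the spaces only where the STAR arguments are built. A minimal sketch of that round trip follows, assuming the converted strings are passed to STAR's `--outSJfilterCountTotalMin` and `--outSJfilterOverhangMin` options (the issue does not show the STAR rule itself). The helper `star_sj_args` and its four-integer check are hypothetical additions, not part of the pipeline.

```python
from typing import Dict

# Config keys that hold one value per STAR motif class (4 classes).
STAR_SJ_KEYS = ("outSJfilterCountTotalMin", "outSJfilterOverhangMin")


def star_sj_args(config: Dict[str, str]) -> str:
    """Hypothetical helper: build STAR splice-junction filter flags
    from comma-separated config values."""
    parts = []
    for key in STAR_SJ_KEYS:
        values = [v.strip() for v in config[key].split(",")]
        # Expect four integers, one per motif class; this rejects the old
        # space-separated form, which splits into a single item.
        if len(values) != 4 or not all(v.isdigit() for v in values):
            raise ValueError(
                f"{key} must be four comma-separated integers, got {config[key]!r}"
            )
        # Equivalent to .replace(",", " ") for well-formed values.
        parts.append(f"--{key} " + " ".join(values))
    return " ".join(parts)


config = {
    "outSJfilterCountTotalMin": "3,1,1,1",
    "outSJfilterOverhangMin": "30,12,12,12",
}
print(star_sj_args(config))
# --outSJfilterCountTotalMin 3 1 1 1 --outSJfilterOverhangMin 30 12 12 12
```

Splitting and re-joining gives the same string as the plain `.replace(",", " ")` for valid input, but a config that still uses spaces fails with a clear message instead of surfacing later as a shell error.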

Will tag a minor pipeline version and update the pipeline copy at /data/RBL_NCI/Pipelines/iCLIP.

slsevilla commented 1 year ago

Fixed with pull request https://github.com/RBL-NCI/iCLIP/pull/138.