CCBR / RENEE

A comprehensive quality-control and quantification RNA-seq pipeline
https://CCBR.github.io/RENEE/
MIT License

SIF images not being copied from shared cache dir #113

Closed: kelly-sovacool closed this issue 9 months ago

kelly-sovacool commented 9 months ago

When Jing ran renee with --sif-cache /data/CCBR_Pipeliner/SIFS, the snakemake log showed the containers being pulled from the internet. They're supposed to be copied from the shared sif cache dir into the local singularity cache (usually ${OUTDIR}/.singularity).

Most images in the shared sif cache dir on biowulf are named nciccbr-ccbr_xxx, i.e. the format is namespace-image. However, the ones from renee are named with just the image. Maybe we need to rename the files in the cache to match the other naming scheme? Did this change in an update to singularity?

However, this issue doesn't happen when I execute renee run --sif-cache /data/CCBR_Pipeliner/SIFS...

kelly-sovacool commented 9 months ago

My understanding was that running renee with --sif-cache should copy the sifs from the shared sif location (/data/CCBR_Pipeliner/SIFS) to the local cache dir (${OUTDIR}/.singularity), but it's not obvious how this actually gets accomplished during the run() function.

trace: run() -> setup() -> image_cache() https://github.com/CCBR/RENEE/blob/3975071619fd917fba449427ed31c0246ea6f4e9/renee#L758-L760

This does not copy the sifs; it just prints messages if they aren't found.
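
To make that behavior concrete, here is a minimal sketch of a check that only rewrites paths and warns, without copying or pulling anything. It is not the actual image_cache() code from the renee script: the function name resolve_images, its arguments, and the warning text are all hypothetical, and the file naming (image name, underscore, tag, .sif) is inferred from the files currently in the shared cache.

```python
import os
import sys


def resolve_images(images, sif_cache):
    """Return a copy of `images` with docker URIs swapped for local SIF paths.

    Hypothetical helper: nothing is copied or pulled here. A missing SIF
    only produces a warning, and the docker URI is kept so the image
    would still be pulled at run time.
    """
    resolved = {}
    for name, uri in images.items():
        # e.g. docker://nciccbr/ccbr_rna:v0.0.1 -> ccbr_rna_v0.0.1.sif
        basename = uri.split("/")[-1].replace(":", "_") + ".sif"
        local = os.path.join(sif_cache, basename)
        if os.path.exists(local):
            resolved[name] = local
        else:
            print(f"Warning: {local} not found; would pull {uri}", file=sys.stderr)
            resolved[name] = uri
    return resolved
```

Under this sketch, any image whose file name doesn't match what the lookup expects silently falls back to the docker URI, which would look exactly like "the cache is being ignored".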

I think we may need to rerun renee cache to generate the images and copy them to /data/CCBR_Pipeliner/SIFS.

Update: setup() does write the updated config file in the end.

kelly-sovacool commented 9 months ago

When run with --sif-cache, config.json contents have absolute paths to the sif files:

    "images": {
        "arriba": "/data/CCBR_Pipeliner/SIFS/ccbr_arriba_2.0.0_v0.0.1.sif",
        "bam2strandedbw": "/data/CCBR_Pipeliner/SIFS/ccbr_bam2strandedbw_v0.0.1.sif",
        "bbtools": "/data/CCBR_Pipeliner/SIFS/ccbr_bbtools_38.87_v0.0.1.sif",
        "build_rnaseq": "/data/CCBR_Pipeliner/SIFS/ccbr_build_rnaseq_v0.0.1.sif",
        "cutadapt": "/data/CCBR_Pipeliner/SIFS/ccbr_cutadapt_1.18_v032219.sif",
        "fastq_screen": "/data/CCBR_Pipeliner/SIFS/ccbr_fastq_screen_0.13.0_v2.0.sif",
        "fastqc": "/data/CCBR_Pipeliner/SIFS/ccbr_fastqc_0.11.9_v1.1.sif",
        "fastqvalidator": "/data/CCBR_Pipeliner/SIFS/ccbr_fastqvalidator_v0.1.0.sif",
        "kraken": "/data/CCBR_Pipeliner/SIFS/ccbr_kraken_v2.1.1_v0.0.1.sif",
        "miniconda": "/data/CCBR_Pipeliner/SIFS/miniconda3_4.9.2.sif",
        "multiqc": "/data/CCBR_Pipeliner/SIFS/multiqc_v0.1.0.sif",
        "picard": "/data/CCBR_Pipeliner/SIFS/ccbr_picard_v0.0.1.sif",
        "preseq": "/data/CCBR_Pipeliner/SIFS/ccbr_preseq_v0.0.1.sif",
        "python": "/data/CCBR_Pipeliner/SIFS/ccbr_python_v0.0.1.sif",
        "qualimap": "/data/CCBR_Pipeliner/SIFS/ccbr_qualimap_v0.0.1.sif",
        "rna": "/data/CCBR_Pipeliner/SIFS/ccbr_rna_v0.0.1.sif",
        "rsem": "/data/CCBR_Pipeliner/SIFS/ccbr_rsem_1.3.3_v1.0.sif",
        "rseqc": "/data/CCBR_Pipeliner/SIFS/ccbr_rseqc_4.0.0_v1.0.sif",
        "rstat": "/data/CCBR_Pipeliner/SIFS/ccbr_rstat_v0.0.1.sif"
    },

Without --sif-cache, the docker URLs specified in config/containers/images.json are used and the images are pulled from the internet:

    "images": {
        "arriba": "docker://nciccbr/ccbr_arriba_2.0.0:v0.0.1",
        "bam2strandedbw": "docker://nciccbr/ccbr_bam2strandedbw:v0.0.1",
        "bbtools": "docker://nciccbr/ccbr_bbtools_38.87:v0.0.1",
        "build_rnaseq": "docker://nciccbr/ccbr_build_rnaseq:v0.0.1",
        "cutadapt": "docker://nciccbr/ccbr_cutadapt_1.18:v032219",
        "fastq_screen": "docker://nciccbr/ccbr_fastq_screen_0.13.0:v2.0",
        "fastqc": "docker://nciccbr/ccbr_fastqc_0.11.9:v1.1",
        "fastqvalidator": "docker://nciccbr/ccbr_fastqvalidator:v0.1.0",
        "kraken": "docker://nciccbr/ccbr_kraken_v2.1.1:v0.0.1",
        "miniconda": "docker://continuumio/miniconda3:4.9.2",
        "multiqc": "docker://skchronicles/multiqc:v0.1.0",
        "picard": "docker://nciccbr/ccbr_picard:v0.0.1",
        "preseq": "docker://nciccbr/ccbr_preseq:v0.0.1",
        "python": "docker://nciccbr/ccbr_python:v0.0.1",
        "qualimap": "docker://nciccbr/ccbr_qualimap:v0.0.1",
        "rna": "docker://nciccbr/ccbr_rna:v0.0.1",
        "rsem": "docker://nciccbr/ccbr_rsem_1.3.3:v1.0",
        "rseqc": "docker://nciccbr/ccbr_rseqc_4.0.0:v1.0",
        "rstat": "docker://nciccbr/ccbr_rstat:v0.0.1"
    },
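
Comparing the two blocks, the sif file names appear to follow from the docker URIs by dropping the namespace and joining the image and tag with an underscore. The sketch below shows that mapping next to the namespace-image style used by most other files in the shared cache. The function sif_name and its with_namespace flag are hypothetical, and how the tag is appended in the namespaced style is an assumption; I haven't confirmed either against what singularity or snakemake actually produce.

```python
def sif_name(uri, with_namespace=False):
    """Derive a SIF file name from a docker:// URI (hypothetical helper)."""
    # Strip the scheme, then separate "namespace/image" from the tag.
    # Assumes the URI carries an explicit tag, as every entry above does.
    path, tag = uri.split("://", 1)[-1].rsplit(":", 1)
    namespace, image = path.rsplit("/", 1)
    stem = f"{namespace}-{image}" if with_namespace else image
    return f"{stem}_{tag}.sif"


# Image-only style, matching the renee files in the shared cache:
print(sif_name("docker://nciccbr/ccbr_arriba_2.0.0:v0.0.1"))
# ccbr_arriba_2.0.0_v0.0.1.sif

# Namespace-image style (tag placement is an assumption):
print(sif_name("docker://nciccbr/ccbr_arriba_2.0.0:v0.0.1", with_namespace=True))
# nciccbr-ccbr_arriba_2.0.0_v0.0.1.sif
```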

kelly-sovacool commented 9 months ago

The problem seems to have disappeared. I think the cause may have been that the output directory was first initialized without the --sif-cache argument, and renee was then run with it: the run likely reused the existing config file instead of writing the sif file paths into a new one.
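
If that is what happened, a quick way to check an existing output directory is to look for images entries in its config.json that still point at docker://. This is only a sketch for diagnosing the situation by hand; stale_image_entries is a hypothetical helper and is not part of renee.

```python
import json
import os


def stale_image_entries(config_path):
    """List image names in an existing config that still use docker:// URIs.

    Hypothetical diagnostic: returns an empty list if the config file
    does not exist or has no "images" section.
    """
    if not os.path.exists(config_path):
        return []
    with open(config_path) as fh:
        images = json.load(fh).get("images", {})
    return sorted(
        name for name, ref in images.items() if ref.startswith("docker://")
    )
```

A non-empty result for an output directory that was supposedly run with --sif-cache would be consistent with the old config having been reused.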