core-unit-bioinformatics / reference-container

Build repository for reference container
MIT License

Problem running rule with 2 'derive' #6

Open svenwillger opened 2 years ago

svenwillger commented 2 years ago

When trying to create the T2T-CHM13 container, an error is thrown at the step that should create the .dict file. Creating the .dict file manually works. Creating the two 'derive' files one at a time, by commenting out ("hashing out") the code for the other one, also works in each case.

The config file ncbi-t2t-chm13hg002_v1.yaml lists two files that should be created by 'derive': https://github.com/core-unit-bioinformatics/reference-container/blob/db7507354de3d439f77c11b1b355436efa5d8e0b/config/ref_container/ncbi-t2t-chm13hg002_v1.yaml#L28-L43

The error indicates that something in the rule extract_ftp_file (lines 98-119 of the transforming.smk module) is not working properly: https://github.com/core-unit-bioinformatics/reference-container/blob/db7507354de3d439f77c11b1b355436efa5d8e0b/workflow/rules/transforming.smk#L98-L119

ptrebert commented 2 years ago

Although I have not identified the source of the problem, this seems to be a Snakemake issue: Snakemake re-uses the same command line in the Singularity call for both rules, which suggests either a faulty caching mechanism or simply an internal variable that is not updated. I think I'll open an issue for Snakemake, because diagnosing this without deeper insight into how Snakemake constructs a rule seems impossible.
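One plausible mechanism for a command line that "is not updated" between rules defined in a loop is Python's late binding: a function created inside a loop that refers to the loop variable looks that variable up when it is called, not when it is defined. The plain-Python sketch below uses the two `shell` strings from the test script posted in this thread. The names `late_bound`, `early_bound`, `late` and `early` are hypothetical, and whether Snakemake's generated rule code actually behaves like this is an assumption, not something established in this thread.

```python
# Plain-Python sketch of late binding in a loop. This is a hypothetical
# stand-in for rules defined in a loop; it is NOT Snakemake's actual code.

test_data = [
    {"rule_name": "rule_test1", "shell": "samtools dict {input} > {output}"},
    {"rule_name": "rule_test2", "shell": "samtools faidx {input}"},
]

late_bound = {}
early_bound = {}

for dt in test_data:
    # The body looks up the global name `dt` when it is called, i.e. after
    # the loop has finished, so every function sees the last dict.
    def late():
        return dt["shell"]

    late_bound[dt["rule_name"]] = late

    # Binding the current value as a default argument freezes it at
    # definition time, so each function keeps its own command.
    def early(cmd=dt["shell"]):
        return cmd

    early_bound[dt["rule_name"]] = early

for name in late_bound:
    print(name, "| late :", late_bound[name]())
    print(name, "| early:", early_bound[name]())
```

If something like this happened inside the generated rules, the job for `rule_test1` would receive `samtools faidx {input}` instead of `samtools dict {input} > {output}`, which would be consistent with the error appearing at the .dict step.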

ptrebert commented 2 years ago

Sven, please try to produce a minimal reproducible example for this problem so that we can open a (hopefully easy-to-solve) issue in the Snakemake repo.

svenwillger commented 1 year ago

A small test script and the required test file have been uploaded to a new branch, issue_2_derive. The issue can be reproduced with this small script: running both rules together fails, but running them one after the other works.

Here is the script:

rule all:
    input:
        "test/genome.dict",
        "test/genome.fa.fai",

# Each entry describes one 'derive' rule; the loop below creates the rules.
Test_data = [
    {
        "name": "genome.fa",
        "input": "test/genome.fa",
        "output": "test/genome.dict",
        "shell": "samtools dict {input} > {output}",
        "singularity": "https://depot.galaxyproject.org/singularity/samtools:1.6--hb116620_7",
        "rule_name": "rule_test1",
    },
    {
        "name": "genome.fa",
        "input": "test/genome.fa",
        "output": "test/genome.fa.fai",
        "shell": "samtools faidx {input}",
        "singularity": "https://depot.galaxyproject.org/singularity/samtools:1.6--hb116620_7",
        "rule_name": "rule_test2",
    },
]

for DT in Test_data:
    rule derive_test:
        name:
            f"derive_{DT['rule_name']}"
        message:
            f"Deriving data file {DT['name']}"
        input:
            DT["input"],
        output:
            DT["output"],
        container:
            DT["singularity"]
        shell:
            DT["shell"]

I won't use "Close with comment" because I want to wait for a response from the Snakemake developers. The commit message is "test script and file to demonstrate issue".