JetBrains-Research / snakecharm

Plugin for PyCharm / IntelliJ IDEA Platform IDEs which adds support for Snakemake language.
MIT License

Wildcard not defined in output of Target Rule #512

Open JoshLoecker opened 10 months ago

JoshLoecker commented 10 months ago

By default, Snakemake assumes the first rule in a Snakefile is a "target rule". This means it will not have an output section, but this plugin assumes all rules will have an output section, and an error is shown: Wildcard '[WILDCARD_NAME]' isn't properly defined.

To fix this, the following should be done:

  1. The first rule in the Snakefile should ignore any requirement for wildcards in the output section
  2. If the rule has the option default_target: True, it should ignore any requirement for wildcards in the output section

Source: https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#target-rules

Example Snakefile with the first rule as a "target rule"

rule all:
    input:
        expand("{dataset}/file.A.txt", dataset=DATASETS)

rule complex_conversion:
    input:
        "{dataset}/inputfile"
    output:
        "{dataset}/file.{group}.txt"
    shell:
        "somecommand --group {wildcards.group} < {input} > {output}"

Using the default_target option

rule complex_conversion:
    input:
        "{dataset}/inputfile"
    output:
        "{dataset}/file.{group}.txt"
    shell:
        "somecommand --group {wildcards.group} < {input} > {output}"

rule xy:
    input:
        expand("{dataset}/file.A.txt", dataset=DATASETS)
    default_target: True

EDIT: A quick (but not ideal) fix is to disable the "Undefined wildcard usage" inspection under "Settings -> Editor -> Inspections -> Snakemake -> Undefined wildcard usage".

iromeo commented 7 months ago

Hi, thanks for reporting.

> By default, Snakemake assumes the first rule in a Snakefile is a "target rule". This means it will not have an output section, but this plugin assumes all rules will have an output section, and an error is shown: Wildcard '[WILDCARD_NAME]' isn't properly defined.

The plugin allows rules without an output section, and no error is shown for either of the provided examples: [image]

and

[image]

Here, dataset in rules xy and all isn't a Snakemake wildcard; it is just a placeholder that the expand method fills from its argument dataset=DATASETS. In rule complex_conversion, dataset and group are indeed wildcards.
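
To illustrate the difference, here is a minimal sketch of what expand does with such a placeholder. It is a simplified stand-in written for this comment, not Snakemake's implementation: the name expand_sketch and the DATASETS values are assumptions, and it only handles list-valued keyword arguments.

```python
from itertools import product


def expand_sketch(pattern, **values):
    """Fill every {name} placeholder with each combination of the given values.

    Hypothetical, simplified stand-in for Snakemake's expand().
    """
    names = list(values)
    combos = product(*(values[name] for name in names))
    return [pattern.format(**dict(zip(names, combo))) for combo in combos]


DATASETS = ["ds1", "ds2"]  # hypothetical dataset names
paths = expand_sketch("{dataset}/file.A.txt", dataset=DATASETS)
print(paths)  # concrete paths only; no braces are left for Snakemake to resolve
```

Because the strings that reach the input section are already concrete file names, the rule has no wildcards, and there is nothing for the inspection to report.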

> and an error is shown: Wildcard '[WILDCARD_NAME]' isn't properly defined.

This error is typically shown when you use a wildcard, e.g. in the input section, that isn't defined for the rule. Wildcards are defined in the output/log/benchmark sections. For example: [image]
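
The check described above can be sketched roughly as follows. This is only an illustration of the idea, not the plugin's actual code: the dict-based rule model, the function names, and the simplified wildcard regex are all assumptions made for the example.

```python
import re

# Matches {name} or {name,constraint}; a simplification of Snakemake's wildcard syntax.
WILDCARD = re.compile(r"\{\s*(\w+)\s*(?:,[^{}]*)?\}")


def wildcard_names(patterns):
    """Collect the wildcard names used in a list of file patterns."""
    return {m.group(1) for p in patterns for m in WILDCARD.finditer(p)}


def undefined_wildcards(rule):
    """Return input wildcards that no defining section declares.

    `rule` is a hypothetical model: a dict mapping section name to a list of patterns.
    """
    defined = set()
    for section in ("output", "log", "benchmark"):
        defined |= wildcard_names(rule.get(section, []))
    return wildcard_names(rule.get("input", [])) - defined


complex_conversion = {
    "input": ["{dataset}/inputfile"],
    "output": ["{dataset}/file.{group}.txt"],
}
rule_without_output = {"input": ["{dataset}/inputfile"]}

print(undefined_wildcards(complex_conversion))   # nothing undefined
print(undefined_wildcards(rule_without_output))  # 'dataset' is undefined
```

A rule whose input was built with expand would pass this check as well, since its input patterns contain no braces by the time they are inspected.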

> The first rule in the Snakefile should ignore any requirement for wildcards

As far as I understand, Snakemake does not allow wildcards in a target rule. For example, this Snakefile:

rule all:
    input:
        "{dataset}/inputfile"

rule foo:
    output: "{dataset}/inputfile"
    shell: 'touch {output}'

doesn't work; Snakemake fails with a WildcardError in rule all:

$ snakemake --snakefile foo.smk -c 1

Building DAG of jobs...
WildcardError in rule all in file /Users/romeo/work/snakecharm/snakemaek_examples/playground_project/rule_512/foo.smk, line 1:
Wildcards in input files cannot be determined from output files:
'dataset'

Additionally, it doesn't work with default_target:

rule foo:
    output: "{dataset}/inputfile"
    shell: 'touch {output}'

rule all:
    input:
        "{dataset}/inputfile"
    default_target: True

Snakemake shows a WildcardError in rule all:

$ snakemake --snakefile foo.smk -c 1

Building DAG of jobs...
WildcardError in rule all in file /Users/romeo/work/snakecharm/snakemaek_examples/playground_project/rule_512/foo.smk, line 5:
Wildcards in input files cannot be determined from output files:
'dataset'

It can work without an output section when the target comes from the command line, but that is unrelated to default_target or to being the first rule in the file:

rule foo:
    output: "{dataset}/inputfile"
    shell: 'touch {output}'

rule all:
    input:
        "{dataset}/inputfile"

Snakemake can be launched for a desired file that matches the input of some rule that doesn't have an output section:

$ snakemake --snakefile foo.smk -c 1 foo/inputfile 

Building DAG of jobs...
Using shell: /opt/homebrew/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job      count    min threads    max threads
-----  -------  -------------  -------------
foo          1              1              1
total        1              1              1

Select jobs to execute...

[Mon Apr 29 13:45:01 2024]
rule foo:
    output: foo/inputfile
    jobid: 0
    reason: Missing output files: foo/inputfile
    wildcards: dataset=foo
    resources: tmpdir=/var/folders/gq/b3nl5jss1nb0v9zh3x3nbc2w0000gn/T
...
[Mon Apr 29 13:45:01 2024]
Finished job 0.
1 of 1 steps (100%) done
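
The line wildcards: dataset=foo in the log above comes from matching the requested file against the output pattern of rule foo. A rough sketch of that matching, assuming each wildcard appears once in the pattern and matches any non-empty string (the helper names are hypothetical, and this is not Snakemake's own code):

```python
import re


def pattern_to_regex(pattern):
    """Turn 'a/{name}/b' into a regex with one named group per wildcard."""
    # re.split with one capture group alternates literal text and wildcard names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)       # literal path fragment
        else:
            regex += f"(?P<{part}>.+)"     # wildcard: any non-empty string
    return re.compile(regex)


def match_target(target, output_pattern):
    """Return the wildcard values for `target`, or None if the pattern doesn't match."""
    m = pattern_to_regex(output_pattern).fullmatch(target)
    return m.groupdict() if m else None


print(match_target("foo/inputfile", "{dataset}/inputfile"))
print(match_target("foo/other", "{dataset}/inputfile"))
```

With a concrete target there is always a file name to match against, which is why the wildcard can be resolved here but not when a wildcard-containing rule is itself the default target.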

If the default rule has an output section, wildcards are not allowed there either, and Snakemake fails with WorkflowError: Target rules may not contain wildcards:

rule all:
    input:
        "{dataset}/inputfile"

rule foo:
    output: "{dataset}/inputfile"
    default_target: True
    shell: 'touch {output}'

$ snakemake --snakefile foo.smk -c 1
Building DAG of jobs...
WorkflowError:
Target rules may not contain wildcards. Please specify concrete files or a rule without wildcards at the command line, or have a rule without wildcards at the very top of your workflow (e.g. the typical "rule all" which just collects all results you want to generate in the end).

iromeo commented 7 months ago

> EDIT: A quick (but not ideal) fix is to disable the "Undefined wildcard usage" inspection under "Settings -> Editor -> Inspections -> Snakemake -> Undefined wildcard usage".

A better workaround is to suppress the warning only for the rule where you see it, because the inspection is expected to work correctly in other cases:

[image]

The result will be: [image]

iromeo commented 7 months ago

@JoshLoecker

Could you please describe the use case where you see the above-mentioned error, and attach a screenshot of it if possible?

> To fix this, the following should be done
>
>   • The first rule in the Snakefile should ignore any requirement for wildcards in the output section
>   • If the rule has the option default_target: True, it should ignore any requirement for wildcards in the output section

Please read my first comment and its examples. It seems that:

  1. The default target setting isn't related to the wildcard problem, because Snakemake also fails when the default target contains wildcards.
  2. Snakemake works with any rule if a suitable file is requested on the command line, even without making that rule the default target (via the first-rule approach or the default_target directive).