broadinstitute / cromwell

Scientific workflow engine designed for simplicity & scalability. Trivially transition between one off use cases to massive scale production environments
http://cromwell.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
987 stars 357 forks source link

basename() fails on required variable #6910

Open aofarrel opened 1 year ago

aofarrel commented 1 year ago

Made a new ticket for this since it's a new issue. basename() does not work on optional values, but sometimes it seems to think things that aren't optional are optional.

This passes miniwdl and Cromwell, ie, is expected behavior:

version 1.0

task T {
    input {
        File? tsv_file_input
        String tsv_arg = if defined(tsv_file_input) then basename(select_first([tsv_file_input, "/path/to/file.txt"])) else ""
    }

    command <<<
        echo ~{tsv_arg}
    >>>

}

workflow W {
    input {
        File? tsv_file_input
    }

    call T {
        input:
            tsv_file_input = tsv_file_input
    }
}

This passes miniwdl, but fails Cromwell, even though it really ought to pass both:

version 1.0

task T {
    input {
        File? tsv_file_input
        String foo = select_first([tsv_file_input, "/path/to/file.txt"])
        String tsv_arg = if defined(tsv_file_input) then basename(foo) else ""
    }

    command <<<
        echo ~{tsv_arg}
    >>>

}

workflow W {
    input {
        File? tsv_file_input
    }

    call T {
        input:
            tsv_file_input = tsv_file_input
    }
}

Cromwell's error is:

14:27:13.383 [main] ERROR io.dockstore.client.cli.ArgumentUtility - Problem parsing WDL file: Failed to process task definition 'T' (reason 1 of 1): Failed to process expression 'select_first([tsv_arg, if defined(tsv_file_input) then basename(foo) else ""])' (reason 1 of 1): Invalid parameter 'IdentifierLookup(foo)'. Expected 'File' but got 'String?' 14:27:13.385 [main] ERROR io.dockstore.client.cli.ArgumentUtility - wdl.draft3.parser.WdlParser$SyntaxError: Failed to process task definition 'T' (reason 1 of 1): Failed to process expression 'select_first([tsv_arg, if defined(tsv_file_input) then basename(foo) else ""])' (reason 1 of 1): Invalid parameter 'IdentifierLookup(foo)'. Expected 'File' but got 'String?'

Originally posted by @aofarrel in https://github.com/broadinstitute/cromwell/issues/6840#issuecomment-1245982086

pshapiro4broad commented 1 year ago

Yes, this looks like a bug in cromwell. I seem to recall there was an issue (feature?) where cromwell would implicitly make default arguments optional. There was some discussion of this before but I can't remember where it was.

Moving the String foo line outside of the input block would let you work around this bug. (It's ok to declare foo after tsv_arg because the WDL flow graph will ensure that foo is defined before tsv_arg's initializer is evaluated, even if it's declared after it.)

aofarrel commented 1 year ago

That workaround has got me curious... My previous understanding is that moving things out of the input block only has the effect of making them act like private variables. The spec also says that this region of "non-input declarations" are supposed to act as intermediate values, so I'm surprised tsv_arg can remain in the input section (since by merit of it being there, it can be overwritten by the user). How is this section implemented in Cromwell? Is it effectively just an extension of the input section, but the user can't directly set their values?