chanzuckerberg / miniwdl

Workflow Description Language developer tools & local runner
MIT License
173 stars 54 forks source link

v1.1 spec regex parsing #661

Open adthrasher opened 10 months ago

adthrasher commented 10 months ago

We recently switched to v1.1 of the WDL spec for our WDL. We noticed that the following WDL definition in one of our tasks is considered valid by miniwdl check in a version 1.1 document.

        String prefix = sub(
            basename(fastq),
            "([_\.][rR][12])?(\.subsampled)?\.(fastq|fq)(\.gz)?$",
            ""
        )

We were expecting this to show a validation error as the 1.1 spec indicates that backslashes need to be double-escaped in these regular expressions (https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#string-substring-string-string).

Is miniwdl being lenient in its checking in this case? Or is our understanding of the spec incorrect?

Both \\ and \ seem to be accepted by miniwdl check.

claymcleod commented 10 months ago

Just to build on what Andrew said, it appears that WDL v1.0 restricts the escaped characters that can be included within a string: https://github.com/openwdl/wdl/blob/main/versions/1.0/SPEC.md#whitespace-strings-identifiers-constants. The above string has an escaped \., which is not allowed in a string, but miniwdl check passes that.

mlin commented 10 months ago

Just looked into this a little and found a relevant check commented out,

https://github.com/chanzuckerberg/miniwdl/blob/7344a900f408c2703e4dc4504c42a4f1d1844984/WDL/_parser.py#L63-L67

Sooo, I need to figure out if that was an oversight or what... 😅