broadinstitute / wdltool

BSD 3-Clause "New" or "Revised" License
18 stars 4 forks source link

WDL syntax error in comment still reported as an error #7

Closed tmdefreitas closed 6 years ago

tmdefreitas commented 8 years ago

I came across this while helping a colleague debug her WDL file. When this WDL file is validated, wdltool claims an "Error: finished parsing without consuming all tokens", even though the error is commented out:

task comment_bug {
    #String an_input

    command {
        echo "Alternate command"
        # The following line has a WDL syntax error, but in a comment!
        #echo {an_input} 
    }

    output {

    }
}

workflow test {
    call comment_bug
}

EDIT: error message, for completeness:

$ java -jar wdltool.jar validate comment_bug.wdl
ERROR: Finished parsing without consuming all tokens.

    output {
 ^

I can get rid of the error by changing the comment line to #echo ${an_input}. I think errors in comment lines should probably be ignored by the validator.

As an aside, is there a more helpful error message for this? The message sounds like an unused input variable or something, not that the bracket syntax was off, so it was hard to diagnose (The above is a toy example, the real task had a much more complicated command).

scottfrazer commented 8 years ago

This is happening because the the command {...} section is parsed differently than the rest of WDL. The parser thinks that the closing brace in #echo {an_input} is actually trying to close the command section. If you use the alternative delimiters (command <<< ... >>>) this is another way to get around it.

We parse the command as "opaque strings intermixed with ${...} blocks". That means that the #-style comments inside a command section are not interpreted as WDL comments, but instead as part of the command. More thought would have to go into figuring out what the right thing to do here is.

tmdefreitas commented 8 years ago

Admittedly, I have never written a parser before, so I don't know how feasible this is, but can you write the grammar/parser such that everything on a line after a # character is ignored?

The only edge cases I imagine are when the # character is in a quoted string.

scottfrazer commented 8 years ago

@tmdefreitas Yes, that is definitely possible.

However, we try to not make assumptions about the type of characters that your script can have in it. I'm perhaps being a little overly cautious, but I'd hate for there to be a case where somebody wants to use a # in their command but it gets interpreted as a comment. That could lead to the same kind of confusion that we're seeing now.

I vacillate on this because I also see the pragmatism in implementing your suggestion for the common case. In most cases I can think of, a # is a comment

Maybe some approach like Eddie's where I can have the parser give a better error message is the best solution.

eddiebroad commented 8 years ago

Scott,

Not only can I imagine a command where # is used (and in a WDL no less), but I have actually written such a command.

In the merge step for mutect, I use "#" to select for header lines in merging call_stats files and VCF files!

command <<<

increase verbosity

set -x

mutect1 call_stats merging

MUTECT1_CS="MuTect1.call_stats.txt" head --lines=2 ${mutect1_cs[0]} > $MUTECT1_CS cat ${sep =' ' mutect1_cs} | grep -Pv '#'|grep -Pv '^contig' >> $MUTECT1_CS

mutect2 call_stats merging

MUTECT2_CS="MuTect2.call_stats.txt" cat ${mutect2_cs[0]} |grep -P '^#' > $MUTECT2_CS ; cat ${sep=' ' mutect2_cs} |grep -Pv '^#' >> $MUTECT2_CS ; -eddie

On Wed, Mar 23, 2016 at 3:25 PM, Scott Frazer notifications@github.com wrote:

@tmdefreitas https://github.com/tmdefreitas Yes, that is definitely possible.

However, we try to not make assumptions about the type of characters that your script can have in it. I'm perhaps being a little overly cautious, but I'd hate for there to be a case where somebody wants to use a # in their command but it gets interpreted as a comment. That could lead to the same kind of confusion that we're seeing now.

I vacillate on this because I also see the pragmatism in implementing your suggestion for the common case. In most cases I can think of, a # is a comment

Maybe some approach like Eddie's where I can have the parser give a better error message is the best solution.

— You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub https://github.com/broadinstitute/wdltool/issues/7#issuecomment-200505343

Edward A. Salinas Senior Software Engineer - Getz Lab Broad Institute of MIT and Harvard 75 Ames Street, Rm 4013 Cambridge, MA 02142 Ph: 617-714-7905 Cell: 210-274-3172

tmdefreitas commented 8 years ago

@eddiebroad But those are all quoted strings, and don't look the same as a WDL comment. From an implementation perspective, doesn't cromwell pipe the command block to /bin/bash anyway? And following bash rules unquoted # characters start a comment, so maybe WDL just has to follow the same comment parsing rules as bash?

eddiebroad commented 8 years ago

The # in my command is NOT intended to be as a comment, but is used in the grep command because certain header lines start with # in VCF files.

WDL following bash-like comment rules might be a good idea.

-eddie

On Wed, Mar 23, 2016 at 3:35 PM, Timothy DeFreitas <notifications@github.com

wrote:

@eddiebroad https://github.com/eddiebroad But those are all quoted strings, and don't look the same as a WDL comment. From an implementation perspective, doesn't cromwell pipe the command bock to /bin/bash anyway? And following bash rules unquoted # characters start a comment, so maybe WDL just has to follow the same comment parsing rules as bash?

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/broadinstitute/wdltool/issues/7#issuecomment-200510863

Edward A. Salinas Senior Software Engineer - Getz Lab Broad Institute of MIT and Harvard 75 Ames Street, Rm 4013 Cambridge, MA 02142 Ph: 617-714-7905 Cell: 210-274-3172

scottfrazer commented 8 years ago

I'm hesitant to try to follow the same rules as Bash for parsing a command because I worry that it'll be error prone and difficult to pin down the details. It feels like we'd essentially reimplement a Bash parser in WDL. It'd be difficult and hard to get the nuances of something like this:

command {
  python <<CODE
  print("#octothorps!!#")
  CODE
}

Maybe something as simple as syntax highlighting could help too... if #s are not highlighted differently in the command section then that could also be a hint. Granted, I know syntax highlighters won't always be used.

eddiebroad commented 8 years ago

If it's clear to WDL users how # is interpreted (or not interpreted) in the command-block maybe that would be the best thing? Would a sentence/blurb in the docs address this? Possibly as mentioned earlier good error messages?

-eddie

On Wed, Mar 23, 2016 at 3:43 PM, Scott Frazer notifications@github.com wrote:

I'm hesitant to try to follow the same rules as Bash for parsing a command because I worry that it'll be error prone and difficult to pin down the details. It feels like we'd essentially reimplement a Bash parser in WDL. It'd be difficult and hard to get the nuances of something like this:

command { python <<CODE print("#octothorps!!#") CODE }

Maybe something as simple as syntax highlighting could help too... if #s are not highlighted differently in the command section then that could also be a hint. Granted, I know syntax highlighters won't always be used.

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/broadinstitute/wdltool/issues/7#issuecomment-200514792

Edward A. Salinas Senior Software Engineer - Getz Lab Broad Institute of MIT and Harvard 75 Ames Street, Rm 4013 Cambridge, MA 02142 Ph: 617-714-7905 Cell: 210-274-3172

mcovarr commented 6 years ago

Issue moved to broadinstitute/cromwell #2870 via ZenHub