broadinstitute / cromwell

Scientific workflow engine designed for simplicity & scalability. Trivially transition between one off use cases to massive scale production environments
http://cromwell.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
1k stars 359 forks source link

json containing tabs is rejected #3487

Open EvanTheB opened 6 years ago

EvanTheB commented 6 years ago

I was running through the tutorial, https://cromwell.readthedocs.io/en/develop/tutorials/ServerMode/. I typed the inputs.json file with a tab instead of spaces.

curl -X POST http://localhost:8000/api/workflows/v1 -F workflowSource=@hello.wdl -F workflowInputs=@inputs.json

{
  "status": "fail",
  "message": "Error(s): Input file is not valid yaml nor json: while scanning for the next token\nfound character '\\t(TAB)' that cannot start any token. (Do not use \\t(TAB) for indentation)\n in 'reader', line 2, column 1:\n    \t\"test.hello.name\": \"World\"\n    ^\n"
}

However per the JSON spec, tabs, or any whitespace, "can be inserted between any pair of tokens" https://www.json.org/.

Python:

json.loads("{\n\t\"valid\":\"json\"\n}")

Sadly I do not know Java very well or I would just check and or fix whichever parser you are using.

Thanks

geoffjentry commented 6 years ago

Thanks @EvanTheB - we'll take a look at this

geoffjentry commented 6 years ago

For the person (even if it's me) who picks this up -> I did a quick poke around and I think the reason for this is that our original validation in PartialWorkflowSources appears to be getting run through a YAML parser and tabs aren't valid YAML, but are ok in JSON.

geoffjentry commented 6 years ago

Another few thoughts to the person who picks this up -> this bug likely also exists on whatever initial validation is being done for CWL files as they can also be json but presumably are being run through a yaml parser

EvanTheB commented 6 years ago

YAML claims to be a superset of JSON, but it is not (as this proves). Every time I have encountered YAML in a project it has been a headache, just my 2c. It is a strange syntax, with the potential for infinite recursion and bombs. It makes for a hard parser implementation, and there have been security and perfomance issues in the past. Fine for local configuration files, but probably not a good idea for any public facing APIs.

EvanTheB commented 6 years ago

Indeed I just pushed a recursive yaml file to the same endpoint, and the server crashed: yaml.json: {a: &b [*b]}

geoffjentry commented 6 years ago

@EvanTheB The main issue here is that we need to support YAML for Cromwell's CWL support. It looks like when that went in there were a few spots where the proverbial pair of birds was killed with a single stone a bit overeagerly.

That's an interesting example however. Food for thought.

EvanTheB commented 6 years ago

further reading: https://github.com/cblp/yaml-sucks/