broadinstitute / cromwell

Scientific workflow engine designed for simplicity & scalability. Trivially transition between one off use cases to massive scale production environments
http://cromwell.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

read_json fails on arrays #4518

Open antonkulaga opened 5 years ago

antonkulaga commented 5 years ago

I am struggling to make read_json work on any data that is more complex than a flat JSON map.

For instance, a JSON file like

[{"series":"GSE69360","name":"Biochain_Adult_Liver","path":"https://sra-download.ncbi.nlm.nih.gov/traces/sra29/SRR/001967/SRR2014238","model":"Illumina HiSeq 2000","run":"SRR2014238","gsm":"GSM1698568","characteristics":"number of donors -> 1;age -> 64 years old;tissue -> Liver;vendor -> Biochain;isolate -> Lot no.: B510092;gender -> Male","strategy":"RNA-Seq","organism":"Homo sapiens","layout":"PAIRED","title":"Biochain_Adult_Liver"}]

cannot be read by

Array[Map[String,String]] runs = read_json(path_to_json_file)

even though it is clearly an Array[Map[String, String]]. I get the following failure:

Workflow failed
WorkflowFailure(Failed to evaluate job outputs,List(WorkflowFailure(Bad output 'get_gsm.runs': Failed to read_json("/data/cromwell-executions/test/f8f591dc-3797-46de-9846-dbd2a902ff65/call-get_gsm/execution/GSM1698568_runs.json") (reason 1 of 1): No coercion defined from '[{"series":"GSE69360","name":"Biochain_Adult_Liver","path":"https://sra-download.ncbi.nlm.nih.gov/traces/sra29/SRR/001967/SRR2014238","model":"Illumina HiSeq 2000","run":"SRR2014238","gsm":"GSM1698568","characteristics":"number of donors -> 1;age -> 64 years old;tissue -> Liver;vendor -> Biochain;isolate -> Lot no.: B510092;gender -> Male","strategy":"RNA-Seq","organism":"Homo sapiens","layout":"PAIRED","title":"Biochain_Adult_Liver"}]' of type 'spray.json.JsArray' to 'Object'.,List())))

cjllanwarne commented 5 years ago

Thanks for pointing this out, @antonkulaga - this looks to me like a bug in our draft-2 and 1.0 support.

I think this could be improved in a future WDL spec version. IMO, ideally we wouldn't have a function with a "mixed" return type, since it plays badly with type safety. I'd prefer separate read_json_object and read_json_array functions, for example - but that would be up to the openWDL group, and this is definitely a bug in our interpretation for now!
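
To illustrate the idea of splitting the function by return type, here is a minimal Python sketch. It is not Cromwell's implementation or anything from the WDL spec: the names read_json_object and read_json_array are only the hypothetical ones suggested above, and the sample data is a shortened version of the JSON from the report. Each function accepts exactly one top-level JSON shape, so the caller knows the result type up front.

```python
import json
from typing import Any, Dict, List


def read_json_object(text: str) -> Dict[str, Any]:
    """Parse JSON whose top-level value must be an object (hypothetical helper)."""
    value = json.loads(text)
    if not isinstance(value, dict):
        raise TypeError(f"expected a JSON object, got {type(value).__name__}")
    return value


def read_json_array(text: str) -> List[Any]:
    """Parse JSON whose top-level value must be an array (hypothetical helper)."""
    value = json.loads(text)
    if not isinstance(value, list):
        raise TypeError(f"expected a JSON array, got {type(value).__name__}")
    return value


# Shortened version of the JSON from the report: an array holding one flat map.
runs_json = '[{"series": "GSE69360", "run": "SRR2014238", "layout": "PAIRED"}]'

runs = read_json_array(runs_json)
print(runs[0]["run"])  # prints SRR2014238

# Asking for an object when the file holds an array fails with a clear message,
# instead of an unexpected coercion error later on.
try:
    read_json_object(runs_json)
except TypeError as err:
    print(err)  # prints: expected a JSON object, got list
```

With this split, the mismatch in the report (an array being coerced to Object) would surface as an explicit type error at the call site.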

aednichols commented 5 years ago

Basically the same issue as https://github.com/broadinstitute/cromwell/issues/4625