bioboxes / rfc

Request for comments on interchangeable bioinformatics containers
http://bioboxes.org
MIT License

The biobox signature should be used to generate the json-schema document. #125

Open michaelbarton opened 9 years ago

michaelbarton commented 9 years ago

My preference for the signatures is because they are a succinct way of describing the inputs and outputs. They fit the idea of a biobox as a function that transforms a set of inputs into a set of outputs. At present the signature is not used to validate the input data. I believe we should generate the json-schema document from the signature. This would free us from having to write a json-schema for each new biobox, and thereby allow biobox authors to define their own inputs and outputs.
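As a rough illustration of the idea, here is a minimal Python sketch of deriving a schema from a signature. It is not an existing tool: the grammar (comma-separated `Type Label` terms on each side of `->`), the function names `parse_signature` and `schema_for`, and the layout of the generated schema are all assumptions made for the example.

```python
def parse_terms(side):
    """Parse one side of a signature into (type, label) pairs.

    Assumed grammar: comma-separated terms, each written 'Type Label'.
    """
    terms = []
    for chunk in side.split(","):
        parts = chunk.split()
        if len(parts) != 2:
            raise ValueError(f"expected 'Type Label', got {chunk.strip()!r}")
        terms.append((parts[0], parts[1]))
    return terms


def parse_signature(signature):
    """Split e.g. 'Fastq A -> Fastq A' into input terms and output terms."""
    if signature.count("->") != 1:
        raise ValueError("signature must contain exactly one '->'")
    left, right = signature.split("->")
    return parse_terms(left), parse_terms(right)


def schema_for(terms):
    """Build a json-schema document (as a dict) for one side of a signature.

    The schema layout is hypothetical: one object per term, keyed by the
    lower-cased type name.
    """
    items = []
    for type_name, label in terms:
        key = type_name.lower()
        items.append({
            "type": "object",
            "description": f"{type_name} bound to signature variable {label}",
            "required": [key],
            "properties": {key: {"type": "array", "minItems": 1}},
            "additionalProperties": False,
        })
    return {
        "$schema": "http://json-schema.org/draft-04/schema#",
        "type": "object",
        "required": ["arguments"],
        "properties": {
            "arguments": {
                "type": "array",
                "minItems": len(items),
                "maxItems": len(items),
                "items": items,
            }
        },
    }
```

Calling `parse_signature("Fastq A -> Fastq A")` gives the input and output terms, and `schema_for` can then be applied to either side, so a new biobox type would only need a new signature rather than a hand-written schema file.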

michaelbarton commented 9 years ago

I've started work on a tool that could parse the biobox signature into the json-schema. If possible, this would make developing a new biobox type simpler as we would not need to write a new schema file each time.

michaelbarton commented 9 years ago

I have been experimenting with the Haskell Parsec library, which seems well suited to parsing plain text into structured data. I've built a parser that validates the signature and generates a YAML document. This is still a prototype, but this recent commit illustrates the error reported when the signature is invalid. The interface is:

./signature-validator --signature "Fastq A -> Fastq A" --schema input

The schema flag selects whether the input or the output schema file is generated. How does this approach seem? Haskell might not be an ideal choice since it is not a common language, but the parser was relatively simple to build.
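For readers who do not know Haskell, the interface above can be mimicked in a few lines of Python. This is only a sketch under stated assumptions, not the Parsec prototype: the term pattern `TERM`, the helper names `split_signature` and `run`, the error message text, and the JSON output (the prototype generates YAML) are all made up for illustration. Only the `--signature` and `--schema` flags come from the command shown above.

```python
import argparse
import json
import re
import sys

# Hypothetical term pattern: a capitalised type name followed by a label.
TERM = re.compile(r"[A-Z][A-Za-z0-9]*\s+[A-Z][A-Za-z0-9]*")


def split_signature(signature):
    """Validate a signature and return its input and output sides."""
    sides = [side.strip() for side in signature.split("->")]
    if len(sides) != 2:
        raise ValueError("expected exactly one '->' in signature")
    for side in sides:
        if not TERM.fullmatch(side):
            raise ValueError(f"invalid term {side!r}, expected 'Type Label'")
    return {"input": sides[0], "output": sides[1]}


def run(argv):
    """Parse the flags, validate the signature, print the selected side."""
    parser = argparse.ArgumentParser(prog="signature-validator")
    parser.add_argument("--signature", required=True)
    parser.add_argument("--schema", required=True, choices=["input", "output"])
    args = parser.parse_args(argv)
    try:
        sides = split_signature(args.signature)
    except ValueError as err:
        # Report an invalid signature on stderr and signal failure.
        print(f"Error parsing biobox signature: {err}", file=sys.stderr)
        return 1
    type_name, label = sides[args.schema].split()
    # Stand-in for the generated schema document.
    print(json.dumps({"type": type_name.lower(), "label": label}))
    return 0
```

Wiring this up as a script would be a matter of calling `sys.exit(run(sys.argv[1:]))`. The point of the sketch is the shape of the interface: one flag carries the signature, the other picks which side of the arrow to emit, and an invalid signature produces an error and a non-zero exit status.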

michaelbarton commented 9 years ago

I've built a working prototype in the repo "bioboxes/signature-validator". Combining this with the "validate-biobox-file" tool allows the biobox.yml to be validated using only the biobox signature, with no hand-written schema file. I've created a cucumber feature outlining this.