Open michaelbarton opened 9 years ago
I've started work on a tool that converts the biobox signature into a json-schema document. If this works, it would make developing a new biobox type simpler, as we would not need to write a new schema file each time.
I have been experimenting with the Haskell Parsec library, which seems useful for parsing plain text into data. I've built a parser that validates the signature and generates a YAML document. This is still a prototype, but this recent commit illustrates the error reported if the signature is invalid. The interface is:
./signature-validator --signature "Fastq A -> Fastq A" --schema input
The schema flag selects whether the input or the output schema file is generated. How does this approach seem? Haskell might not be an ideal choice since it is not a common language, but the parser was relatively simple to build.
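To make the validation step concrete for readers who don't use Haskell, here is a minimal Python sketch of the same idea. It is not the Parsec implementation. The grammar is an assumption inferred from the single example `Fastq A -> Fastq A`: each side holds one or more `Type Name` terms, and the comma separator for multiple terms is hypothetical. The names `parse_signature` and `SignatureError` are invented for illustration.

```python
import re

# Assumed grammar (not taken from the real tool):
#   "<Type> <Name>[, <Type> <Name> ...] -> <Type> <Name>[, ...]"
TERM = re.compile(r"^([A-Z][A-Za-z0-9]*)\s+([A-Z][A-Za-z0-9]*)$")


class SignatureError(ValueError):
    """Raised when a signature string does not match the assumed grammar."""


def parse_side(text):
    """Parse one side of the arrow into a list of (type, name) pairs."""
    terms = []
    for raw in text.split(","):
        match = TERM.match(raw.strip())
        if match is None:
            raise SignatureError(f"invalid term: {raw.strip()!r}")
        terms.append((match.group(1), match.group(2)))
    return terms


def parse_signature(signature):
    """Split a signature on '->' and parse both sides."""
    sides = signature.split("->")
    if len(sides) != 2:
        raise SignatureError("expected exactly one '->'")
    inputs, outputs = (parse_side(side) for side in sides)
    return {"input": inputs, "output": outputs}
```

Under these assumptions, `parse_signature("Fastq A -> Fastq A")` yields one `("Fastq", "A")` pair on each side, while a malformed string such as `"Fastq -> Fastq A"` raises `SignatureError`, which mirrors the error the prototype reports for an invalid signature.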
I've built a working prototype in the repo "bioboxes/signature-validator". Combining this with the "validate-biobox-file" tool allows a biobox.yml to be validated using only the biobox signature. I've created a cucumber feature outlining this.
I prefer signatures because they are a succinct way of describing the inputs and outputs. They fit the model of a biobox as a function that transforms a set of inputs into a set of outputs. At present the signature is not used to validate the input data. I believe we should generate the json-schema document from the signature. This would free us from having to write a json-schema for each new biobox, and thereby allow biobox authors to define their own inputs and outputs.
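As a rough illustration of generating a schema from a signature, the Python sketch below turns a list of already-parsed `(type, name)` terms into a JSON Schema document. The schema layout is invented for this example and is not the actual biobox.yml schema. The function name `schema_for`, the lower-cased property names, and the string-array item type are all assumptions.

```python
import json


def schema_for(terms):
    """Build a JSON Schema dict with one required property per (type, name) term.

    The layout is hypothetical: each type becomes a lower-cased property
    holding a non-empty array of strings.
    """
    properties = {}
    for type_name, var_name in terms:
        properties[type_name.lower()] = {
            "description": f"{type_name} entry bound to variable {var_name}",
            "type": "array",
            "items": {"type": "string"},
            "minItems": 1,
        }
    return {
        "$schema": "http://json-schema.org/draft-04/schema#",
        "type": "object",
        "properties": properties,
        "required": sorted(properties),
        "additionalProperties": False,
    }


def schema_json(terms):
    """Serialise the generated schema as indented JSON text."""
    return json.dumps(schema_for(terms), indent=2)
```

For the input side of `Fastq A -> Fastq A`, calling `schema_json([("Fastq", "A")])` would produce a schema requiring a single `fastq` property. A real generator would need to map each signature type onto the fields that the biobox specification defines for it.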