Swirrl / table2qb

A generic pipeline for converting tabular data into rdf data cubes
Eclipse Public License 1.0

Property Based Testing & Specs #91

Open RickMoynihan opened 5 years ago

RickMoynihan commented 5 years ago

This is partly a suggestion for how we should handle a large part of the validation task. We may not be able to express every validation easily this way, so it is not necessarily a complete approach. However, it will unlock many other benefits too.

The proposal is that we add clojure.spec specs, which will serve the following purposes:

This will also be a step towards being able to generate randomized but valid cube data for testing all aspects of an RDF/cube stack.
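
A minimal sketch of the idea, assuming `clojure.spec.alpha` with `org.clojure/test.check` on the classpath (spec needs it for generators). The namespace, the `::observation` spec and its keys are hypothetical and are not taken from table2qb or grafter:

```clojure
(ns example.cube-specs
  (:require [clojure.spec.alpha :as s]
            [clojure.spec.gen.alpha :as gen]))

;; Hypothetical shape of one observation row; these keys are illustrative only.
(s/def ::area #{"england" "scotland" "wales"})
(s/def ::year (s/int-in 2000 2020))          ; 2000 inclusive, 2020 exclusive
(s/def ::value (s/double-in :min 0.0 :max 1.0e6 :NaN? false :infinite? false))
(s/def ::observation (s/keys :req-un [::area ::year ::value]))

;; Validation: the same spec accepts or rejects input rows.
(s/valid? ::observation {:area "wales" :year 2010 :value 12.5}) ;; => true
(s/valid? ::observation {:area "wales" :year 1850 :value 12.5}) ;; => false

;; Generation: random rows that conform to the spec by construction.
(def sample-rows (gen/sample (s/gen ::observation) 5))

;; A simple property: everything the generator produces is valid.
(every? #(s/valid? ::observation %) sample-rows) ;; => true
```

For a rejected row, `(s/explain-data ::observation row)` returns a data structure describing which key failed which predicate, which could feed user-facing validation messages.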

This approach may need to run deeper than just table2qb; for example, we may want the RDF specs to live in grafter.

Robsteranium commented 5 years ago

I think this should be straightforward enough for the columns-config and for the codelist and component pipelines where table2qb sets the schema. How do you think it might work with the cube-pipeline, where columns-config defines a set of permissible columns (which are essentially all optional for any given table)? Indeed, would the generated data involve a random set of component-properties taken from the columns config?
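
One way the optional-columns case might look, as a rough sketch rather than a worked-out design: list every permissible column under `:opt-un` in `s/keys`. The column names below are hypothetical stand-ins for whatever columns-config declares, and generation again assumes `org.clojure/test.check` is available:

```clojure
(ns example.optional-columns
  (:require [clojure.spec.alpha :as s]
            [clojure.spec.gen.alpha :as gen]))

;; Hypothetical component columns; in practice these would come from columns-config.
(s/def ::geography string?)
(s/def ::date string?)
(s/def ::measure-type #{"count" "ratio"})
(s/def ::unit string?)

;; Every column is optional for any given table, so all keys go under :opt-un.
(s/def ::cube-row
  (s/keys :opt-un [::geography ::date ::measure-type ::unit]))

;; The generator for s/keys includes a random subset of the optional keys,
;; so sampled rows differ in which columns are present.
(def rows (gen/sample (s/gen ::cube-row) 10))

;; Inspect which columns each generated row ended up with.
(map (comp set keys) rows)
```

Note that this picks a subset per row, whereas a real table shares one set of columns across all its rows, so a table generator would probably need to choose the subset once and then generate rows against it.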

I'd also be curious to know if we could build the specs from the existing config/ conventions or whether the user would need to provide more information to support this.

I'd suggest we start exploring this with a spec for the columns-config, codelist-csv and components-csv, then take it from there...