Swirrl / table2qb

A generic pipeline for converting tabular data into RDF data cubes
Eclipse Public License 1.0

WIP: Automate the generation of the examples #123

Status: open. Robsteranium opened 4 years ago

Robsteranium commented 4 years ago

It'd be nice to use the new csvw and exec tasks from #120 to regenerate the examples. This would help keep them up to date, and it would also serve as a test in its own right: first, of whether they can be generated successfully at all, and second, because the examples are themselves used by the test suite.

This PR provides an example script. This involves a few changes:

Outstanding issues for discussion

Does util/csvw-url work in a reasonable way?

csv2rdf expects the csvw:url to be specified in one of three ways:

If the output-directory parameter is given as an absolute path, the first form is used; if it is relative, the second. Arguably the third form (file:./relative/path/input.csv) would be more consistent, since it also has a "file" scheme. However, java.net.URI doesn't seem to like constructing relative URIs for files (even though csv2rdf reads them without complaint).

Moreover, explicit configuration might be preferable, since a convention like this can seem a bit magical or mysterious.

In any case we ought to document this behaviour somewhere.
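To make the convention above concrete, here is a minimal Java sketch, assuming a Unix-style filesystem and assuming that the "first" form is an absolute file: URI and the "second" a scheme-less relative path (the exact strings table2qb emits may differ). `CsvwUrlSketch` and `csvwUrl` are hypothetical names standing in for util/csvw-url, which lives in the Clojure codebase. The last part of `main` illustrates the java.net.URI behaviour mentioned: parsing "file:./relative/path/input.csv" succeeds but gives an opaque URI with no path, while the component constructor refuses a relative path once a scheme is supplied.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CsvwUrlSketch {
    // Hypothetical analogue of util/csvw-url: pick the csvw:url form from
    // whether the output directory was given as an absolute or relative path.
    static String csvwUrl(Path outputDir, String fileName) {
        Path target = outputDir.resolve(fileName).normalize();
        if (target.isAbsolute()) {
            // Absolute directory: emit a file: URI, e.g. file:///tmp/out/input.csv
            return target.toUri().toString();
        }
        // Relative directory: emit a scheme-less relative reference joined with '/'
        StringBuilder url = new StringBuilder();
        for (Path part : target) {
            if (url.length() > 0) {
                url.append('/');
            }
            url.append(part.toString());
        }
        return url.toString();
    }

    public static void main(String[] args) {
        System.out.println(csvwUrl(Paths.get("/tmp/out"), "input.csv"));
        System.out.println(csvwUrl(Paths.get("relative/path"), "input.csv"));

        // Parsing the string form succeeds, but yields an opaque URI with no path component.
        URI parsed = URI.create("file:./relative/path/input.csv");
        System.out.println("opaque: " + parsed.isOpaque() + ", path: " + parsed.getPath());

        try {
            // The component constructors reject a relative path once a scheme is supplied.
            new URI("file", null, "./relative/path/input.csv", null);
        } catch (URISyntaxException e) {
            System.out.println("rejected: " + e.getReason());
        }
    }
}
```

Whichever convention is kept, a function like this is small enough to document alongside its tests, which also addresses the "magic" concern.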

How should we call table2qb to generate the examples?

I think it's appealing to use the CLI for this, as it means we're exercising that interface. Shell scripts mightn't be ideal, though, particularly once we extend this to all of the examples. Furthermore, we'd be relying on developers remembering to run the scripts and commit the changes whenever the inputs to the examples change.

An alternative would be to specify the calls as part of the test suite. It would be more convenient to specify lots of examples and parameters as data structures in Clojure, and we could hook into the test cycle so that the examples are regenerated every time the suite runs. We'd lose the checks we would otherwise have on the CLI, though.
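The test-suite alternative might look roughly like the following sketch. It is written in Java for illustration only; in table2qb's Clojure codebase the specs would more naturally be a vector of maps. `ExampleSpec`, `Generator` and `regenerateAll` are hypothetical, and the generator is deliberately left abstract rather than guessing table2qb's function names or CLI flags. It could wrap either a direct call or a CLI invocation, so the CLI checks need not be lost entirely.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ExampleRegeneration {
    // One entry per example; the field names and parameter keys are hypothetical placeholders.
    record ExampleSpec(String name, String pipeline, Map<String, String> params, Path outputDir) {}

    // Stand-in for whatever actually invokes table2qb (a direct call or the CLI).
    interface Generator {
        void generate(ExampleSpec spec) throws Exception;
    }

    // Regenerate every example, collecting failures instead of stopping at the first one,
    // so a single test run reports all of the broken examples.
    static List<String> regenerateAll(List<ExampleSpec> specs, Generator generator) {
        List<String> failures = new ArrayList<>();
        for (ExampleSpec spec : specs) {
            try {
                generator.generate(spec);
            } catch (Exception e) {
                failures.add(spec.name() + ": " + e.getMessage());
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        List<ExampleSpec> specs = List.of(
            new ExampleSpec("example-a", "pipeline-a", Map.of("input", "input.csv"), Paths.get("examples/a")),
            new ExampleSpec("example-b", "pipeline-b", Map.of("input", "input.csv"), Paths.get("examples/b")));
        List<String> failures = regenerateAll(specs, spec -> {
            if (spec.name().equals("example-b")) {
                throw new IllegalStateException("missing input");
            }
            System.out.println("regenerated " + spec.name() + " into " + spec.outputDir());
        });
        System.out.println("failures: " + failures);
    }
}
```

Keeping the specs as plain data means adding an example is a one-line change, and the failure list can be asserted empty as part of the normal test run.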

Further requirements for implementation