It'd be nice to use the new csvw and exec tasks from #120 to regenerate the examples. This will help keep them up-to-date and also serves as a test in its own right (firstly of whether they can be generated successfully, and secondly because the examples are used by the test suite).
This PR provides an example script. This involves a few changes:
pipeline outputs are renamed as they otherwise clobber each other's "metadata.json" files
provision for having the csvw:url specified relatively so the checked-in examples don't contain references to absolute paths on the machine that generated them (see util/csvw-url)
introduce a basic script that calls the uberjar to generate the csvw for the regional-trade example
Outstanding issues for discussion
Does util/csvw-url work in a reasonable way?
csv2rdf expects the csvw:url to be specified in one of three ways:
file:/absolute/path/input.csv
input.csv (relative to the location of the json metadata file)
file:./relative/path/input.csv (relative to JVM working dir - so you would need to call table2qb and csv2rdf from the same place)
If the output-directory parameter is specified absolutely then the first approach is taken; if relatively, the second. Arguably the third (file:./relative/path/input.csv) would be more consistent, since it also has a "file" scheme. However, java.net.URI doesn't seem to like creating relative URIs for files (even though csv2rdf happily reads them).
Moreover, explicit configuration might be preferred, as a convention like this might be a bit magic/mysterious.
In any case we ought to document this behaviour somewhere.
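To make the rule above concrete, here's a rough Java sketch of the convention as described. It is not the actual util/csvw-url code: the class and method names (CsvwUrlSketch, csvwUrl) are made up for illustration, and it assumes a Unix-style filesystem. The main method also shows one way the java.net.URI limitation can surface: the multi-argument constructor rejects a relative path once a scheme is given, while parsing the same string succeeds but yields an opaque URI.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

public class CsvwUrlSketch {

    // Hypothetical stand-in for util/csvw-url (names and signature are assumptions).
    // Absolute output directory -> absolute file: URI (first approach).
    // Relative output directory -> bare file name, which csv2rdf resolves
    // against the location of the JSON metadata file (second approach).
    static String csvwUrl(String outputDirectory, String csvFileName) {
        File dir = new File(outputDirectory);
        if (dir.isAbsolute()) {
            return new File(dir, csvFileName).toURI().toString();
        }
        return csvFileName;
    }

    public static void main(String[] args) {
        System.out.println(csvwUrl("/tmp/out", "input.csv"));
        System.out.println(csvwUrl("examples/out", "input.csv"));

        // Third approach: building a relative file: URI from its components
        // is rejected because a scheme is combined with a relative path.
        try {
            new URI("file", null, "./relative/path/input.csv", null);
        } catch (URISyntaxException e) {
            System.out.println("rejected: " + e.getMessage());
        }

        // Parsing the literal string works, but the result is opaque
        // (no hierarchical path), so it can't be passed to new File(URI).
        URI parsed = URI.create("file:./relative/path/input.csv");
        System.out.println("opaque? " + parsed.isOpaque());
    }
}
```

If we went for explicit configuration instead, the isAbsolute check would simply be replaced by whatever option the caller passes in.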
How should we call table2qb to generate the examples?
I think it's appealing to use the CLI to do this, as it means we're exercising that interface. Shell scripts mightn't be ideal, though, particularly once we extend this to all of the examples. Further, we'd rely on developers remembering to run the scripts and commit the changes when the inputs to the examples change.
An alternative would be to specify the calls as part of the test suite. It would be more convenient to specify lots of examples/parameters with data structures in Clojure. We could also hook into the test cycle so that the examples are regenerated every time the suite runs. However, we'd lose the checks we'd otherwise have on the CLI.
Further requirements for implementation
[ ] extend the generation tasks to all examples
[ ] generate turtle as well as csvw
[ ] ensure the re-generation runs before the test suite (on CI at least)