Swirrl / table2qb

A generic pipeline for converting tabular data into RDF data cubes
Eclipse Public License 1.0

Issue 69 - Make dataset, code and codelist URIs configurable #103

Closed. lkitching closed this 4 years ago.

lkitching commented 5 years ago

#69

RickMoynihan commented 4 years ago

@lkitching I've just seen this PR. It appears to let us configure the URIs generated by table2qb a bit better, but what exactly does it let you do, and how is it intended to be used?

lkitching commented 4 years ago

@RickMoynihan - It does a few things:

  1. Centralises the definitions of the URI structures produced by each pipeline using a small meta-template language
  2. Adds a uris task for displaying the definitions for each pipeline
  3. Lets you override specific templates when running each pipeline to customise generated URIs
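A minimal Python sketch of points 1 to 3, assuming each pipeline's definitions are a simple map from output item to meta-template and that user overrides are merged key by key. The pipeline name, the keys codelist-uri and code-uri, the variable codelist-slug, and the functions resolve_uri_templates and show_uri_templates are all hypothetical illustrations, not table2qb's actual definitions or API.

```python
# Hypothetical default meta-templates for one pipeline. Each key names the
# output item its template produces; $(...) marks a pipeline-level variable
# and {...} a per-row CSVW column. Illustrative only, not table2qb's defaults.
DEFAULT_URI_TEMPLATES = {
    "codelist-pipeline": {
        "codelist-uri": "$(base-uri)/def/concept-scheme/$(codelist-slug)",
        "code-uri": "$(base-uri)/def/concept/$(codelist-slug)/{notation}",
    },
}


def resolve_uri_templates(pipeline, overrides=None):
    """Return the pipeline's URI templates with user overrides applied.

    Assumption: unknown override keys are rejected so that typos surface
    early instead of being silently ignored.
    """
    defaults = DEFAULT_URI_TEMPLATES[pipeline]
    overrides = overrides or {}
    unknown = sorted(set(overrides) - set(defaults))
    if unknown:
        raise KeyError(f"unknown URI template keys for {pipeline}: {unknown}")
    # Overrides replace individual defaults; other keys keep their defaults.
    return {**defaults, **overrides}


def show_uri_templates(pipeline):
    """Print a pipeline's default templates, in the spirit of a uris task."""
    for key, template in sorted(DEFAULT_URI_TEMPLATES[pipeline].items()):
        print(f"{key}: {template}")


if __name__ == "__main__":
    show_uri_templates("codelist-pipeline")
    custom = resolve_uri_templates(
        "codelist-pipeline", {"code-uri": "$(base-uri)/codes/{notation}"}
    )
    print(custom)
```

Keeping every template for a pipeline in one map means displaying the definitions and overriding one of them both work off the same data, rather than URI patterns being scattered through each pipeline's code.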

I think this is mostly described in the usage.md changes but I can add more detail if it's unclear.

RickMoynihan commented 4 years ago

@lkitching: What is the difference between the $(...) and {...} syntax in the URI templating language? Is there a reason we can't use the existing URI templating system used by csv2rdf?

Robsteranium commented 4 years ago

If I understand this correctly @RickMoynihan, the meta-templating language exists so that we can use variables in URIs that aren't available at the point the templates are applied - e.g. the base-uri isn't available as a column in any of the csvw data tables.

I suppose we could use the same templating language and just have the templates partially applied before they reach csvw (obviating the need for a meta-language). The advantage of the meta-language, though, is that it's clear from the template where the variables are coming from, and it also lets us use Clojure symbols that aren't otherwise permissible in templates (e.g. my-var).
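A minimal Python sketch of the two-stage expansion described above, assuming $(name) variables are filled from pipeline configuration before any CSVW metadata is written, and {name} variables are left for csv2rdf to fill from each row. The function names are hypothetical, and expand_row_template is a simplified stand-in for RFC 6570 simple string expansion, not csv2rdf's implementation. RFC 6570 variable names may only contain letters, digits, underscores, dots and percent-encoded octets, which is why a hyphenated name like base-uri cannot appear inside {...}.

```python
import re
from urllib.parse import quote

# $(name): meta-template variable, resolved from pipeline configuration.
META_VAR = re.compile(r"\$\(([^)]+)\)")
# {name}: RFC 6570-style variable, resolved later from a CSV row.
ROW_VAR = re.compile(r"\{([^}]+)\}")


def apply_meta_template(template, variables):
    """Stage 1: substitute every $(name); leave {name} untouched for CSVW."""
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"no value for meta-variable {name!r}")
        return variables[name]
    return META_VAR.sub(substitute, template)


def expand_row_template(template, row):
    """Stage 2: simplified RFC 6570 simple expansion against one CSV row.

    Values are percent-encoded except for unreserved characters, as in
    level-1 {var} expansion. Operators such as {+var} are not supported.
    """
    return ROW_VAR.sub(lambda m: quote(str(row[m.group(1)]), safe=""), template)


if __name__ == "__main__":
    meta = "$(base-uri)/def/concept/$(codelist-slug)/{notation}"
    csvw_template = apply_meta_template(
        meta, {"base-uri": "http://example.org", "codelist-slug": "sex"}
    )
    print(csvw_template)  # http://example.org/def/concept/sex/{notation}
    # http://example.org/def/concept/sex/female
    print(expand_row_template(csvw_template, {"notation": "female"}))
```

Because the two syntaxes never overlap, stage 1 can run once per pipeline invocation and hand csv2rdf an ordinary URI template, while each variable's origin stays visible in the meta-template.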

Robsteranium commented 4 years ago

This is looking great so far.

Robsteranium commented 4 years ago

I've added an example and renamed a few things for consistency.

I've also made notation an optional column in the components-pipeline (0b2c498).

Robsteranium commented 4 years ago

As per my comment on the original issue, this PR stops short of making observation-uri configurable, but I don't think we need to implement that right now. It's worth merging this just to get configurable codes and components.

Robsteranium commented 4 years ago

We might like to tackle a naming issue before merging this: should we rename URI templates from <type>-uri to <type>-template?

The original templates are named like <type>-template. These are URI templates in the RFC6570 sense, populated within the csvw translation. The ones introduced here are named like <type>-uri. These are meta-templates with access to variables outside of the csvw. Do we want to use the term 'template' for both? We might also rename the --uris-file argument accordingly (e.g. --templates-edn?).

I don't think it's a major problem to leave it how it is, but the longer we leave it the harder it will be to change.

lkitching commented 4 years ago

Would it be enough to rename the --uris-file to --uri-templates-file or --uri-templates? The keys currently identify the item in the output they template e.g. :dataset-uri is the template for the dataset URI. This will be lost if it's renamed to :dataset-template.

Robsteranium commented 4 years ago

Yeah, that makes a lot of sense. I'll do that now!