Swirrl / pmd-rdf-validations

PMD Validations for cubes etc...
MIT License

Validate comp codelists #8

Closed RickMoynihan closed 3 years ago

RickMoynihan commented 4 years ago

This validation checks an additional PMD constraint: every attribute and dimension used in a dataset should have at least one qb:codeList triple on the component property itself, identifying the code list that holds metadata about the component's values.

In particular, PMD uses this to join labels onto component values when generating dataset downloads and similar outputs, where we need to know the component values ahead of time (to avoid having to do the join inside queries that are repeated many times).

Without these links, downloads will contain URIs rather than labels.
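To make the constraint concrete, here is a minimal sketch of the check in plain Python. It is not the validation shipped in this repository. It assumes the graph is available as a set of `(subject, predicate, object)` tuples using prefixed names, and the function names and the `ex:` example data are hypothetical.

```python
# Triples are plain (subject, predicate, object) tuples of prefixed names.
QB_STRUCTURE = "qb:structure"
QB_COMPONENT = "qb:component"
QB_DIMENSION = "qb:dimension"
QB_ATTRIBUTE = "qb:attribute"
QB_CODE_LIST = "qb:codeList"


def objects(triples, subject, predicate):
    """Return every object o for which (subject, predicate, o) is present."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def components_missing_codelists(triples, dataset):
    """List the dimension/attribute properties of `dataset` with no qb:codeList."""
    missing = set()
    for dsd in objects(triples, dataset, QB_STRUCTURE):
        for spec in objects(triples, dsd, QB_COMPONENT):
            props = (objects(triples, spec, QB_DIMENSION)
                     | objects(triples, spec, QB_ATTRIBUTE))
            for prop in props:
                # The link must sit on the component property itself.
                if not objects(triples, prop, QB_CODE_LIST):
                    missing.add(prop)
    return sorted(missing)


# Hypothetical example: ex:area has a code list, ex:status does not.
EXAMPLE = {
    ("ex:dataset", "qb:structure", "ex:dsd"),
    ("ex:dsd", "qb:component", "ex:c1"),
    ("ex:dsd", "qb:component", "ex:c2"),
    ("ex:c1", "qb:dimension", "ex:area"),
    ("ex:c2", "qb:attribute", "ex:status"),
    ("ex:area", "qb:codeList", "ex:areas"),
}

print(components_missing_codelists(EXAMPLE, "ex:dataset"))
```

In this example only `ex:status` is reported, since it is the one component property without a code list link.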

Robsteranium commented 3 years ago

I'm not 100% sure about this one.

Do we really need a codelist? LDF just falls back to searching for a label if a codelist isn't available. Is this not all we need for the download (i.e. no other fields)?

In addition to the case @jennet has raised, we might also consider qb:measureType a false positive (proposal to use the DSD instead of requiring a concept scheme on Muttnik) and I'm not certain codelists add value to attributes anyway (particularly the global sdmxa:unitMeasure).

I wonder if it'd be better to just confirm all dimensions (and not attributes) have codelists per #21.

Can we have the downloads endpoint look up labels for attributes, or does this end up being too slow?

Robsteranium commented 3 years ago

Rick has explained that this exists to enable more efficient streaming, as it lets you look up labels for all the reference data in advance. We might be able to use caching as an alternative, but having the enumeration in the data and reachable from the cube enables other users to benefit too.
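As a rough illustration of the streaming argument, the sketch below builds a label index once from the code lists and then rewrites rows as they stream past. It is not PMD's implementation. It assumes codes are attached to their list via `skos:inScheme` and labelled with `rdfs:label`, and the function names and `ex:` data are hypothetical.

```python
def build_label_index(triples, code_lists):
    """Map each code in the given code lists to its rdfs:label, in one pass each."""
    members = {s for s, p, o in triples
               if p == "skos:inScheme" and o in code_lists}
    return {s: o for s, p, o in triples
            if p == "rdfs:label" and s in members}


def stream_rows(rows, labels):
    """Yield rows lazily, swapping known code URIs for their labels."""
    for row in rows:
        # Values not reachable through a code list stay as they are.
        yield [labels.get(value, value) for value in row]


# Hypothetical data: ex:area1 is in a code list, ex:provisional is not.
TRIPLES = [
    ("ex:area1", "skos:inScheme", "ex:areas"),
    ("ex:area1", "rdfs:label", "Area One"),
    ("ex:provisional", "rdfs:label", "Provisional"),
]
ROWS = [["ex:area1", "ex:provisional", "42"]]

labels = build_label_index(TRIPLES, {"ex:areas"})
for row in stream_rows(ROWS, labels):
    print(row)
```

The value `ex:provisional` has a label in the data but is not reachable from any code list, so it stays a URI in the output row. That is the failure mode the validation is meant to catch.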

ONS have agreed that we can require codelists on dimensions and attributes (even those defined by sdmx).

Alex has also suggested that we might use owl:DatatypeProperty (vs owl:ObjectProperty) to distinguish attributes with literal values. This would be much more efficient to query against than attempting to look up all the values of uncoded attributes to check whether they're literals or not.

We're agreed to use the DSD to enumerate and label measures as per Swirrl/muttnik#835.
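One possible reading of the rules agreed above, sketched in plain Python. This is an assumption about how the exemptions could be expressed, not the updated validation itself: `needs_codelist` and the `ex:` properties are hypothetical, and triples are again `(subject, predicate, object)` tuples of prefixed names.

```python
# Measures are enumerated and labelled via the DSD, so qb:measureType is exempt.
EXEMPT_PROPERTIES = {"qb:measureType"}


def needs_codelist(triples, prop, is_attribute):
    """Decide whether a component property should carry a qb:codeList."""
    if prop in EXEMPT_PROPERTIES:
        return False
    # Attributes declared as owl:DatatypeProperty hold literals, so there is
    # nothing to enumerate. This is a single triple check, with no need to
    # inspect every observed value of the attribute.
    if is_attribute and (prop, "rdf:type", "owl:DatatypeProperty") in triples:
        return False
    return True


# Hypothetical declarations.
TYPES = {
    ("ex:note", "rdf:type", "owl:DatatypeProperty"),
    ("ex:status", "rdf:type", "owl:ObjectProperty"),
}

for prop, is_attribute in [("ex:note", True), ("ex:status", True),
                           ("qb:measureType", False), ("ex:area", False)]:
    print(prop, needs_codelist(TYPES, prop, is_attribute))
```

Under these assumptions, `ex:note` and `qb:measureType` are skipped, while `ex:status` and `ex:area` are still expected to have a code list.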

I'll update the validation in this PR to reflect the above.

Robsteranium commented 3 years ago

I've cherry-picked e7b8dbb2737ca5ea324366d080b74086ed9ef7fb (to avoid resolving whitespace merge conflicts in the edn manifest) to merge this into #24.

In 437ba21942ad9738c730ba8c531a5433f25c644e I've updated the validation to account for the edge cases discussed above.

Thanks all.