canwaf opened this issue 6 months ago
Currently, cubes built using csvcubed v0.4.10 or lower cannot be inspected using csvcubed v0.5.0 or greater, because the primary identifier has changed from `some-dataset.csv#dataset` to `some-dataset.csv#csvqb`. To facilitate this change, a new `distribution_uri` property has been added to the `CatalogMetadata` class, and the `select_catalog_metadata` SPARQL query has been updated to extract the value of this property if it is present.

Additional information on the version of csvcubed used to build the cube is now also available in the metadata JSON file, and this may be leveraged to determine how the cube should be inspected.
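The version-based decision can be sketched as a small Python helper. This is a minimal sketch, not csvcubed code: `parse_version` and `expected_primary_identifier` are hypothetical names, and the sketch assumes, per this issue, that builds before v0.5.0 use the `#dataset` fragment and builds from v0.5.0 onwards use `#csvqb`.

```python
import re

# Assumption (from this issue): builds before csvcubed v0.5.0 use "#dataset" as the
# primary identifier; v0.5.0 onwards uses "#csvqb".
FIRST_CSVQB_VERSION = (0, 5, 0)


def parse_version(version: str) -> tuple:
    """Parse 'v0.4.10' or '0.4.10' into (0, 4, 10) so versions compare numerically."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"Unrecognised csvcubed version: {version!r}")
    return tuple(int(part) for part in match.groups())


def expected_primary_identifier(csv_file_name: str, csvcubed_version: str) -> str:
    """Return the identifier inspect should look for, given the version that built the cube."""
    if parse_version(csvcubed_version) >= FIRST_CSVQB_VERSION:
        return f"{csv_file_name}#csvqb"
    return f"{csv_file_name}#dataset"


print(expected_primary_identifier("some-dataset.csv", "v0.4.10"))  # some-dataset.csv#dataset
print(expected_primary_identifier("some-dataset.csv", "v0.5.0"))   # some-dataset.csv#csvqb
```

Versions are compared as integer tuples rather than strings, because a plain string comparison would rank `v0.10.0` below `v0.9.0`.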
The `distribution_uri` value is not present in cubes built using older versions of csvcubed, so the `inspect` command fails when run with a newer version of csvcubed. This is because the `MetadataPrinter` class now uses the `distribution_uri` in its `get_primary_csv_url()` method via `DataCubeRepository.get_cube_identifiers_for_dataset()`. There will be other places where there is a discrepancy, but this is where I would start.
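One way to avoid the failure in the lookup described above is to fall back to the older identifier when `distribution_uri` is missing. This is a minimal sketch assuming that, for cubes built before v0.5.0, the dataset URI (`#dataset`) can stand in as the primary identifier. `CatalogMetadataSketch`, its `dataset_uri` field and `primary_csvw_subject` are hypothetical names for illustration; the real `CatalogMetadata` class and `get_primary_csv_url()` may be structured differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CatalogMetadataSketch:
    """Hypothetical stand-in for csvcubed's CatalogMetadata, reduced to two URIs."""

    dataset_uri: str
    # Returned by select_catalog_metadata for v0.5.0+ cubes; absent for older cubes.
    distribution_uri: Optional[str] = None


def primary_csvw_subject(metadata: CatalogMetadataSketch) -> str:
    """Prefer distribution_uri when present, otherwise fall back to the legacy dataset URI."""
    if metadata.distribution_uri is not None:
        return metadata.distribution_uri
    return metadata.dataset_uri
```

For example, `primary_csvw_subject(CatalogMetadataSketch("some-dataset.csv#dataset"))` returns `"some-dataset.csv#dataset"`, while metadata carrying a `distribution_uri` of `"some-dataset.csv#csvqb"` returns that value instead.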
Possible approaches:

- Use the `csvcubed-build-activity` information to extract the version of csvcubed used to build the cube, and use this to implement different versions of the `inspect` command.
- Use the presence or absence of `distribution_uri` in the `select_catalog_metadata` SPARQL results to implement different versions of the `inspect` command.

The build activity information available in different versions of csvcubed is shown below.

csvcubed v0.4.10:

```json
...
{
  "@id": "aged-16-to-64-years-level-3-or-above-qualifications.csv#dataset",
  "http://www.w3.org/ns/prov#wasGeneratedBy": [
    {
      "@id": "aged-16-to-64-years-level-3-or-above-qualifications.csv#csvcubed-build-activity"
    }
  ]
}
...
{
  "@id": "aged-16-to-64-years-level-3-or-above-qualifications.csv#csvcubed-build-activity",
  "@type": [
    "http://www.w3.org/2000/01/rdf-schema#Resource",
    "http://www.w3.org/ns/prov#Activity"
  ],
  "http://www.w3.org/ns/prov#used": [
    {
      "@id": "https://github.com/GSS-Cogs/csvcubed/releases/tag/v0.4.10"
    }
  ]
}
...
```

csvcubed v0.5.0:

```json
...
{
  "@id": "some-title.csv#csvqb",
  "http://www.w3.org/ns/prov#wasDerivedFrom": [
    {
      "@id": "https://github.com/GSS-Cogs/csvcubed/releases/tag/v0.5.0"
    }
  ],
  "http://www.w3.org/ns/prov#wasGeneratedBy": [
    {
      "@id": "some-title.csv#csvcubed-build-activity"
    }
  ]
}
...
{
  "@id": "some-title.csv#csvcubed-build-activity",
  "@type": [
    "http://www.w3.org/ns/prov#Activity",
    "http://www.w3.org/2000/01/rdf-schema#Resource"
  ],
  "http://www.w3.org/ns/prov#used": [
    {
      "@id": "https://github.com/GSS-Cogs/csvcubed/releases/tag/v0.5.0"
    }
  ]
},
{
  "@id": "https://github.com/GSS-Cogs/csvcubed/releases/tag/v0.5.0",
  "@type": [
    "http://www.w3.org/ns/prov#Entity",
    "http://www.w3.org/2000/01/rdf-schema#Resource"
  ],
  "http://purl.org/dc/terms/title": [
    {
      "@language": "en",
      "@value": "csvcubed v0.5.0"
    }
  ],
  "http://www.w3.org/ns/prov#hasPrimarySource": [
    {
      "@id": "https://pypi.org/project/csvcubed/0.5.0"
    }
  ],
  "http://www.w3.org/ns/prov#wasGeneratedBy": [
    {
      "@id": "some-title.csv#csvcubed-build-activity"
    }
  ]
}
```
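A minimal Python sketch of the first approach, reading the csvcubed version from the build activity. It assumes the metadata has already been loaded as a list of expanded JSON-LD node objects keyed by full IRIs, as in the excerpts above; a real metadata file may first need JSON-LD expansion. The function name `csvcubed_version` is hypothetical.

```python
import re
from typing import Any, Dict, List, Optional

PROV_USED = "http://www.w3.org/ns/prov#used"
BUILD_ACTIVITY_FRAGMENT = "#csvcubed-build-activity"
# Matches release URLs like those in the excerpts, capturing e.g. "v0.4.10".
RELEASE_TAG_URL = re.compile(
    r"https://github\.com/GSS-Cogs/csvcubed/releases/tag/(v\d+\.\d+\.\d+)"
)


def csvcubed_version(nodes: List[Dict[str, Any]]) -> Optional[str]:
    """Return the release tag prov:used by the build activity, or None if none is found."""
    for node in nodes:
        if not node.get("@id", "").endswith(BUILD_ACTIVITY_FRAGMENT):
            continue  # only build-activity nodes record the csvcubed release used
        for used in node.get(PROV_USED, []):
            match = RELEASE_TAG_URL.fullmatch(used.get("@id", ""))
            if match is not None:
                return match.group(1)
    return None


# Nodes copied from the v0.4.10 excerpt (the "@type" arrays are omitted for brevity).
v0_4_10_nodes = [
    {
        "@id": "aged-16-to-64-years-level-3-or-above-qualifications.csv#dataset",
        "http://www.w3.org/ns/prov#wasGeneratedBy": [
            {"@id": "aged-16-to-64-years-level-3-or-above-qualifications.csv#csvcubed-build-activity"}
        ],
    },
    {
        "@id": "aged-16-to-64-years-level-3-or-above-qualifications.csv#csvcubed-build-activity",
        PROV_USED: [{"@id": "https://github.com/GSS-Cogs/csvcubed/releases/tag/v0.4.10"}],
    },
]
print(csvcubed_version(v0_4_10_nodes))  # v0.4.10
```

The returned tag (or `None`, if no build activity is recorded) could then be compared numerically against v0.5.0 to decide how the cube should be inspected.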
With the (now yanked) csvcubed 0.5.0, we adopted the following change to the object model.

This impacts csvcubed's `inspect` command, which calls https://github.com/GSS-Cogs/csvcubed/blob/main/src/csvcubed/inspect/sparql_handler/sparql_queries/select_catalog_metadata.sparql. That query primarily looks for the `dcat:Dataset`, which is no longer present; however, it should be present. Consider the application profile where the CSV-W is the distribution. This leads us to the following:
So the catalogue metadata is attached to the dataset, but the CSV-W's primary subject is now the Attachable, `qb:Dataset`, etc. This should allow the SPARQL query to remain unchanged.
The metadata attached to the `dcat:Distribution` should be at most the following (note that these are not requirements: whatever we can already fill in from what we have, we should add, but nothing new, please):

tl;dr: the main subject of the CSV-W metadata file should be `<dataset.csv#csvqb>`, which is linked by `dcat:isDistributionOf` to the `dcat:Dataset`. The `dcat:Dataset` is the one which should have the catalogue metadata attached to it.
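To illustrate the tl;dr, here is a minimal sketch using plain Python tuples as RDF triples (no RDF library is used, and literals are not distinguished from IRIs). The `dataset.csv#dataset` URI for the `dcat:Dataset` and the function name `catalogue_subject` are hypothetical; the `dcat:isDistributionOf` link is the one proposed in this issue. Starting from the CSV-W's primary subject, the sketch follows that link to the `dcat:Dataset`, which is where the catalogue metadata lives.

```python
from typing import List, Optional, Tuple

DCAT = "http://www.w3.org/ns/dcat#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT_TITLE = "http://purl.org/dc/terms/title"

Triple = Tuple[str, str, str]

# Hypothetical graph following the proposed model: the CSV-W's primary subject is the
# distribution, and the catalogue metadata (here just a title) sits on the dcat:Dataset.
EXAMPLE_TRIPLES: List[Triple] = [
    ("dataset.csv#csvqb", RDF_TYPE, DCAT + "Distribution"),
    ("dataset.csv#csvqb", DCAT + "isDistributionOf", "dataset.csv#dataset"),
    ("dataset.csv#dataset", RDF_TYPE, DCAT + "Dataset"),
    ("dataset.csv#dataset", DCT_TITLE, "Some title"),
]


def catalogue_subject(triples: List[Triple], primary_subject: str) -> Optional[str]:
    """Follow dcat:isDistributionOf from the primary subject to a node typed dcat:Dataset."""
    for subject, predicate, obj in triples:
        if subject == primary_subject and predicate == DCAT + "isDistributionOf":
            if (obj, RDF_TYPE, DCAT + "Dataset") in triples:
                return obj
    return None


dataset = catalogue_subject(EXAMPLE_TRIPLES, "dataset.csv#csvqb")
titles = [o for s, p, o in EXAMPLE_TRIPLES if s == dataset and p == DCT_TITLE]
print(dataset, titles)  # dataset.csv#dataset ['Some title']
```

Because the `dcat:Dataset` node still exists in this model, a query that selects catalogue metadata from `dcat:Dataset` can keep matching it, while the distribution becomes the CSV-W's primary subject.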