Open canwaf opened 1 year ago
Cheers for this, will write here about the bits where I have a different view!
@id user should be able to provide an absolute identifier
Think I agree (doesn't the example have this?) - will check when you're back!
dcterms:title (m), dcterms:description (o), rdfs:comment (o) should be part of the CSV-W distribution
I think the CSV(W) distribution has a dcterms:title and dcterms:description, but they are the title and description relating to that specific distribution (so may include the file type, for example). So I think I'm disagreeing with you: I think the resource which holds the "main" title and "main" description is the dcat:Dataset.
I could see us offering distributions other than just CSV (JSON in particular), so that's why I think we ought to respect the dcat:Dataset being the main resource.
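To make that split concrete, here is a minimal sketch of one dcat:Dataset carrying the "main" title, with a CSV and a JSON distribution hanging off it. It is written as a Python dict in JSON-LD shape. Every URI and title below is invented for illustration; only the dcat: and dcterms: terms come from the discussion.

```python
# Hypothetical identifiers and titles, chosen only for illustration.
IANA = "https://www.iana.org/assignments/media-types/"

dataset = {
    "@id": "http://example.com/dataset/emissions",
    "@type": "dcat:Dataset",
    # The "main" title and description live on the dataset.
    "dcterms:title": "Greenhouse gas emissions",
    "dcterms:description": "Annual emissions by country.",
    "dcat:distribution": [
        {
            "@id": "http://example.com/dataset/emissions/csv",
            "@type": "dcat:Distribution",
            # Distribution-level title may mention the file type.
            "dcterms:title": "Greenhouse gas emissions (CSV)",
            "dcat:mediaType": {"@id": IANA + "text/csv"},
        },
        {
            "@id": "http://example.com/dataset/emissions/json",
            "@type": "dcat:Distribution",
            "dcterms:title": "Greenhouse gas emissions (JSON)",
            "dcat:mediaType": {"@id": IANA + "application/json"},
        },
    ],
}


def distribution_titles(ds: dict) -> list[str]:
    """Return the titles of every distribution attached to the dataset."""
    return [d["dcterms:title"] for d in ds["dcat:distribution"]]
```

With this shape, adding another distribution later does not touch the dataset-level title or description.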
I didn't include rdfs:label for DCAT resources and wrote a bit about my thinking in the profile here.
dcat:mediaType should be csvw not csv
I think the CSVW metadata file (xxx.csv-metadata.json) would have a MIME type of application/csvm+json but the CSV file itself would have a MIME type of text/csv. So disagreeing.
The spec mentions giving the metadata file that MIME type here.
And an example:
{
  "@context": "http://www.w3.org/ns/csvw",
  "url": "countries.csv",
  "dcat:mediaType": {
    "@id": "https://www.iana.org/assignments/media-types/text/csv"
  },
  "wdrs:describedby": {
    "@id": "http://data.gov.uk/series/greenhouse-gas-emissions/dataset/2018.csv-metadata.json",
    "dcat:mediaType": {
      "@id": "https://www.iana.org/assignments/media-types/application/csvm+json"
    }
  }
}
columns is fine but urls should be relative until crystalised at a later time
I think whether things are absolute/relative is more of a csvcubed implementation detail. In the profile I'm trying to be explicit about what the underlying RDF should be and what the URIs should look like; if we can use relative URIs to do that, then great.
It would be helpful to have an option where we can set what the start of the URI should be if we wanted to create absolute URIs in csvcubed.
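As a rough illustration of what such an option could do (this is not an existing csvcubed feature; the function name and base URI are hypothetical), relative URIs can be resolved against a user-supplied base with the standard library:

```python
from urllib.parse import urljoin


def absolutise(base_uri: str, uri: str) -> str:
    """Resolve a possibly-relative URI against a user-supplied base.

    URIs that are already absolute are returned unchanged.
    """
    return urljoin(base_uri, uri)


# Hypothetical base; in practice the user would provide this.
BASE = "http://example.com/data/"

relative = absolutise(BASE, "countries.csv#dataset")
already_absolute = absolutise(BASE, "http://other.example.org/thing")
```

Here `relative` becomes `http://example.com/data/countries.csv#dataset`, while `already_absolute` is left as given, so a profile written with relative URIs could be made absolute late without rewriting any user-provided identifiers.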
the metadata should be held by the catalogue service so it shouldn't contain information about the data set.
I get we could have some separate workflow for this but don't really know why we'd start with that. I recognise it's verbose to deal with dcat:Datasets and their dcat:Distributions, but users will think of their CSV of data as the same as a dataset they're producing, so for now I reckon we just create the metadata for both at the same time.
Unless it is known in advance, we shouldn't set the parent data set's ID
I think, for our own use of this stuff, we'll know them in advance.
The use of fixed URIs for the @id of dcat:Dataset, dcat:Distribution, qb:DataSet should be confirmed as user-provided only; otherwise relative @ids will be used.
Yeah I think I agree, needs to be user provided.
From a convenience perspective we should have triples which use qb:componentProperty
Don't think I agree, this makes the CSVW more verbose whereas I think the use case you're imagining is solved easily enough by writing SPARQL which is slightly more verbose.
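For illustration, the SPARQL side of that trade-off could use a property path alternative rather than extra triples. The sketch below holds such a query as a string and mimics the same match in plain Python over a hand-written list of triples, so it runs without an RDF library. The `ex:` component URIs are invented for the example.

```python
# SPARQL 1.1 property path: match any of the three specific predicates,
# so qb:componentProperty triples need not be written into the CSV-W.
QUERY = """
PREFIX qb: <http://purl.org/linked-data/cube#>
SELECT ?spec ?property WHERE {
  ?spec qb:dimension|qb:attribute|qb:measure ?property .
}
"""

COMPONENT_PREDICATES = {"qb:dimension", "qb:attribute", "qb:measure"}


def component_properties(triples):
    """Plain-Python stand-in for the query: (spec, property) pairs."""
    return [(s, o) for s, p, o in triples if p in COMPONENT_PREDICATES]


# Hypothetical triples, using prefixed names as plain strings.
triples = [
    ("ex:spec/period", "qb:dimension", "ex:dimension/period"),
    ("ex:spec/value", "qb:measure", "ex:measure/value"),
    ("ex:spec/period", "rdf:type", "qb:ComponentSpecification"),
]

pairs = component_properties(triples)
```

The query is a few tokens longer than one using qb:componentProperty alone, which is the "slightly more verbose" cost mentioned above.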
Metadata/catalogue data
@id user should be able to provide an absolute identifier, we can create a relative one (if unspecified)
dcterms:title (m), dcterms:description (o), rdfs:comment (o) should be part of the CSV-W distribution, 1 of each type/language maximum, markdown dcterms:description supported but only once
dcat:mediaType should be csvw not csv
Columns
General
This CSV-W should be a distribution, the metadata should be held by the catalogue service so it shouldn't contain information about the data set.
We like saying that the CSV-W is a distribution, where we can have an indeterminate data set with this CSV-W being a distribution of it. Unless it is known in advance, we shouldn't set the parent data set's ID. This improves portability, and doesn't supplant the work from the cataloguing service(s) -- two publishers, one distribution.
The use of fixed URIs for the @id of dcat:Dataset, dcat:Distribution, qb:DataSet should be confirmed as user-provided only; otherwise relative @ids will be used.
@ids should be generated from a base path.
(Workflow idea: you go to the catalog service, coin the dataset, and download the template with these values already filled in for you.)
The spatial and temporal range information should be duplicated across the qb:DataSet and dcat:Dataset; for finding it on the catalogue, but also on the qb:DataSet to be able to interpret the CSV-W independently.
From a convenience perspective we should have triples which use qb:componentProperty to link component specifications to the component properties. (In addition to the qb:dimension/qb:attribute/qb:measure predicates already present.)
Components stuff