kcoyle opened 3 years ago
And URL prefixes?
Would it help to base such a manifest on the CSV on the Web metadata JSON file? That covers alternative column headings.
I do think that we should have the CSV on the web JSON as one of the options. Can anyone here mock that up?
@kcoyle will do.
@philbarker Thanks, Phil. Also, take a look at the example that Nishad did in #3. He based it on CSVW but used a table format for the data. Perhaps these two could be somewhat parallel to show both methods?
@bencomp Thanks. We are looking at CSV on the Web for this, among other solutions, but we also want to develop a table-based option (since CSVW uses JSON). We figure that we will present multiple options as examples of how one might encode a manifest. Note that the only aspect of a manifest that seems to be absolutely necessary for TAP is the definition of prefixes for the namespaces used. This means that we'll probably have a range of examples beyond that single requirement, but may not specify a single TAP manifest.
@kcoyle Thanks, I just arrived at this repository and had missed the extensive mentions of CSVW in #3. If I understand correctly, an [application] profile is a way of expressing what makes a resource description valid. The manifest aims to minimally prescribe the mappings for translating CURIEs to IRIs and (optionally) describe the profile.
If the resource description were expressed in RDF (whatever the serialisation), I would look at SHACL (or ShEx) to define the valid shapes. The SHACL would serve as the profile. To describe the profile, i.e. collection of SHACL shapes, I would again use RDF and it could go in the same file as the SHACL or it could be a separate resource. I think all RDF serialisations have ways of mapping CURIEs to IRIs, so that question would not be an issue.
If the profile were expressed in CSV and I wanted to interpret the values as RDF to validate RDF resource descriptions, I would want to convert the CSV to SHACL or ShEx. Several conversions exist, including CSV on the Web metadata (CSVM) and RML. Both CSVM and RML are expressed in RDF, so they could include the manifest.
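To make the CSV-to-SHACL route a little more concrete, here is a minimal Python sketch (standard library only) that turns TAP-style rows into SHACL node and property shapes written out as Turtle. The column names follow the TAP headings used elsewhere in this thread; the `Yes`/`No` convention for `mandatory` and `repeatable`, the `ex:` prefix and the prefix table are assumptions for illustration. A real converter would more likely build the graph with an RDF library than format strings by hand.

```python
import csv
import io

# Assumed prefix mappings; in practice these would come from the TAP manifest.
PREFIXES = {
    "sh": "http://www.w3.org/ns/shacl#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "dct": "http://purl.org/dc/terms/",
    "ex": "http://example.org/",
}

# Hypothetical TAP rows using column names from this thread.
TAP_CSV = """shapeID,propertyID,mandatory,repeatable,valueDataType
ex:Book,dct:title,Yes,No,xsd:string
ex:Book,dct:creator,No,Yes,
"""


def tap_to_shacl(tap_text: str, prefixes: dict) -> str:
    """Group rows by shapeID and emit one sh:NodeShape per shape, as Turtle text."""
    shapes: dict = {}
    for row in csv.DictReader(io.StringIO(tap_text)):
        constraints = [f"sh:path {row['propertyID']}"]
        if row["mandatory"] == "Yes":
            constraints.append("sh:minCount 1")  # mandatory: at least one value
        if row["repeatable"] == "No":
            constraints.append("sh:maxCount 1")  # not repeatable: at most one value
        if row["valueDataType"]:
            constraints.append(f"sh:datatype {row['valueDataType']}")
        shapes.setdefault(row["shapeID"], []).append(" ; ".join(constraints))

    lines = [f"@prefix {p}: <{ns}> ." for p, ns in prefixes.items()]
    for shape_id, props in shapes.items():
        body = " ;\n".join(f"    sh:property [ {c} ]" for c in props)
        lines.append(f"\n{shape_id} a sh:NodeShape ;\n{body} .")
    return "\n".join(lines)


print(tap_to_shacl(TAP_CSV, PREFIXES))
```

Reading `mandatory` as `sh:minCount 1` and a non-repeatable property as `sh:maxCount 1` is one plausible interpretation of those columns, not a settled mapping.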
I think in a CSVM file you could specify a column's datatype to be `{ "@id": "http://www.w3.org/1999/xhtml/datatypes/CURIE" }` to indicate its values are CURIEs, but I don't expect existing processors to understand what to do with that. TAP processors could be made aware and instructed to convert CURIEs to IRIs using, e.g., the mappings in the `@context` in the CSVM.
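A TAP processor that knows about the CURIE datatype could expand values with the prefix mappings from a JSON-LD style `@context`. The Python sketch below illustrates that under stated assumptions: the example context is made up, and skipping keyword entries such as `@import` and leaving unknown prefixes untouched (rather than raising an error) are choices made for the example, not CSVW-defined behaviour.

```python
import json

# Example metadata with a JSON-LD style @context (illustrative, not a complete CSVM file).
METADATA = json.loads("""
{
  "@context": {
    "@import": "http://www.w3.org/ns/csvw",
    "dct": "http://purl.org/dc/terms/",
    "xsd": "http://www.w3.org/2001/XMLSchema#"
  }
}
""")


def prefixes_from_context(context: dict) -> dict:
    """Keep plain prefix-to-namespace entries; skip JSON-LD keywords like @import."""
    return {k: v for k, v in context.items() if not k.startswith("@") and isinstance(v, str)}


def expand_curie(value: str, prefixes: dict) -> str:
    """Expand 'prefix:local' to a full IRI; leave absolute IRIs and unknown prefixes as-is."""
    if "://" in value:
        return value  # already an absolute IRI
    prefix, sep, local = value.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return value


PREFIXES = prefixes_from_context(METADATA["@context"])
print(expand_curie("dct:title", PREFIXES))  # http://purl.org/dc/terms/title
```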
All I really wanted to say is: are you sure you're not reinventing the wheel? CSVM ticks all the boxes for the list at the top:
And more:
If you definitely want to express the manifest as CSV to use as rules for validating the CSV profile, you're doing something that I haven't seen before. That doesn't mean it shouldn't be done, of course. You could reverse engineer the CSVM vocabulary to fit in CSV and provide a default mapping to RDF (I'd suggest making it a CSVM file), along with instructions for deriving your own CSVM file. A disadvantage of using CSV for elaborate schemas is that you could end up with a generic `subject,predicate,object` table so that it can hold any data.

I don't want to suggest that every issue has been solved already, but I do see some overlap with existing standards and initiatives and hope that these are considered when looking for solutions for TAPs.
@bencomp Thanks for your detailed comment. It inspired me to spend the weekend doing a close reading of the CSVW documents. There is, as you mention, the meta problem: CSVW is for data in tabular format, and TAP is a tabular format for a profile describing metadata choices for metadata in any format. It's the profile that is tabular, so CSVW would only apply to the profile itself. I believe this means that there are some features of CSVW that we might use, but others may not be appropriate. For example, the CSVW `ordered` property for values in cells only applies to tabular data; the request in #14 was to designate the order of property/value pairs in the RDF metadata the TAP defines. Order of values in cells in a TAP is probably an edge case (I can't think of an example where this would be needed).
We should definitely look at CSVW to see what it can offer for some functions:
(Note, I haven't found a way to provide alternate text for booleans in CSVW - would appreciate a tip for that.)
There may be other features as well. However, it seems to me that CSVW is quite a bit more complex than we want to embrace for the small number of needs we have. For example, it isn't clear to me whether a CSVW description of the TAP vocabulary would be useful. It would define basic validation for the columns of a TAP, but we've been leaving things pretty loose, so I'm not sure how much validation of that type is needed. Each TAP would need its own CSVW file to meet our needs, and that is more complicated than anything we've embraced so far with TAP. Of course, anyone who has the skills and wants to create a CSVW annotation for their TAP is welcome to do so.
Ideally, the manifest would be something very simple that people could encode in a spreadsheet, and that would be transformed to a more usable format by the program that ingests the CSV.
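As a rough sketch of that idea, a two-column prefix sheet saved as CSV could be read by the ingesting program and turned into a JSON-LD style `@context`. The column headings `prefix` and `namespace` are hypothetical (the thread has not settled on names), and the checks shown are only examples of what such a program might enforce.

```python
import csv
import io
import json

# Hypothetical prefix sheet exported from a spreadsheet.
SHEET = """prefix,namespace
dct,http://purl.org/dc/terms/
sdo:,http://schema.org/
xsd,http://www.w3.org/2001/XMLSchema#
"""


def load_prefix_sheet(text: str) -> dict:
    """Read prefix/namespace rows into a dict, rejecting conflicting declarations."""
    prefixes: dict = {}
    for row in csv.DictReader(io.StringIO(text)):
        prefix = row["prefix"].strip().rstrip(":")  # accept both "sdo" and "sdo:"
        namespace = row["namespace"].strip()
        if not prefix or not namespace:
            raise ValueError(f"incomplete row: {row}")
        if prefixes.get(prefix, namespace) != namespace:
            raise ValueError(f"prefix {prefix!r} is declared with two namespaces")
        prefixes[prefix] = namespace
    return prefixes


def to_jsonld_context(prefixes: dict) -> str:
    """Serialize the mappings as a JSON-LD @context object."""
    return json.dumps({"@context": prefixes}, indent=2)


print(to_jsonld_context(load_prefix_sheet(SHEET)))
```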
@kcoyle wrote:
(Note, I haven't found a way to provide alternate text for booleans in CSVW - would appreciate a tip for that.)
It's here https://www.w3.org/TR/tabular-data-primer/#boolean-format
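Per that section of the primer, a boolean column's `format` gives the true text and the false text separated by `|` (e.g. `Yes|No`), so there is one pair of alternatives per column rather than one per language. The Python sketch below shows how a processor might apply such a format; falling back to the XML Schema forms `true`/`false`/`1`/`0` when no usable format is given reflects my understanding of the CSVW spec and should be checked against it.

```python
from typing import Optional


def parse_boolean(cell: str, fmt: Optional[str] = None) -> bool:
    """Parse a CSVW-style boolean cell, using a 'trueText|falseText' format if given."""
    if fmt is not None and fmt.count("|") == 1:
        true_text, false_text = fmt.split("|")
        lexical = {true_text: True, false_text: False}
    else:
        # XML Schema boolean lexical forms, used when there is no usable format.
        lexical = {"true": True, "1": True, "false": False, "0": False}
    value = cell.strip()  # the sketch ignores surrounding whitespace
    if value not in lexical:
        raise ValueError(f"{cell!r} is not a boolean for format {fmt!r}")
    return lexical[value]


print(parse_boolean("Yes", "Yes|No"), parse_boolean("No", "Yes|No"), parse_boolean("1"))
```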
This is Nishad's suggestion, from #3
I liked the tabular way of expressing prefixes proposed by Karen, but I'm not so sure we need to use a prefixed header for that. The ontology is probably the best place to include any such mapping.
CSVW proposes a JSON metadata file relative to the path of the CSV, something like `path/file.csv-metadata.json`.
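For reference, my understanding of the CSVW default lookup is that a processor first tries `{+url}-metadata.json` next to the CSV and then a shared `csv-metadata.json` in the same directory (a site can also publish its own list of locations). A small Python sketch of those two default candidates, assuming a plain URL without a query string or fragment:

```python
from urllib.parse import urljoin


def candidate_metadata_urls(csv_url: str) -> list:
    """Default places to look for CSVW metadata describing csv_url, in order."""
    return [
        csv_url + "-metadata.json",             # e.g. foo.csv -> foo.csv-metadata.json
        urljoin(csv_url, "csv-metadata.json"),  # shared file in the same directory
    ]


for url in candidate_metadata_urls("http://example.org/tap/foo-dctap.csv"):
    print(url)
```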
Besides declaring prefixes, section 5.1 of the DCAP Guidelines [1] encourages declaring a set of metadata for application profiles.
Not as a proposal, but I am curious whether both of these can be implemented in a tabular format, as we are primarily focusing on a tabular representation of application profiles.
The following is not a proposal but an exploration, for discussion, of what such a metadata file could look like in tabular format. The headers are fictional and used only as an example.
A metadata file with prefixes for the DCTAP `path/foo-dctap.csv` could be `path/foo-dctap.csv-metadata.csv`, which could look like this:
MetadataID | MetadataType | Value | Notes |
---|---|---|---|
dct | Namespace | http://purl.org/dc/terms/ | Dublin Core Terms |
schema | Namespace | http://schema.org/ | Schema.org |
dcat | Namespace | http://www.w3.org/ns/dcat# | DCAT Vocabulary |
dct:title | LITERAL | Foo Application Profile | |
dct:creator | IRI | https://orcid.org/0000-0000-1111-0000 | |
dct:creator | IRI | https://orcid.org/0000-0000-1111-1111 | |
dct:contributor | IRI | https://orcid.org/0000-0000-1111-2222 | |
dct:license | IRI | https://opendatacommons.org/licenses/odbl/1.0/ | |
dct:description | LITERAL | A human readable description | |
dcat:downloadURL | IRI | https://zenodo.org/record/xxxx/files/xxx/data-v1.0.0.zip?download=1 | |
dcat:distribution | IRI | https://doi.org/10.5281/zenodo.xxxx | |
[1] https://www.dublincore.org/specifications/dublin-core/application-profile-guidelines/
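To check that a table like Nishad's can be consumed mechanically, here is a Python sketch that reads it (assumed to be saved as CSV with the same fictional headers), collects the `Namespace` rows as prefixes, and turns the remaining rows into N-Triples about the profile. The profile IRI is hypothetical, and escaping literals with `json.dumps` is a shortcut that yields valid N-Triples string syntax for ordinary text; it is not a full serializer.

```python
import csv
import io
import json

# Abridged, hypothetical CSV version of the MetadataID/MetadataType/Value/Notes table.
META_CSV = """MetadataID,MetadataType,Value,Notes
dct,Namespace,http://purl.org/dc/terms/,Dublin Core Terms
dcat,Namespace,http://www.w3.org/ns/dcat#,DCAT Vocabulary
dct:title,LITERAL,Foo Application Profile,
dct:creator,IRI,https://orcid.org/0000-0000-1111-0000,
dcat:distribution,IRI,https://doi.org/10.5281/zenodo.xxxx,
"""


def metadata_to_ntriples(text: str, profile_iri: str) -> list:
    rows = list(csv.DictReader(io.StringIO(text)))
    # First pass: namespace rows, so prefixed rows may appear in any order.
    prefixes = {r["MetadataID"]: r["Value"] for r in rows if r["MetadataType"] == "Namespace"}
    triples = []
    for r in rows:
        kind = r["MetadataType"]
        if kind == "Namespace":
            continue
        prefix, _, local = r["MetadataID"].partition(":")
        predicate = prefixes[prefix] + local  # KeyError means an undeclared prefix
        if kind == "IRI":
            obj = f"<{r['Value']}>"
        elif kind == "LITERAL":
            obj = json.dumps(r["Value"], ensure_ascii=False)  # quoted and escaped
        else:
            raise ValueError(f"unknown MetadataType {kind!r}")
        triples.append(f"<{profile_iri}> <{predicate}> {obj} .")
    return triples


for t in metadata_to_ntriples(META_CSV, "http://example.org/foo-ap"):
    print(t)
```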
I also tried to create a table and went through a myriad of versions, none of which seemed to work.
There seem to be 5 different kinds of data statements we need to make:
It isn't easy getting these all into one table. Nishad covered 1 and 2. My table below tries to cover them all. Note that in my table
ID | property | datatype | prefix | language | display options | Boolean |
---|---|---|---|---|---|---|
http://purl.org/dc/terms/ | | | dct | | | |
http://schema.org/ | | | sdo | | | |
http://www.w3.org/2001/XMLSchema# | | | xsd | | | |
:MyTap123 | dc:title | xsd:string | | | | |
:MyTap123 | dc:publisher | xsd:string | | | | |
:MyTap123 | dc:modified | xsd:date | | | | |
tap:shapeID | | | | en | entity | |
tap:shapeID | | | | it | entità | |
Note: if we like having the URI in the ID column and the prefix in a later column in the row, we might want to also reverse these in our "prefixes only" solution, and also use the same column headers (whatever they turn out to be).
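Because rows in a mixed table like this one do different jobs, an ingesting program would have to tell them apart from which cells are filled. A minimal Python sketch of that routing follows; the row kinds it returns (`namespace`, `metadata element`, `display label`) are illustrative labels rather than an agreed classification, and the order of the rules is an assumption.

```python
HEADER = ["ID", "property", "datatype", "prefix", "language", "display options", "Boolean"]

# A few rows of the table, with empty cells as empty strings.
ROWS = [
    ["http://purl.org/dc/terms/", "", "", "dct", "", "", ""],
    [":MyTap123", "dc:title", "xsd:string", "", "", "", ""],
    ["tap:shapeID", "", "", "", "it", "entità", ""],
]


def classify(cells: list) -> str:
    """Decide what a row declares from which columns are non-empty."""
    row = dict(zip(HEADER, cells))
    filled = {k for k, v in row.items() if v.strip()}
    if "prefix" in filled:
        return "namespace"          # ID holds the namespace URI
    if {"language", "display options"} <= filled:
        return "display label"      # localized label for the term in ID
    if "property" in filled:
        return "metadata element"   # property (and datatype) used to describe the TAP
    return "unknown"


for cells in ROWS:
    print(cells[0], "->", classify(cells))
```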
Here's an example of how we could use the CSV on the Web JSON-LD format to fulfill this requirement. It's quite long, so I'll put my comments first:
- `@context`, but that seems like a slightly odd use of the `@context`.
- `name`s as `titles` for humans in a choice of language.

So I think the only [potential] shortcoming relates to multilingual alternatives to the boolean defaults.
{
"@context": {
"@import": "http://www.w3.org/ns/csvw",
"sh": "http://www.w3.org/ns/shacl#"
},
"dc:title": "Credential Engine Registry Application Profile",
"dc:description": "Describes the minimum data policy for publishing to the Credential Engine Registry",
"dc:creator": "https://credentialengineregistry.org/resources/ce-9bd8c615-9f3c-40e6-9c20-6d9f811844e6",
"sh:declare": [
{
"sh:prefix": "rdf",
"sh:namespace": "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
},
{
"sh:prefix": "rdfs",
"sh:namespace": "http://www.w3.org/2000/01/rdf-schema#"
},
{
"sh:prefix": "xsd",
"sh:namespace": "http://www.w3.org/2001/XMLSchema#"
},
{
"sh:prefix": "ceterms",
"sh:namespace": "https://purl.org/ctdl/terms/"
},
{
"sh:prefix": "agentSector",
"sh:namespace": "https://purl.org/ctdl/vocabs/agentSector/"
}
],
"tableSchema": {
"columns": [
{
"name": "shapeID",
"titles": {
"en": "Shape ID",
"es": "ID de Forma"
},
"datatype": "anyURI"
},
      {
        "name": "propertyID",
        "titles": {
          "en": "Property ID",
          "es": "ID de propiedad"
},
"datatype": "anyURI"
},
      {
        "name": "propertyLabel",
        "titles": {
          "en": "Property Label",
          "es": "Etiqueta de propiedad"
},
"datatype": "string"
},
      {
        "name": "mandatory",
"titles": {
"en": "Mandatory",
"es": "Obligatoria"
},
"datatype": {
"base": "boolean",
"format": "Yes|No"
}
},
{
"name": "repeatable",
"titles": {
"en": "Repeatable",
"es": "Repetible"
},
"datatype": {
"base": "boolean",
"format": "Yes|No"
}
},
{
"name": "valueNodeType",
"titles": {
"en": "Value node type",
"es": "Tipo de nodo"
},
"datatype": {
"base": "string",
"format": "IRI|Literal|BNODE"
}
},
{
"name": "valueDataType",
"titles": {
"en": "Value data type",
"es": "Tipo de datos"
},
"datatype": "anyURI"
},
{
"name": "valueConstraint",
"titles": {
"en": "Value constraint",
"es": "Restricción para valores"
},
"datatype": "string"
},
{
"name": "valueConstraintType",
"titles": {
"en": "Value constraint type",
"es": "Tipo de restricción"
},
"datatype": "string"
},
{
"name": "valueShape",
"titles": {
"en": "Value shape",
"es": "Forma para valores"
},
"datatype": "string"
},
{
"name": "note",
"titles": {
"en": "Notes",
"es": "Anotaciones"
},
"datatype": "string"
}
]
},
"tables": [
{
"url": "http://example.org/CE_CredentialOrg_required.csv",
"dc:title": "Required properties of a Credential Organization"
},
{
"url": "http://example.org/CE_CredentialOrg_recommended.csv",
"dc:title": "Recommended properties of a Credential Organization"
},
{
"url": "http://example.org/CE_Credential_required.csv",
"dc:title": "Required properties of a Credential"
},
{
"url": "http://example.org/CE_Credential_recommended.csv",
"dc:title": "Recommended properties of a Credential"
}
]
}
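One way a TAP tool could use the `sh:declare` block without a full JSON-LD processor is to read the compact keys directly. The sketch below assumes the keys appear exactly as `sh:declare`, `sh:prefix` and `sh:namespace` (a JSON-LD processor would instead expand them through the `@context`) and uses an abridged copy of the metadata above.

```python
import json

# Abridged copy of the manifest above; only the parts the sketch reads.
MANIFEST = json.loads("""
{
  "@context": {"@import": "http://www.w3.org/ns/csvw", "sh": "http://www.w3.org/ns/shacl#"},
  "sh:declare": [
    {"sh:prefix": "xsd", "sh:namespace": "http://www.w3.org/2001/XMLSchema#"},
    {"sh:prefix": "ceterms", "sh:namespace": "https://purl.org/ctdl/terms/"}
  ]
}
""")


def declared_prefixes(manifest: dict) -> dict:
    """Collect prefix -> namespace pairs from SHACL-style sh:declare entries."""
    prefixes = {}
    for entry in manifest.get("sh:declare", []):
        prefix, namespace = entry["sh:prefix"], entry["sh:namespace"]
        if prefixes.get(prefix, namespace) != namespace:
            raise ValueError(f"conflicting declarations for {prefix!r}")
        prefixes[prefix] = namespace
    return prefixes


print(declared_prefixes(MANIFEST))
```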
Here is an almost equivalent YAML representation of @philbarker's JSON-LD manifest.
# DCTAP Manifest
prefixes:
  dc: 'http://purl.org/dc/elements/1.1/'
  rdf: 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
  rdfs: 'http://www.w3.org/2000/01/rdf-schema#'
  xsd: 'http://www.w3.org/2001/XMLSchema#'
  ceterms: 'https://purl.org/ctdl/terms/'
  agentSector: 'https://purl.org/ctdl/vocabs/agentSector/'
metadata:
  'dc:title': Credential Engine Registry Application Profile
  'dc:description': Describes the minimum data policy for publishing to the Credential Engine Registry
  'dc:creator': https://credentialengineregistry.org/resources/ce-9bd8c615-9f3c-40e6-9c20-6d9f811844e6
tableSchema:
  columns:
    - name: shapeID
      title:
        en: Shape ID
        es: ID de Forma
      datatype: anyURI
    - name: propertyID
      title:
        en: Property ID
        es: ID de propiedad
      datatype: anyURI
    - name: propertyLabel
      title:
        en: Property Label
        es: Etiqueta de propiedad
      datatype: string
    - name: mandatory
      title:
        en: Mandatory
        es: Obligatoria
      datatype:
        base: boolean
        format: Yes|No
    - name: repeatable
      title:
        en: Repeatable
        es: Repetible
      datatype:
        base: boolean
        format: Yes|No
    - name: valueNodeType
      title:
        en: Value node type
        es: Tipo de nodo
      datatype:
        base: string
        format: IRI|Literal|BNODE
    - name: valueDataType
      title:
        en: Value data type
        es: Tipo de datos
      datatype: anyURI
    - name: valueConstraint
      title:
        en: Value constraint
        es: Restricción para valores
      datatype: string
    - name: valueConstraintType
      title:
        en: Value constraint type
        es: Tipo de restricción
      datatype: string
    - name: valueShape
      title:
        en: Value shape
        es: Forma para valores
      datatype: string
    - name: note
      title:
        en: Notes
        es: Anotaciones
      datatype: string
tables:
  - url: 'http://example.org/CE_CredentialOrg_required.csv'
    'dc:title': Required properties of a Credential Organization
  - url: 'http://example.org/CE_CredentialOrg_recommended.csv'
    'dc:title': Recommended properties of a Credential Organization
  - url: 'http://example.org/CE_Credential_required.csv'
    'dc:title': Required properties of a Credential
  - url: 'http://example.org/CE_Credential_recommended.csv'
    'dc:title': Recommended properties of a Credential
---
I have been using a Java CSVW-based validator to check how the CSVW JSON metadata above works in practice (and indeed whether it is valid) and whether we might use it to check our test cases. There's amended code below, but first some comments:
- Anything other than the plain `"@context": "http://www.w3.org/ns/csvw"` that I tried for the context led to an error. I see that as a limitation of the validator, not the format, but it meant that I had to take out the SHACL block that declared the namespaces in order to get anything that would work with this tool.
- Columns can be marked `"required": false` or `"required": true`, but `"required": false` just means that a row need not have data in that column. It would be worth checking whether other validators interpret the spec in this way, but if so it would mean that even the simplest "list of properties" application profile would have to have all columns.
- The `name` of a column isn't used as one of the valid variants for its heading, so I had to add the unspaced column names to the English column titles.

In conclusion, as a format for providing metadata I think this is still an option (but we need simple options too), but I had hoped to be able to create a generic file that could validate any TAP, and that doesn't seem possible.
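The heading behaviour described above can be illustrated with a small Python sketch that matches CSV header cells positionally against each column's `titles` only, which is how the validator appeared to behave; whether the CSVW spec also allows a match on `name` is worth checking separately. The column descriptions are an abridged, hypothetical subset in the same shape as the amended metadata.

```python
def title_variants(titles) -> set:
    """Flatten CSVW-style titles (string, list, or language map) into a set of strings."""
    if isinstance(titles, str):
        return {titles}
    if isinstance(titles, list):
        return set(titles)
    variants = set()
    for value in titles.values():  # language map, e.g. {"en": [...], "es": "..."}
        variants |= title_variants(value)
    return variants


def match_header(header: list, columns: list) -> list:
    """Map each header cell to the name of the column at the same position."""
    if len(header) != len(columns):
        raise ValueError(f"expected {len(columns)} columns, found {len(header)}")
    names = []
    for cell, column in zip(header, columns):
        if cell not in title_variants(column.get("titles", {})):
            raise ValueError(f"header {cell!r} does not match column {column['name']!r}")
        names.append(column["name"])
    return names


COLUMNS = [
    {"name": "shapeID", "titles": {"en": ["shapeID", "Shape ID"], "es": "ID de Forma"}},
    {"name": "propertyID", "titles": {"en": ["propertyID", "Property ID"], "es": "ID de propiedad"}},
]

print(match_header(["shapeID", "Property ID"], COLUMNS))
```

Dropping `"shapeID"` from the English titles would make a header cell of `shapeID` fail to match, which is why the unspaced names had to be added.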
{
"@context": "http://www.w3.org/ns/csvw",
"url": "tap4.csv",
"dc:title": "Credential Engine Registry Application Profile",
"dc:description": "Describes the minimum data policy for publishing to the Credential Engine Registry",
"dc:creator": "https://credentialengineregistry.org/resources/ce-9bd8c615-9f3c-40e6-9c20-6d9f811844e6",
"tableSchema": {
"columns": [
{
"name": "shapeID",
"titles": {
"en": [
"shapeID",
"Shape ID"
],
"es": "ID de Forma"
},
"datatype": "string",
"required": false
},
      {
        "name": "propertyID",
        "titles": {
          "en": [
            "propertyID",
            "Property ID"
          ],
          "es": "ID de propiedad"
},
"datatype": "anyURI",
"required": true
},
      {
        "name": "propertyLabel",
        "titles": {
          "en": [
            "propertyLabel",
            "Property Label"
          ],
          "es": "Etiqueta de propiedad"
},
"datatype": "string"
},
      {
        "name": "mandatory",
"titles": {
"en": [
"mandatory",
"Mandatory"
],
"es": "Obligatoria"
},
"datatype": {
"base": "boolean",
"format": "Yes|No"
}
},
{
"name": "repeatable",
"titles": {
"en": [
"repeatable",
"Repeatable"
],
"es": "Repetible"
},
"datatype": {
"base": "boolean",
"format": "Yes|No"
}
},
{
"name": "valueNodeType",
"titles": {
"en": [
"valueNodeType",
"Value node type"
],
"es": "Tipo de nodo"
},
"datatype": {
"base": "string",
"format": "IRI|BNODE|Literal|IRI BNODE|IRI Literal|BNODE Literal|IRI BNODE Literal"
}
},
{
"name": "valueDataType",
"titles": {
"en": [
"valueDataType",
"Value data type"
],
"es": "Tipo de datos"
},
"datatype": "anyURI"
},
{
"name": "valueConstraint",
"titles": {
"en": [
"valueConstraint",
"Value constraint"
],
"es": "Restricción para valores"
},
"datatype": "string"
},
{
"name": "valueConstraintType",
"titles": {
"en": [
"valueConstraintType",
"Value constraint type"
],
"es": "Tipo de restricción"
},
"datatype": "string"
},
{
"name": "valueShape",
"titles": {
"en": [
"valueShape",
"Value shape"
],
"es": "Forma para valores"
},
"datatype": "string"
},
{
"name": "note",
"titles": {
"en": [
"note",
"Notes"
],
"es": "Anotaciones"
},
"datatype": "string"
}
]
}
}
I had hoped to be able to create a generic file that could validate any TAP and that doesn't seem possible.
Thanks, Phil. This answers a question that I had as well - a generic TAP validator. I guess we'll have to do our own. (hint, hint)
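As a starting point for a home-grown validator, here is a minimal Python sketch that accepts any subset of the TAP columns in any order, requires only `propertyID` (as the amended CSVW metadata does), and checks the two boolean columns. The column list is taken from the CSVW example in this thread; the accepted boolean spellings are an assumption, and a real validator would check much more.

```python
import csv
import io

# TAP column names as used in the CSVW example in this thread.
TAP_COLUMNS = {"shapeID", "propertyID", "propertyLabel", "mandatory", "repeatable",
               "valueNodeType", "valueDataType", "valueConstraint",
               "valueConstraintType", "valueShape", "note"}
# Assumed boolean spellings (compared case-insensitively); empty means "not stated".
BOOLEANS = {"yes", "no", "true", "false", "1", "0", ""}


def validate_tap(text: str) -> list:
    """Return a list of problems; an empty list means the TAP passed these checks."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    problems = [f"unknown column: {h}" for h in header if h not in TAP_COLUMNS]
    if "propertyID" not in header:
        problems.append("missing required column: propertyID")
        return problems
    # Line numbers assume no quoted fields spanning several lines.
    for line_no, row in enumerate(reader, start=2):
        if not (row["propertyID"] or "").strip():
            problems.append(f"line {line_no}: empty propertyID")
        for col in ("mandatory", "repeatable"):
            value = (row.get(col) or "").strip().lower()
            if value not in BOOLEANS:
                problems.append(f"line {line_no}: {col} is not a boolean: {row[col]!r}")
    return problems


print(validate_tap("propertyID,mandatory\ndct:title,Yes\n,maybe\n"))
```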
I'm editing this to keep it up to date with the ideas in the thread.
Some possible data for the manifest:
Includes Tom's list of manifest items from #41 (added to the above).