agiorguk / gemini

Resources relating to the UK Gemini metadata profile

DD3 R7. Where possible, add Schema.org “corresponding element” entries to GEMINI elements #41

Open PeterParslow opened 3 years ago

PeterParslow commented 3 years ago

We have used W3C’s recommendations for mapping from ISO 19115 to Schema.org. This table summarises the Schema.org equivalence statements given for each element below.

Whilst there is no specific DD2 recommendation concerning DCAT, we believe a DCAT2 “equivalent element” for each GEMINI element would be useful, by supporting those whose web publication of GEMINI records uses DCAT as opposed to Schema.org. Where this is easily available from the same W3C source, we have included this below.

You will see that the two vocabularies are very similar, but note that:
• some of the DCAT elements sit in the DCAT “distribution” section, not their “dataset”;
• many DCAT properties have structured content, so this is not a complete list of how to implement it; and
• there are many other DCAT properties that should also be used, beyond those that exist in Schema.org (e.g. conformsTo, creator, spatialResolutionInMeters, format).

| GEMINI element | Condition | Schema.org | DCAT/DCAT2[1] | Notes |
| --- | --- | --- | --- | --- |
| Title | | name | dct:title | |
| Dataset language | | inLanguage | dct:language | |
| Abstract | | description | dct:description | |
| Topic category | | keywords | dct:subject | |
| Keyword | INSPIRE theme | keywords | dcat:theme / dct:subject | |
| Keyword | free text | keywords | dcat:keyword | Schema.org puts all the ‘free text’ keywords in one value |
| Keyword | Controlled list, URL | keywords.DefinedTerm.name; use .description for the textual content of the Anchor or CodeList; use .url for the target of the Anchor | dcat:keyword.DefinedTerm | |
| Temporal extent | | temporalCoverage[2] | dct:temporal | |
| Dataset reference date | 19115 dateType = publication | datePublished | dct:issued | release date / issued |
| Dataset reference date | 19115 dateType = revised | dateModified | update date / dct:modified | |
| Lineage | | | dct:provenance | |
| Extent | | spatialCoverage.Place.name | dct:spatial | |
| Resource locator.linkage | 19115 function = download | contentUrl (inside “distribution”) | dcat:downloadURL | |
| Resource locator.linkage | 19115 function = “information”, where the page links on to download | | dcat:accessURL | |
| Resource locator.linkage | 19115 function = “information” | url | dcat:landingPage | |
| Data format | | encodingFormat | dct:format | Possibly also dcat:mediaType |
| Responsible organisation | 19115 role = publisher | publisher.Organization (with at least name, email, url) | dct:publisher | |
| Responsible organisation | 19115 role = pointOfContact | contactPoint (probably Organisation, with at least name, email, url) | dcat:contactPoint | |
| Use constraints | Use constraints is being used to indicate a licence | license | dct:license | |
| Use constraints | Where GEMINI has an Anchor URL to the licence | license.CreativeWork, with .abstract (the free text) and .url (the Anchor target URL) | | |
| Use constraints | Other circumstances | | dct:accessRights | |
| Bounding box | | spatialCoverage.geo.GeoShape.box | dct:spatial | Note: needs translating from four edges to two corners |
| Resource identifier | | identifier | dct:identifier | |
| Resource type | | | rdf:type | Note: DCAT-AP does not distinguish between datasets and dataset series |

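
The Bounding box row notes that the four GEMINI edges need translating into two corners for Schema.org. A minimal sketch of that translation is below. The class and function names are hypothetical, and it assumes that each corner in `GeoShape.box` is written latitude first, lower corner before upper corner; check that ordering against the Schema.org documentation and your consumers before relying on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeminiBoundingBox:
    """Four edges in decimal degrees, as held in a GEMINI Bounding box (hypothetical container)."""
    west: float
    east: float
    south: float
    north: float


def to_geoshape_box(bbox: GeminiBoundingBox) -> str:
    """Return the box as 'south west north east': lower corner, then upper corner.

    Assumption: each corner is written latitude first, then longitude.
    """
    if bbox.south > bbox.north:
        raise ValueError("south edge must not be north of the north edge")
    return f"{bbox.south} {bbox.west} {bbox.north} {bbox.east}"


def to_spatial_coverage(bbox: GeminiBoundingBox) -> dict:
    """Wrap the box in the spatialCoverage.geo.GeoShape.box structure from the table."""
    return {
        "@type": "Place",
        "geo": {"@type": "GeoShape", "box": to_geoshape_box(bbox)},
    }
```

For example, `to_geoshape_box(GeminiBoundingBox(west=-8.65, east=1.77, south=49.86, north=60.86))` gives `"49.86 -8.65 60.86 1.77"`. Boxes that cross the antimeridian (west greater than east) are passed through unchanged here and would need separate handling.
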
PeterParslow commented 2 years ago

The W3C mapping, on which this is largely based, is at https://www.w3.org/2015/spatial/wiki/ISO_19115_-_DCAT_-_Schema.org_mapping

PeterParslow commented 2 years ago

Andrea Perego’s ISO 19139 - DCAT mapping in GitHub (James’ link) provides more detail, e.g. the range of each element, and also maps some things outside the DCAT namespace(s).

https://github.com/GeoCat/iso-19139-to-dcat-ap/blob/master/documentation/Mappings.md

(Thanks to James Reid)

PeterParslow commented 10 months ago

Just been contacted by the CDDO data standards team looking to state how to describe "where" in DCAT metadata to be used in the UK government data marketplace. This will include updating the mapping above for DCAT v3.

See https://github.com/co-cddo/data-catalogue-schemas/issues/1

nmtoken commented 10 months ago

Should we also adapt the GEMINI mapping to DCAT 3 as this now includes better description of dataset series?

PeterParslow commented 10 months ago

That will be a necessary part of the CDDO work; I'll make sure it is available as an update to this GEMINI change request. It's also being discussed (& likely to happen) in the OGC GeoDCAT SWG.

PeterParslow commented 2 months ago

Need to annotate this to show how it aligns (or not!) with the UK Cross-Government Metadata Exchange Model, which may be re-branded as a UK Application Profile of DCAT.

archaeogeek commented 2 months ago

@PeterParslow to update table, then @archaeogeek to update elements with equivalent mappings, also publish this table as guidance

PeterParslow commented 1 month ago

We'll also need to include guidance, or at least comment, on converting GEMINI to DCAT, covering how many DCAT distributions to create (depending on e.g. GEMINI Use constraints & Resource locators).

Revised table, with extra columns for DCAT v3 & UK government metadata exchange model. Note, the UK Gov work is supposed to consider adding spatial & some other things; they also plan to convert it to a full AP of DCAT v3.

| GEMINI element | Condition | Schema.org | DCAT/DCAT2[1] | Notes | DCAT3 | UK Gov MXM |
| --- | --- | --- | --- | --- | --- | --- |
| Title | | name | dct:title | | Y | Y |
| Dataset language | | inLanguage | dct:language | | Y | N |
| Abstract | | description | dct:description | | Y | Y |
| Topic category | | keywords | dct:subject | | Y | N |
| Keyword | INSPIRE theme | keywords | dcat:theme / dct:subject | DCAT3 expects theme to be used when the target is a SKOS concept; subject in the more general case, whether or not the term is from a controlled vocab | Y | dcat:theme |
| Keyword | free text | keywords | dcat:keyword | Schema.org puts all the ‘free text’ keywords in one value; DCAT / MXM keyword are 'uncontrolled' literals | Y | Y |
| Keyword | Controlled list, URL | keywords.DefinedTerm.name; use .description for the textual content of the Anchor or CodeList; use .url for the target of the Anchor | dcat:keyword.DefinedTerm | | dcat:theme? | dcat:theme |
| Temporal extent | | temporalCoverage[2] | dct:temporal | | Y | N proposed |
| Dataset reference date | 19115 dateType = publication | datePublished | dct:issued | release date / issued | Y | Y |
| Dataset reference date | 19115 dateType = revised | dateModified | update date / dct:modified | | Y | Y |
| Lineage | | | dct:provenance | | Uses PROV | N |
| Extent | | spatialCoverage.Place.name | dct:spatial | if available as a link | Y | N proposed |
| Resource locator.linkage | 19115 function = download | contentUrl (inside “distribution”) | dcat:downloadURL | | Y | Y |
| Resource locator.linkage | 19115 function = “information”, where the page links on to download | | dcat:accessURL | of a dcat:Distribution? | Y | N |
| Resource locator.linkage | 19115 function = “information” | url | dcat:landingPage | | Y | N |
| Data format | | encodingFormat | dct:format | of a dcat:Distribution; possibly also dcat:mediaType | Y | N |
| Responsible organisation | 19115 role = publisher | publisher.Organization (with at least name, email, url) | dct:publisher | | Y | Y |
| Responsible organisation | 19115 role = pointOfContact | contactPoint (probably Organisation, with at least name, email, url) | dcat:contactPoint | dcat:contactPoint is a vCard | Y | must contain email & contactName (organisation) |
| Use constraints | Use constraints is being used to indicate a licence | license | dct:license | license is a property of a distribution | Y | Y licence |
| Use constraints | Where GEMINI has an Anchor URL to the licence | license.CreativeWork, with .abstract (the free text) and .url (the Anchor target URL) | | | Y | Y |
| Use constraints | Other circumstances | | dct:accessRights | accessRights is a property of the dataset | Y | Y |
| Bounding box | | spatialCoverage.geo.GeoShape.box | dct:spatial.dct:Location.dcat:bbox | Note: needs translating from four edges to two corners | Y | N |
| Resource identifier | | identifier | dct:identifier | | Y | Y |
| Resource type | | | rdf:type | cataloguedResource is either Dataset or DataService. Note: DCAT-AP does not distinguish between datasets and dataset series; DCATv3 does | The CataloguedResource can be either Dataset, DatasetSeries, or DataService | Y |

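
To make the Schema.org column concrete, here is a rough sketch that applies a handful of the simpler rows (Title, Dataset language, Abstract, Resource identifier, Topic category, free-text Keyword, and the two reference dates) to a record held as a plain dictionary. The dictionary layout, the `publication` / `revision` keys and the function name are all assumptions made for illustration; this is not how GeoNetwork or any existing transformation does it, and the conditional rows (controlled keywords, licences, resource locators) are deliberately left out.

```python
# GEMINI element -> Schema.org property, copied from the one-to-one rows of the table.
SIMPLE_MAPPING = {
    "Title": "name",
    "Dataset language": "inLanguage",
    "Abstract": "description",
    "Resource identifier": "identifier",
}


def gemini_to_schema_org(record: dict) -> dict:
    """Build a Schema.org Dataset (as a JSON-LD style dict) from a simplified GEMINI record.

    Assumption: `record` is a hypothetical flat dict keyed by GEMINI element name.
    """
    dataset = {"@context": "https://schema.org/", "@type": "Dataset"}

    for element, prop in SIMPLE_MAPPING.items():
        if element in record:
            dataset[prop] = record[element]

    # Topic category and free-text keywords both map to `keywords`;
    # per the table notes, Schema.org holds the free-text keywords in one value.
    keywords = list(record.get("Topic category", [])) + list(record.get("Keyword", []))
    if keywords:
        dataset["keywords"] = ", ".join(keywords)

    # Dataset reference date splits by ISO 19115 dateType (hypothetical dict keys).
    dates = record.get("Dataset reference date", {})
    if "publication" in dates:
        dataset["datePublished"] = dates["publication"]
    if "revision" in dates:
        dataset["dateModified"] = dates["revision"]

    return dataset
```

Elements with no Schema.org entry in the table, such as Lineage, are simply ignored by this sketch, which is one reason a DCAT output would still be needed alongside it.
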
archaeogeek commented 2 weeks ago

@PeterParslow what do I need to do next? I can't remember...

PeterParslow commented 2 weeks ago

> @PeterParslow what do I need to do next? I can't remember...

See if what I've come up with in a desk exercise matches what you'd expect from the GeoNetwork implementation of DCAT?

nmtoken commented 2 weeks ago

@archaeogeek do you have a link to where this transformation is mapped in GeoNetwork 4? It is available (in theory) through the OGC API - Records interface, though links aren't working for us.

archaeogeek commented 1 week ago

@nmtoken it's not the mapping. We have it working here: https://spatialdata.gov.scot/geonetwork/api/collections/main/items/fa510351-8e30-4147-b984-862be84a6f90. You need to check the log files; I suspect you're missing the relevant xsl files in https://github.com/geonetwork/geonetwork-microservices/tree/main/modules/services/ogc-api-records/src/main/resources/xslt/ogcapir/formats/copy (which is completely undocumented). Basically you need a gemini one that matches the iso19139 one.

nmtoken commented 1 week ago

Not the headers then (https://github.com/geonetwork/geonetwork-microservices/issues/114) ?

archaeogeek commented 1 week ago

@nmtoken the above is all I had to do to get it working, YMMV.

nmtoken commented 4 hours ago

@archaeogeek Just checking we are not talking at cross purposes. You seem to be saying that in your Tree Preservation Orders - Argyll and Bute example the schema.org, dcat, dcat_turtle, and geojson tabs link to content because you have a gemini XSL file and we don't.

For us (for example https://metadata.bgs.ac.uk/geonetwork/api/collections/main/items/a2b1143b-5c5d-23d6-e054-002128a47908) and the EEA geospatial data catalogue (for example https://sdi.eea.europa.eu/catalogue/api/collections/main/items/71c47f78-27b6-4080-acd5-47b306b273d8) these tabs don't give any content (only errors).