adiwg / mdTranslator

Metadata translation tool built using Ruby
https://www.adiwg.org/mdTranslator/
The Unlicense

Support DCAT-US metadata schema #251

Open jwaspin opened 1 year ago

jwaspin commented 1 year ago

Add support for the DCAT-US Schema v1.1 (Project Open Data Metadata Schema).

This is a general placeholder issue for discussion of requirements related to the implementation of a DCAT-US writer. The DCAT-US schema is used to ingest metadata records into the Data.gov catalog.

Related issues: #251, #264, #267, #268, #269, #275, #276, #277, #278, #279, #280, #281

For information on the DCAT-US schema see:

Features to implement:

mdJSON to DCAT-US writer

DCAT-US to mdJSON reader

This probably requires further discussion. Not sure of an immediate need for this.

References

For general information and potential mappings from mdJSON see:

jwaspin commented 1 year ago

Should the property names include the "dcat:" prefix?

{
  "@context": {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/"
  },
  "@type": "dcat:Dataset",
  "title": "Example Dataset",
  "keyword": ["example", "dataset"]
}

vs

{
  "@context": {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/"
  },
  "@type": "dcat:Dataset",
  "dcat:title": "Example Dataset",
  "dcat:keyword": ["example", "dataset"]
}

Note the difference in the "title" and "keyword" keys: they are unprefixed in the first example and prefixed with "dcat:" in the second.

hmaier-fws commented 1 year ago

It seems that in this case both would be the same. My understanding is that JSON-LD uses the namespace to determine context. So you could potentially have something like:

{
  "@context": {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/"
  },
  "@type": "dcat:Dataset",
  "dcat:title": "Example Dataset",
  "dct:title": "A title with different semantics"
}

which would indicate that the two titles have different definitions (specified by the respective namespaces). But I'm not sure that matters if we just have one title for DCAT-US.
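To illustrate the point about namespaces, here is a minimal Ruby sketch of how prefixed keys resolve to full IRIs through the @context. It is a simplified illustration, not a JSON-LD processor: the helper name expand_keys is hypothetical, it only expands top-level keys, and it leaves values such as "dcat:Dataset" untouched.

```ruby
# Hypothetical helper: expand "prefix:name" keys to full IRIs using the
# document's @context. Keys without a known prefix are left unchanged.
def expand_keys(doc)
  context = doc.fetch("@context", {})
  doc.each_with_object({}) do |(key, value), out|
    next if key == "@context"

    prefix, local = key.split(":", 2)
    if local && context.key?(prefix)
      out[context[prefix] + local] = value
    else
      out[key] = value
    end
  end
end

doc = {
  "@context" => {
    "dcat" => "http://www.w3.org/ns/dcat#",
    "dct" => "http://purl.org/dc/terms/"
  },
  "@type" => "dcat:Dataset",
  "dcat:title" => "Example Dataset",
  "dct:title" => "A title with different semantics"
}

expanded = expand_keys(doc)
expanded.each_key { |key| puts key }
```

After expansion the two titles sit under different IRIs, which is why they can carry different definitions.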

@chris-macdermaid do you know if the DCAT-US ingest cares if it sees plain JSON or if it supports the import of JSON-LD? If not, is there an advantage of using one form over the other? I'm thinking if we ever want to develop some other generic JSON-LD readers/writers. (I suppose I could dig into the standard, but thought you might already know the answer)

jwaspin commented 1 year ago

For now, I am omitting the @context and prefixing all fields with "dcat:".
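A rough sketch of what that prefixing could look like in Ruby. The helper name prefix_dcat_keys is hypothetical and is not taken from the mdTranslator code base; it only handles top-level keys and skips JSON-LD keywords (keys starting with "@") and keys that already carry a prefix.

```ruby
require "json"

# Hypothetical helper: add a "dcat:" prefix to every top-level key, leaving
# JSON-LD keywords ("@type", ...) and already-prefixed keys as they are.
def prefix_dcat_keys(record, prefix = "dcat")
  record.each_with_object({}) do |(key, value), out|
    key = key.to_s
    skip = key.start_with?("@") || key.include?(":")
    out[skip ? key : "#{prefix}:#{key}"] = value
  end
end

record = {
  "@type" => "dcat:Dataset",
  "title" => "Example Dataset",
  "keyword" => ["example", "dataset"]
}

prefixed = prefix_dcat_keys(record)
puts JSON.pretty_generate(prefixed)
```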

jwaspin commented 1 year ago

@dwalt or @hmaier-fws Does the publisher need to be an organization?

This is what was specified: if citation.responsibleParty [any] role="publisher" then contactId -> contact.name, else if resourceDistribution.distributor.contact NOT NULL then [first contact] contactId -> contact.name

dwalt commented 1 year ago

@jwaspin @hmaier-fws Good catch. It looks like DCAT is expecting an organization as expressed through the @type JSON-LD data type. We could throw in a trap for isOrganization in the conditionals.
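One possible reading of the publisher rule with an isOrganization trap added, sketched in Ruby. The data shapes here are simplified assumptions (flat hashes with :role, :contactId, :name, and :isOrganization), not the actual mdJSON internal object, and publisher_name is a hypothetical helper. This reading falls back to the distributor when no publisher-role contact is an organization; other readings are possible.

```ruby
# Hypothetical helper: pick the publisher name for a DCAT-US record.
#   citation_parties        - array of { role:, contactId: } hashes
#   distributor_contact_ids - array of contact ids, in document order
#   contacts                - array of { contactId:, name:, isOrganization: }
def publisher_name(citation_parties, distributor_contact_ids, contacts)
  by_id = contacts.to_h { |c| [c[:contactId], c] }
  # Return the contact only when it exists and is an organization.
  organization = lambda do |id|
    contact = by_id[id]
    contact if contact && contact[:isOrganization]
  end

  # Rule 1: any citation responsible party with role "publisher".
  from_citation = citation_parties
    .select { |party| party[:role] == "publisher" }
    .map { |party| organization.call(party[:contactId]) }
    .compact
    .first
  return from_citation[:name] if from_citation

  # Rule 2: otherwise the first distributor contact, if there is one.
  from_distributor = organization.call(distributor_contact_ids.first)
  from_distributor && from_distributor[:name]
end

contacts = [
  { contactId: "c1", name: "Jane Doe", isOrganization: false },
  { contactId: "c2", name: "Example Agency", isOrganization: true }
]

# The publisher-role contact is a person, so the distributor is used instead.
puts publisher_name([{ role: "publisher", contactId: "c1" }], ["c2"], contacts)
```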

chris-macdermaid commented 1 year ago

> @chris-macdermaid do you know if the DCAT-US ingest cares if it sees plain JSON or if it supports the import of JSON-LD? If not, is there an advantage of using one form over the other? I'm thinking if we ever want to develop some other generic JSON-LD readers/writers. (I suppose I could dig into the standard, but thought you might already know the answer)

Data.gov does support plain JSON. If a record passes the Data.gov validator, Data.gov should be able to use it: https://catalog.data.gov/dcat-us/validator

dwalt commented 1 year ago

@chris-macdermaid It appears it supports JSON-LD as you can reference a JSON-LD schema: https://resources.data.gov/resources/dcat-us/#context

chris-macdermaid commented 1 year ago

@dwalt you're correct, DCAT-US supports DCAT using the JSON-LD format.

The current schemas are located here.

Data.gov supports both federal DCAT and non-federal DCAT (bureau and program codes not required). They also support DCAT at the catalog and dataset levels.
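A small Ruby sketch of how a writer might account for the federal vs non-federal difference. The field lists below are an assumption based on the DCAT-US v1.1 required dataset fields and should be checked against the current schemas; missing_fields and present? are hypothetical helpers, and this is no substitute for the Data.gov validator.

```ruby
# Assumed required dataset fields; verify against the published schemas.
REQUIRED_DATASET_FIELDS = %w[
  title description keyword modified publisher contactPoint identifier accessLevel
].freeze
# Assumed to be required only for federal records.
FEDERAL_ONLY_FIELDS = %w[bureauCode programCode].freeze

# A value counts as present when it is not nil and not empty.
def present?(value)
  value.respond_to?(:empty?) ? !value.empty? : !value.nil?
end

# Hypothetical helper: list the required fields a dataset hash lacks.
def missing_fields(dataset, federal: true)
  required = REQUIRED_DATASET_FIELDS + (federal ? FEDERAL_ONLY_FIELDS : [])
  required.reject { |field| present?(dataset[field]) }
end

dataset = {
  "title" => "Example Dataset",
  "description" => "An example record.",
  "keyword" => ["example", "dataset"],
  "modified" => "2024-01-01",
  "publisher" => { "name" => "Example Agency" },
  "contactPoint" => { "fn" => "Jane Doe", "hasEmail" => "mailto:jane@example.org" },
  "identifier" => "example-1",
  "accessLevel" => "public"
}

puts missing_fields(dataset, federal: false).inspect
puts missing_fields(dataset, federal: true).inspect
```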

Draft updated versions of the schemas: https://github.com/GSA/datagov-harvesting-logic/tree/main/data/dcatus/schemas