adiwg / mdTranslator

Metadata translation tool built using Ruby
https://www.adiwg.org/mdTranslator/
The Unlicense

DCAT-US writer: optional fields #269

Open hmaier-fws opened 1 year ago

hmaier-fws commented 1 year ago

Issue:

Add support for DCAT-US schema optional fields.

Related issues: #251, #264, #267, #268

For information on the DCAT-US schema see:

Optional fields

Proposed mapping of mdJSON to DCAT-US optional fields.

accrualPeriodicity

Description

The frequency with which the dataset is published. See #276

conformsTo

Description

URI used to identify a standardized specification the dataset conforms to. See #281

dataQuality

Description

U.S. Government specific. Whether the dataset meets the agency’s Information Quality Guidelines (true/false). See #278

describedBy

Description

URL to the data dictionary for the dataset. Note that documentation other than a data dictionary can be referenced using Related Documents (references). See #284

describedByType

Description

The machine-readable file format (IANA Media Type also known as MIME Type) of the dataset’s Data Dictionary (describedBy). See #280

isPartOf

Description

The collection of which the dataset is a subset.

issued

Description

Date of formal issuance.

language

Description

The language of the dataset. See #277

landingPage

Description

This field is not intended for an agency’s homepage (e.g. www.agency.gov), but rather if a dataset has a human-friendly hub or landing page that users can be directed to for all resources tied to the dataset.

primaryITInvestmentUII

Description

U.S. Government specific. For linking a dataset with an IT Unique Investment Identifier (UII). See #279

references

Description

Related documents such as technical information about a dataset, developer documentation, etc.

systemOfRecords

Description

U.S. Government specific. If the system is designated as a system of records under the Privacy Act of 1974, provide the URL to the System of Records Notice related to this dataset.

theme

Description

Main thematic category of the dataset.

Mapping

No (not required)

| Field Name | DCAT Name | Condition | mdJson Source |
| --- | --- | --- | --- |
| Release Date | dcat:issued | if resourceInfo.citation.date[any].dateType = "publication" or "distributed" | resourceInfo.citation.date[earliest] |
| Frequency | dcat:accrualPeriodicity | [The ISO codelist MD_maintenanceFrequency can be used; several of its codes intersect with the accrualPeriodicity codelist, so the two partially correspond. A column of ISO 8601 code equivalents could be added to MD_maintenanceFrequency to provide the expected coding, see https://resources.data.gov/schemas/dcat-us/v1.1/iso8601_guidance/#accrualperiodicity. Community valuation should be determined.] | |
| Language | dcat:language | [The language codelist could be used, but it needs to be bound with a country to match the RFC 5646 format (https://datatracker.ietf.org/doc/html/rfc5646), such as "en-US". Community valuation should be determined.] | |
| Data Quality | dcat:dataQuality | [This is a boolean indicating whether the data "conforms" to agency standards; its value seems negligible.] | |
| Category | dcat:theme | where resourceInfo.keyword[any].thesaurus.title = "ISO Topic Category" | [resourceInfo.keyword.keyword[0, n] flatten] |
| Related Documents | dcat:references | | associatedResource[all].resourceCitation.onlineResource[all].uri + additionalDocumentation[all].citation[all].onlineResource[all].uri [comma separated] |
| Homepage URL | dcat:landingPage | [Add code "landingPage" to CI_OnlineFunctionCode]; if resourceInfo.citation.onlineResource[any].function = "landingPage" | resourceInfo.citation.onlineResource.uri |
| Collection | dcat:isPartOf | for each associatedResource[0, n].initiativeType = "collection" and associatedResource.associationType = "collectiveTitle" | associatedResource.resourceCitation[0].uri |
| System of Records | dcat:systemOfRecords | [Add code "sorn" to DS_InitiativeTypeCode]; for each associatedResource[0, n].initiativeType = "sorn" | associatedResource.resourceCitation[0].uri |
| Primary IT Investment | dcat:primaryITInvestmentUII | [Links data to an IT investment identifier relative to Exhibit 53 docs. Community valuation should be determined.] | |
| Data Dictionary | dcat:describedBy | if dataDictionary.dictionaryIncludedWithResource IS NOT TRUE and citation[0].onlineResource[0].uri exists | dataDictionary.citation[0].onlineResource[0].uri |
| Data Dictionary Type | dcat:describedByType | [For simplicity, leave blank, implying an HTML page. A community decision is needed on whether to support other format types using a MIME type in the form "application/pdf".] | |
| Data Standard | dcat:conformsTo | [It is currently not possible to identify the schema standard the data conforms to, though this has been proposed. If it is built and the community decides to support it, it can then be mapped.] | |

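As an illustration of the Homepage URL row, here is a minimal Ruby sketch of the proposed rule. It is not mdTranslator's writer code: the method name `dcat_landing_page` and the symbol-keyed hash shape are assumptions made for the example, and the "landingPage" function code is the one the table proposes adding to CI_OnlineFunctionCode.

```ruby
# Hypothetical helper: returns the URI of the first citation online resource
# whose function code is "landingPage", or nil when there is none.
def dcat_landing_page(resource_info)
  online = resource_info.dig(:citation, :onlineResource) || []
  match = online.find { |o| o[:function] == 'landingPage' }
  match && match[:uri]
end

resource_info = {
  citation: {
    onlineResource: [
      { uri: 'https://example.org/download.zip', function: 'download' },
      { uri: 'https://example.org/dataset', function: 'landingPage' }
    ]
  }
}
puts dcat_landing_page(resource_info) # prints https://example.org/dataset
```

Returning nil when nothing matches lets a writer omit the optional field entirely rather than emit an empty string.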
dwalt commented 1 year ago

issued is not being written, even though the test data has a date with the "publication" date type.
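For reference, the Release Date rule from the mapping table can be sketched in Ruby as follows. This is only an illustration under stated assumptions, not the writer's actual code: the method name `dcat_issued`, the symbol-keyed hashes, and ISO 8601 date strings are hypothetical, and "earliest" is read here as the earliest of the matching dates, which is one possible interpretation of `date[earliest]`.

```ruby
require 'date'

# Date types that qualify a citation date as a release date (from the table).
ISSUED_DATE_TYPES = %w[publication distributed].freeze

# Hypothetical helper: returns the earliest "publication" or "distributed"
# date as an ISO 8601 string, or nil when no such date exists.
def dcat_issued(citation)
  matching = citation.fetch(:date, []).select do |d|
    ISSUED_DATE_TYPES.include?(d[:dateType])
  end
  return nil if matching.empty?

  matching.map { |d| Date.parse(d[:date]) }.min.iso8601
end

citation = {
  date: [
    { date: '2019-01-01', dateType: 'creation' },
    { date: '2021-03-01', dateType: 'distributed' },
    { date: '2020-06-15', dateType: 'publication' }
  ]
}
puts dcat_issued(citation) # prints 2020-06-15
```

A test record with a single "publication" date should therefore produce a non-empty issued value.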

dwalt commented 1 year ago

accrualPeriodicity is not being written. Test data has an MD_maintenanceFrequency code of "annual".
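The Frequency row in the mapping table suggests pairing maintenance frequency codes with ISO 8601 repeating intervals. A minimal Ruby sketch of that lookup follows. The constant and method names are hypothetical, the `resourceMaintenance[].frequency` path is an assumption about where the code lives in the record, and the table covers only a subset of codes; "annual" is included as an alias purely because the test data uses it.

```ruby
# Partial lookup from maintenance frequency codes to the ISO 8601 repeating
# intervals listed in the data.gov accrualPeriodicity guidance.
ISO8601_FREQUENCY = {
  'daily' => 'R/P1D',
  'weekly' => 'R/P1W',
  'fortnightly' => 'R/P2W',
  'monthly' => 'R/P1M',
  'quarterly' => 'R/P3M',
  'annually' => 'R/P1Y',
  'annual' => 'R/P1Y' # alias assumed from the test data mentioned above
}.freeze

# Hypothetical helper: returns the interval for the first maintenance
# frequency that has a known equivalent, or nil when none can be mapped.
def dcat_accrual_periodicity(resource_info)
  codes = resource_info.fetch(:resourceMaintenance, []).map { |m| m[:frequency] }
  codes.map { |code| ISO8601_FREQUENCY[code] }.compact.first
end

resource_info = { resourceMaintenance: [{ frequency: 'annual' }] }
puts dcat_accrual_periodicity(resource_info) # prints R/P1Y
```

Codes without an ISO 8601 equivalent, such as "asNeeded", yield nil here, so the optional field would simply be omitted.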

dwalt commented 1 year ago

References is not writing. Test data has an associated resource that should have been written.
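For comparison, the Related Documents rule from the mapping table can be sketched in Ruby as below. The method name `dcat_references` and the symbol-keyed hash shape are assumptions for the example rather than the writer's real internals; the paths and the comma-separated output follow the table.

```ruby
# Hypothetical helper: collects online resource URIs from associated
# resources and additional documentation, then joins them with commas.
def dcat_references(metadata)
  associated = metadata.fetch(:associatedResource, []).flat_map do |res|
    (res.dig(:resourceCitation, :onlineResource) || []).map { |o| o[:uri] }
  end
  additional = metadata.fetch(:additionalDocumentation, []).flat_map do |doc|
    doc.fetch(:citation, []).flat_map do |cit|
      cit.fetch(:onlineResource, []).map { |o| o[:uri] }
    end
  end
  uris = (associated + additional).compact.uniq
  uris.empty? ? nil : uris.join(',')
end

metadata = {
  associatedResource: [
    { resourceCitation: { onlineResource: [{ uri: 'https://example.org/parent' }] } }
  ],
  additionalDocumentation: [
    { citation: [{ onlineResource: [{ uri: 'https://example.org/manual.pdf' }] }] }
  ]
}
puts dcat_references(metadata)
# prints https://example.org/parent,https://example.org/manual.pdf
```

Under this reading, a record with a single associated resource that carries an online resource URI should still produce a references value.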

dwalt commented 1 year ago

Theme is not writing. Test data has at least one ISO Topic Category keyword.
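The Category rule from the mapping table (select keyword sets whose thesaurus title is "ISO Topic Category", then flatten their keywords) can be sketched in Ruby as follows. The method name `dcat_theme` and the symbol-keyed hash shape are assumptions made for illustration, not the writer's actual code.

```ruby
# Hypothetical helper: returns a flat array of keywords taken from every
# keyword set whose thesaurus title is exactly "ISO Topic Category".
def dcat_theme(resource_info)
  resource_info.fetch(:keyword, [])
               .select { |set| set.dig(:thesaurus, :title) == 'ISO Topic Category' }
               .flat_map { |set| set.fetch(:keyword, []).map { |k| k[:keyword] } }
               .compact
end

resource_info = {
  keyword: [
    { thesaurus: { title: 'ISO Topic Category' },
      keyword: [{ keyword: 'biota' }, { keyword: 'environment' }] },
    { thesaurus: { title: 'Local Keywords' },
      keyword: [{ keyword: 'caribou' }] }
  ]
}
p dcat_theme(resource_info) # prints ["biota", "environment"]
```

The exact-match comparison on the thesaurus title is a design choice of this sketch; a title with different casing or extra words would not match.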

dwalt commented 1 year ago

Described By did not write. Test data had a data dictionary "not contained within the record", and uri to external dictionary.
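As a point of comparison, the Data Dictionary rule from the mapping table can be sketched in Ruby as below. The method name `dcat_described_by` is hypothetical, and the hash follows the path written in the table (`citation[0].onlineResource[0].uri`), which may differ from how the record is actually structured inside mdTranslator.

```ruby
# Hypothetical helper: returns the external data dictionary URI unless the
# dictionary is flagged as included with the resource; nil otherwise.
def dcat_described_by(data_dictionary)
  return nil if data_dictionary[:dictionaryIncludedWithResource] == true

  data_dictionary.dig(:citation, 0, :onlineResource, 0, :uri)
end

data_dictionary = {
  dictionaryIncludedWithResource: false,
  citation: [
    { onlineResource: [{ uri: 'https://example.org/dictionary.html' }] }
  ]
}
puts dcat_described_by(data_dictionary) # prints https://example.org/dictionary.html
```

Because the check is "IS NOT TRUE", a record where the flag is missing altogether is treated the same as one where it is false.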