DOI-DO / dcat-us

Data Catalog Vocabulary (DCAT) - United States Profile Chief Data Officers Council & Federal Committee on Statistical Methodology

Theme field/topics #9

Closed hkdctol closed 8 months ago

hkdctol commented 1 year ago

Hyon Kim, Data.gov, GSA

Requirement(s)

If a catalog is to provide faceted views or a list of pertinent datasets on certain topics, the schema needs an agreed-to list of topics or themes.

Problem Statement

In the current DCAT US 1.1 (https://resources.data.gov/resources/dcat-us/#theme), the accepted value for theme is an array of strings. The spec states that it could refer to ISO topics, but there is no agreed-to list of topics for the theme field. We need some input on existing lists that could be used, and should consider adopting one.

Target Audience / Stakeholders

Agencies, public users

Additional context, comments, or links - Optional

Data.gov had several topic pages with manually curated lists of datasets created by topic leads, which was not sustainable. Most topic pages have been archived. User testing that Data.gov conducted in 2022 indicated that public users expect and benefit from pre-defined collections of datasets on certain topics.

philipashlock commented 1 year ago

There's a list of some Federal classification/taxonomy schemes, presented in spreadsheet form, over here: https://github.com/GSA/governmentwide-classifications

TDabolt commented 9 months ago

P1 - we need to discuss. I agree we need a controlled vocab for topics; the question becomes which one. I lean towards the LOC-related ones, as they will be maintained and are used across the US and across disciplines. See the issue submitted by @doi-jschlagel.

fellahst commented 9 months ago

To meet this requirement for providing faceted views or lists of datasets based on specific topics or themes, the following solution is recommended. This approach aligns with the evolving standards of DCAT and allows for a flexible yet structured classification of datasets.

Recommended Solution for theme Property:

Utilize Semantic Classifications:

The theme property should not be handled as a mere array of strings. Instead, it should refer to semantic classifications defined in a controlled vocabulary. This ensures that themes are not arbitrarily assigned but are based on a structured and recognized system of categorization.

Alignment with DCAT3 Standards:

In DCAT3, the theme property is a specialization of dcterms:subject. The themes that categorize resources are organized in a structure such as a skos:ConceptScheme, skos:Collection, owl:Ontology, or similar, which describes all the categories and their relations, as outlined in the DCAT3 Vocabulary.
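As a rough illustration of the difference between a free-text string and a reference into a concept scheme, here is a minimal Python sketch using JSON-LD-style dictionaries. The namespace https://example.org/themes/, the concept names, the dataset, and the helper theme_labels are all hypothetical and only serve the example; they are not taken from DCAT US or any published vocabulary.

```python
import json

# Hypothetical namespace for illustration; not a real published vocabulary.
EX = "https://example.org/themes/"

# A tiny controlled vocabulary: one scheme and two concepts.
concept_scheme = {
    "@id": EX + "scheme",
    "@type": "skos:ConceptScheme",
    "skos:prefLabel": "Example theme vocabulary",
}

concepts = [
    {
        "@id": EX + "water",
        "@type": "skos:Concept",
        "skos:prefLabel": "Water",
        "skos:inScheme": {"@id": EX + "scheme"},
    },
    {
        "@id": EX + "hydrology",
        "@type": "skos:Concept",
        "skos:prefLabel": "Hydrology",
        "skos:inScheme": {"@id": EX + "scheme"},
        # Relations between categories live in the vocabulary, not the dataset.
        "skos:broader": {"@id": EX + "water"},
    },
]

# The dataset's theme points at a concept IRI instead of carrying a bare string.
dataset = {
    "@id": "https://example.org/dataset/streamflow",
    "@type": "dcat:Dataset",
    "dcterms:title": "Streamflow measurements",
    "dcat:theme": [{"@id": EX + "hydrology"}],
}


def theme_labels(dataset, concepts):
    """Resolve each theme IRI on the dataset to its preferred label."""
    by_id = {c["@id"]: c for c in concepts}
    return [by_id[t["@id"]]["skos:prefLabel"] for t in dataset["dcat:theme"]]


print(json.dumps(dataset, indent=2))
print(theme_labels(dataset, concepts))
```

Because the label is looked up from the vocabulary, two publishers that reference the same IRI end up under the same theme even if they would have typed the string differently.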

Multiple Vocabularies for Classification:

It is not necessary to adhere to a single, universally applicable controlled vocabulary for themes. Instead, multiple vocabularies can be used for the classification of resources, allowing for greater flexibility and relevance across diverse datasets. The controlled vocabulary for theme and other classifier properties should not be limited to the Catalog of Datasets. Each Dataset within the Catalog can utilize one or more different controlled vocabularies for theme, enabling more specific and accurate classification.
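To make the link to faceted views concrete, here is a small Python sketch, under stated assumptions, of how a catalog might group datasets by theme IRI when datasets draw on more than one vocabulary. The two vocabularies (vocab-a, vocab-b), the dataset records, and the function build_facets are hypothetical and are not part of DCAT US.

```python
from collections import defaultdict

# Hypothetical dataset records; each theme is an IRI from one of two
# made-up vocabularies (vocab-a and vocab-b).
datasets = [
    {
        "title": "Streamflow measurements",
        "themes": ["https://example.org/vocab-a/hydrology"],
    },
    {
        "title": "Groundwater levels",
        "themes": [
            "https://example.org/vocab-a/hydrology",
            "https://example.org/vocab-b/water-resources",
        ],
    },
    {
        "title": "Census tracts",
        "themes": ["https://example.org/vocab-b/boundaries"],
    },
]


def build_facets(datasets):
    """Map each theme IRI to the titles of the datasets tagged with it."""
    facets = defaultdict(list)
    for ds in datasets:
        for theme_iri in ds["themes"]:
            facets[theme_iri].append(ds["title"])
    return dict(facets)


facets = build_facets(datasets)
for theme_iri, titles in facets.items():
    print(theme_iri, titles)
```

Since each facet key is a full IRI, concepts from different vocabularies stay distinct, and a dataset tagged from two vocabularies simply appears under both facets.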

More information can be found in the resource classification section of the DCAT US 3.0 usage guidelines.

TDabolt commented 9 months ago

May want to consider a short discussion of how this can be accomplished in the implementation section.