DOI-DO / dcat-us

Data Catalog Vocabulary (DCAT) - United States Profile Chief Data Officers Council & Federal Committee on Statistical Methodology

Change of Requirement Level to Mandatory for Multiple Elements #191

Closed: ShaferAC closed this issue 8 months ago

ShaferAC commented 9 months ago

Creator Name: Allison Shafer
Creator Affiliation: U.S. Census Bureau

Requirement(s)

Summarize the requirement(s) and related aspects that the DCAT-US profile should fulfill, optionally including priorities for each requirement. This can be written in a few sentences or with bullets.

Increase the requirement level of the following elements to Mandatory, at least for the Dataset application profile, to improve findability and give users the pertinent information they need: dcat:keyword, dct:issued, and dct:modified.

Problem Statement

Mandatory statement of the current situation, including:

dcat:keyword is currently listed as a Recommended element, and dct:issued and dct:modified are listed only as Optional. These elements are highly beneficial to users for findability and usability. Raising their requirement level would standardize the metadata and ensure that this information is provided for every Dataset by everyone who implements the profile.
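
If the change were adopted, a catalog harvester could flag Dataset records that omit any of the three elements. The following is a minimal sketch, assuming a Dataset is represented as a simple dict keyed by compact property names ("dcat:keyword", "dct:issued", "dct:modified"); that representation and the function name `missing_mandatory` are hypothetical and not part of the DCAT-US profile.

```python
from typing import Any

# Hypothetical list of the elements this issue proposes to make Mandatory
# for the Dataset class; keys assume a simplified dict representation.
PROPOSED_MANDATORY = ("dcat:keyword", "dct:issued", "dct:modified")


def missing_mandatory(dataset: dict[str, Any]) -> list[str]:
    """Return the proposed-mandatory properties that are absent or empty."""
    missing = []
    for prop in PROPOSED_MANDATORY:
        value = dataset.get(prop)
        # Treat a missing key, None, an empty string, or an empty list
        # as "not provided".
        if value is None or value == "" or value == []:
            missing.append(prop)
    return missing


record = {
    "dct:title": "Example survey",
    "dcat:keyword": ["income", "housing"],
    "dct:issued": "2023-05-01",
}
print(missing_mandatory(record))  # ['dct:modified']
```

A check like this only reports gaps; whether a gap blocks publication or just produces a warning would be a policy decision for the catalog.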

Target Audience / Stakeholders

Establish who the intended audience/stakeholders of the requirement are, and who will be impacted by this requirement. When describing the intended audience or stakeholder, please be as specific as possible (avoid using the term: user).

User 1. Providers of DataService catalogs
User 2. Consumers of DataServices/APIs
User 3. Web developers building apps on a DCAT-oriented backend
User 4. Data consumers and producers in the statistical domain

Intended Uses / Use Cases

The intended uses expected for the requirement, usually written as imperative sentences, each starting with a verb and describing an individual task needed to solve the stated problem.

Provide pertinent information about the vintage of the dataset.
Enhance and expand the values that can be used to locate data.

TDabolt commented 9 months ago

P1 - Concur w/ recommendation.

hkdctol commented 9 months ago

+1

fellahst commented 8 months ago

Using dct:modified is recommended to indicate any modifications to the dataset. However, this property can be left null if the dataset has never been modified. Adding dcat:keyword and dct:issued can significantly enhance the dataset's discoverability. At the same time, we should be judicious about mandatory properties, because each additional requirement increases the burden on implementers to populate these fields. Choosing carefully which properties to mandate balances comprehensive data representation against ease of implementation. I think this topic is worth discussing at our next meeting.
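
One way to reconcile a mandatory dct:modified with datasets that have never changed is to let the field be null and have consumers fall back to dct:issued when judging a dataset's vintage. The sketch below assumes exactly that convention (null means "unchanged since issue"); the function name `last_changed` is hypothetical.

```python
from datetime import date
from typing import Optional


def last_changed(issued: date, modified: Optional[date]) -> date:
    """Date a consumer can use to judge how current a Dataset is.

    Assumption: a null dct:modified means the dataset has not been
    modified since it was issued, so dct:issued is used instead.
    """
    if modified is None:
        return issued
    # A modification date before the issue date is almost certainly a
    # metadata error, so reject it rather than silently accepting it.
    if modified < issued:
        raise ValueError("dct:modified precedes dct:issued")
    return modified


print(last_changed(date(2023, 5, 1), None))              # 2023-05-01
print(last_changed(date(2023, 5, 1), date(2024, 1, 15)))  # 2024-01-15
```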

mrratcliffe commented 8 months ago

I don't think it's too much of a burden to add a few keywords. I support making it mandatory. The challenge for folks will be choosing the best keyword(s) to describe the dataset and enhance findability. Too general, and the dataset turns up in searches where the person searching wonders why it appeared. Too specific, and the dataset doesn't show up in search as often as it perhaps should. Anyone who has ever added keywords to the abstract for a conference presentation, or searched conference programs for presentations to attend, knows how challenging it can be.
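
The "too general versus too specific" trade-off can be made concrete by looking at how many datasets in a catalog share each keyword. The following is a minimal sketch of one hypothetical heuristic, not anything the profile defines: a keyword used by more than half the datasets is labelled "general", one used by a single dataset is labelled "specific", and the threshold and function name `keyword_specificity` are assumptions.

```python
from collections import Counter


def keyword_specificity(catalog: list[list[str]],
                        too_general: float = 0.5) -> dict[str, str]:
    """Label each keyword by the share of datasets that use it.

    Hypothetical heuristic: above `too_general` of all datasets -> "general",
    used by exactly one dataset -> "specific", otherwise -> "ok".
    """
    n = len(catalog)
    # Count each keyword at most once per dataset (document frequency),
    # ignoring case differences.
    df = Counter(kw for keywords in catalog for kw in {k.lower() for k in keywords})
    labels = {}
    for kw, count in df.items():
        if count / n > too_general:
            labels[kw] = "general"
        elif count == 1:
            labels[kw] = "specific"
        else:
            labels[kw] = "ok"
    return labels


catalog = [
    ["census", "income"],
    ["census", "housing"],
    ["census", "income", "tract-level rent burden"],
    ["transportation", "housing"],
]
print(keyword_specificity(catalog))
```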

fellahst commented 8 months ago

One possible approach is to let the registry populate keywords automatically (autotagging) by extracting them from the Dataset's title and description using Natural Language Processing (NLP). If implementers have keywords available, they should add them; if not, autotagging can be used to improve search.
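
The autotagging scenario can be illustrated with a deliberately simple stand-in for NLP: rank the non-stopword terms in the title and description by frequency, and use publisher-supplied keywords whenever they exist. This is a minimal sketch under those assumptions; the stopword list, the `autotag` name, and the default of five tags are hypothetical, and a production registry would more likely use a proper keyword-extraction library.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical, deliberately tiny stopword list; a real registry would use a
# fuller list or a proper NLP keyword-extraction library.
STOPWORDS = {
    "a", "an", "and", "are", "by", "data", "dataset", "for", "from",
    "in", "is", "of", "on", "the", "this", "to", "with",
}


def autotag(title: str, description: str,
            existing: Optional[list[str]] = None, limit: int = 5) -> list[str]:
    """Suggest keywords for a Dataset from its title and description."""
    # Publisher-supplied keywords take precedence over generated ones.
    if existing:
        return list(existing)
    text = f"{title} {description}".lower()
    # Runs of two or more letters; hyphens are allowed after the first letter.
    tokens = re.findall(r"[a-z][a-z-]+", text)
    # Drop stopwords and words shorter than three characters.
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    # Most frequent first; ties broken alphabetically so output is stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:limit]]


print(autotag(
    "Household Income by County",
    "Median household income estimates for each county, "
    "from the American Community Survey.",
))  # ['county', 'household', 'income', 'american', 'community']
```

Because tags are generated the same way for every record, this approach may help with the consistency point raised below, although simple frequency ranking can still surface unhelpful terms.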

mrratcliffe commented 8 months ago

I like the autotagging scenario. It mitigates the challenge and will probably produce more consistent results.

TDabolt commented 8 months ago

Flagged for an implementation discussion of the pros and cons of the recommended approach.