DOI-DO / dcat-us

Data Catalog Vocabulary (DCAT) - United States Profile Chief Data Officers Council & Federal Committee on Statistical Methodology

Keep the metadata content relatively high-level and base changes to the standard more on feedback from users than on feedback from agencies. #113

Closed sofianef closed 8 months ago

sofianef commented 1 year ago

Creator Name: Dave Rugg
Creator Contact Affiliation: Program Manager, Research Data Services; david.rugg@usda.gov
Creator Affiliation: U.S. Forest Service, Research & Development

Requirement(s)

My requirements for the DCAT-US profile are reasonably simple and straightforward:

  1. Keep the metadata content relatively high-level – and therefore suitable for a general-purpose catalog like Data.gov.
  2. Base changes to the standard more on feedback from users than on feedback from agencies.

Problem Statement

A surprisingly large number of the requirement recommendations received to date are primarily for the benefit of the agency, not the data user/seeker. These should be ignored. Including these requirements would make the metadata much more complex without adding substantive value for the user looking for datasets of interest.

Examples of unhelpful proposed requirements: ORCID iD, references to scientific articles, lifecycle state, lots of elements from the FGDC Content Standard for Digital Geospatial Metadata.

Should most of the proposed metadata be available from the producer? Of course! And the user who has discovered a potentially interesting dataset can further evaluate that dataset for relevance using the metadata at the producer’s site. A library’s catalog of books sensibly limits the amount of information about each book in the library. Once an interesting book is identified by browsing the catalog, the user can go inspect it and learn more about the book before checking it out. The catalog is not the be-all and end-all of choosing a dataset.

Target Audience / Stakeholders

This advice helps metadata experts in the producer agencies – a simple, straightforward DCAT standard is easier to map specialized metadata standards to than is a DCAT standard adorned with piles of baroque ornamentation understood by few Data.gov users.

This advice is also intended to help the bulk of data consumers find what they’re looking for without wading through pages of metadata content primarily relevant to the data producer and other highly specialized communities.
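As a rough illustration of the mapping point above, the sketch below projects a detailed producer record down to a handful of discovery fields and keeps a landing-page link for anyone who wants the full metadata. Everything in it is an assumption for illustration: the sample record, the placeholder DOI, the choice of fields, and the helper name `to_discovery_record` are hypothetical, and the field names only loosely follow common catalog conventions rather than any agreed DCAT-US field list.

```python
# Hypothetical producer-side record; all values are made-up placeholders.
rich_record = {
    "title": "Example forest inventory dataset",
    "description": "Plot-level tree measurements (illustrative only).",
    "keyword": ["forestry", "inventory"],
    "modified": "2020-01-01",
    "publisher": "Example Research Station",
    "identifier": "https://doi.org/10.0000/example",   # placeholder DOI
    "landingPage": "https://doi.org/10.0000/example",  # links to full metadata
    # Detail that stays at the producer's site in this sketch:
    "orcid": ["0000-0000-0000-0000"],
    "relatedArticles": ["https://doi.org/10.0000/example-article"],
    "lifecycleState": "published",
}

# Assumed set of high-level discovery fields for a general-purpose catalog.
DISCOVERY_FIELDS = (
    "title",
    "description",
    "keyword",
    "modified",
    "publisher",
    "identifier",
    "landingPage",
)


def to_discovery_record(rich: dict) -> dict:
    """Return only the discovery fields that are present in the rich record."""
    return {key: rich[key] for key in DISCOVERY_FIELDS if key in rich}


catalog_entry = to_discovery_record(rich_record)
```

The catalog entry carries enough to find the dataset, and the `landingPage` link leads to everything else at the producer's site.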

Intended Uses / Use Cases

As an example of feedback from users, consider bureau and program codes. I often use them on the relatively infrequent occasions I do Data.gov searches. I am expecting that, based on your web analytics, you know whether these codes are used often enough in faceted searches to warrant retaining them. If they are useful to users, and OMB doesn’t want to do the necessary maintenance, perhaps there is something in the Treasury codes that could be used instead.

Additional context, comments, or links - Optional

As context for my comments, I will share that I manage a scientific data repository for the Forest Service. We have carefully reviewed rich metadata compliant with the FGDC’s Content Standard for Digital Geospatial Metadata – Biological Data Profile standard. All the optional information in that standard is always included in our metadata. We have ORCID iD when available, links to the associated scientific articles (and related datasets); we have geospatial and non-geospatial data; we have time series and non-time series data. (Nothing particularly special about time series, contrary to the one posted suggestion.) We also have a catalog search comparable to Data.gov. Each research data publication has a landing page with basic discovery metadata, and that page links to the full metadata and a descriptive listing of all the files in the data pub. Our customers review the full metadata and the publication manifest more frequently than they download the pub. Despite having this rich metadata already created, and despite knowing that our customers do their due diligence prior to choosing to acquire a copy of the pub (or query one of our online databases), I do not believe it would be helpful to the Data.gov user to provide all these metadata in the Data.gov catalog. If one of our data pubs looks interesting based on a Data.gov search, it is not difficult to use the permanent link (a digital object identifier, DOI) to get to the in-depth metadata and pub manifest.

As with Data.gov, not all datasets described in our catalog reside in our repository. For those that scientists chose to publish elsewhere, we have a standard publication page (not unlike the standard Data.gov dataset page). If the dataset looks interesting, a customer can click on the link to the repository where the dataset resides. It’s not hard from the user’s perspective, and we’ve never received a complaint about this structure in over 10 years of operation.

To buttress the case for not over-doing the generic DCAT-US metadata for the special case of scientific data, I will note that while we have a fair number of referrals from Data.gov, we have twice as many from science.gov, 10X more from USDA’s Ag Data Commons (the USDA-wide scientific data catalog) and around 30X more from Google Scholar (U.S. and multiple international versions).

Original Email Submission: DCAT-US-3-Requirements.generality.docx

fellahst commented 8 months ago

Thank you for your valuable insights and suggestions regarding the DCAT-US profile. Your emphasis on keeping the metadata content high-level and user-centric aligns well with the overarching goal of making Data.gov a general-purpose, user-friendly catalog.

Your feedback underscores a key principle in data cataloging – the balance between comprehensiveness and usability. We acknowledge your concern that certain detailed metadata elements, while beneficial for agency-specific purposes, may not add significant value to the general user seeking datasets on Data.gov.

We agree that the primary objective of Data.gov should be to facilitate easy discovery and access to datasets for a broad audience. The comparison to a library's catalog, which provides just enough information for users to identify potentially interesting books, is particularly apt. Data.gov should similarly enable users to quickly identify datasets of interest, with the option to delve deeper into detailed metadata at the producer’s site if needed.

Regarding your specific mention of bureau and program codes, we will take your suggestion into consideration.

Your experience managing a scientific data repository for the Forest Service and your insights into the user interactions with metadata are particularly informative. It's clear that while rich, detailed metadata is invaluable for specific communities and purposes, a streamlined, general-purpose catalog like Data.gov benefits from a more curated approach to metadata content.

We appreciate the perspective you have provided, emphasizing the importance of a user-focused approach in enhancing the DCAT-US metadata standard. Your input will be a valuable part of our ongoing efforts to refine and improve the DCAT-US profile, ensuring it remains effective and user-friendly.

TDabolt commented 8 months ago

Agree with the draft. For future reference, please refrain from making any comments that could be interpreted as describing how future governance or maintenance will be accomplished. For example, instead of "Regarding your specific mention of bureau and program codes, their utility and usage frequency will be assessed. If these elements are indeed frequently used in searches and aid in data discovery, we will explore ways to maintain their relevance and accuracy, possibly considering alternatives like Treasury codes, as suggested.", simply state: "we will take that suggestion under consideration."