DOI-DO / dcat-us

Data Catalog Vocabulary (DCAT) - United States Profile Chief Data Officers Council & Federal Committee on Statistical Methodology

Explanation of how to document multiple underlying formats or media types that are packaged within a single file. #170

Closed sofianef closed 8 months ago

sofianef commented 10 months ago

Name: Carrie Comstock

Affiliation: USEPA Office of Research and Development (ORD), Cummings.Carrie@EPA.gov

Type of issue: Schema, Editorial

Section # / Table #: 5.18.2 Distribution's format property

Issue:

Original email submission: USEPA-ScienceHub-FAIRness.Project.-.DCAT-US-3-AP.-.Comment.Review.Matrix.xlsx

fellahst commented 9 months ago

To address the issue raised by Carrie regarding the documentation of multiple formats or media types in single-file distributions (like ZIP or TAR files) within the DCAT specification, we can consider four specific approaches:

Approach 1: Extend the Distribution's Format Property

Implementation Steps:

  1. Define dcat:componentFormats Property: Propose an extension to the dcat:Distribution class by introducing a new property named dcat:componentFormats. This property would hold a list or an array of media types or formats contained within the package.

  2. Schema Update: Modify the DCAT schema to include this new property, defining its type, range, and usage guidelines.

  3. Documentation and Examples: Update the DCAT specification with clear documentation on how to use dcat:componentFormats, providing examples that illustrate the property in action.
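
As a rough illustration of Approach 1, the sketch below inventories the media types inside a ZIP archive and attaches them to a distribution record. Note that `dcat:componentFormats` is only the property proposed above and does not exist in DCAT today. The helper names `component_formats` and `build_distribution`, the example URL, and the choice to guess media types from file extensions are all assumptions made for illustration.

```python
import io
import json
import mimetypes
import zipfile

# A private MimeTypes instance, so that global registrations made elsewhere
# in the process do not affect the guesses.
_GUESSER = mimetypes.MimeTypes()


def component_formats(archive: zipfile.ZipFile) -> list[str]:
    """Return the sorted, de-duplicated media types guessed from member names."""
    found = set()
    for info in archive.infolist():
        if info.is_dir():
            continue
        guessed, _encoding = _GUESSER.guess_type(info.filename)
        # Fall back to a generic type when the extension is unknown.
        found.add(guessed or "application/octet-stream")
    return sorted(found)


def build_distribution(download_url: str, archive: zipfile.ZipFile) -> dict:
    """Build a JSON-LD-style distribution record (hypothetical property included)."""
    return {
        "@type": "dcat:Distribution",
        "dcat:downloadURL": download_url,
        "dcat:mediaType": "application/zip",
        # Hypothetical property from Approach 1; not part of DCAT.
        "dcat:componentFormats": component_formats(archive),
    }


# Build a small in-memory ZIP so the example is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("data/results.csv", "a,b\n1,2\n")
    zf.writestr("docs/readme.txt", "notes")
    zf.writestr("docs/methods.txt", "more notes")

with zipfile.ZipFile(buffer) as zf:
    distribution = build_distribution("https://example.gov/data/package.zip", zf)

print(json.dumps(distribution, indent=2))
```

Extension-based guessing is the weakest part of this sketch; a real catalog would more likely rely on publisher-supplied values or content sniffing.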

Approach 2: Define the BagIt Profile in DCAT

Implementation Steps:

  1. Introduce BagIt Profile Reference: Develop a standard way to include a reference to a BagIt Profile in DCAT metadata. This could be a new property, e.g., dcat:bagItProfile, which would link to the BagIt Profile.

  2. Profile Specification and Integration: Ensure that the BagIt Profile used is comprehensive enough to specify the behavior of BagIt implementations for the relevant data sets.

  3. Update Specification with BagIt Integration: Include guidelines and examples in the DCAT spec that demonstrate how to integrate and reference a BagIt Profile within the distribution metadata.
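
A minimal sketch of what Approach 2 might look like in distribution metadata follows. The property `dcat:bagItProfile` is the hypothetical one suggested above, and the function name `add_bagit_profile`, the profile URL, and the download URL are placeholders invented for this example.

```python
import json
from urllib.parse import urlparse


def add_bagit_profile(distribution: dict, profile_url: str) -> dict:
    """Return a copy of the distribution that links to a BagIt Profile."""
    parts = urlparse(profile_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"profile reference is not an absolute HTTP(S) URL: {profile_url!r}")
    enriched = dict(distribution)  # shallow copy; the input is left unchanged
    # Hypothetical property from Approach 2; not part of DCAT.
    enriched["dcat:bagItProfile"] = {"@id": profile_url}
    return enriched


base = {
    "@type": "dcat:Distribution",
    "dcat:downloadURL": "https://example.gov/data/package.zip",
    "dcat:mediaType": "application/zip",
}

# Placeholder profile location, assumed for illustration.
described = add_bagit_profile(base, "https://example.gov/profiles/research-bag-v1.json")

print(json.dumps(described, indent=2))
```

Linking by URL keeps the DCAT record small and leaves the detailed packaging rules in the profile document itself.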

Approach 3: Use of BagIt Metadata for Format Specification

Implementation Steps:

  1. Guidelines for BagIt Metadata Usage: Provide detailed guidelines on how to use BagIt's bag-info.txt for specifying different formats or media types within a bag.

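  2. Custom Tag Incorporation: Encourage the use of custom tags in bag-info.txt to document the various formats contained within a package. Reserved tags such as Payload-Oxum are not suited to this, since Payload-Oxum records only the payload's total octet count and file count.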

  3. Incorporate BagIt Metadata in DCAT: Offer methods to reference or include this BagIt metadata within the DCAT distribution descriptions, ensuring that the presence of multiple formats within a single file is clearly documented.
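
The steps in Approach 3 could be sketched roughly as below: read `Label: value` pairs from bag-info.txt, collect a custom tag that lists formats, and copy those values into the distribution description. The tag name `Component-Format` is invented for this example and is not defined by BagIt. The sample tag values are made up, and the output reuses the hypothetical `dcat:componentFormats` property from Approach 1.

```python
import json


def parse_bag_info(text: str) -> list[tuple[str, str]]:
    """Parse bag-info.txt content into ordered (label, value) pairs.

    Lines that begin with whitespace are treated as continuations of the
    previous value. Labels may repeat, so a list is returned, not a dict.
    """
    tags: list[tuple[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and tags:
            label, value = tags[-1]
            tags[-1] = (label, value + " " + line.strip())
            continue
        label, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Label: value"
        tags.append((label.strip(), value.strip()))
    return tags


# Illustrative bag-info.txt content; all values are made up.
sample = "\n".join([
    "Bagging-Date: 2024-01-15",
    "Payload-Oxum: 1024.3",
    "Component-Format: text/csv",
    "Component-Format: application/pdf",
    "External-Description: Water quality samples",
    "  with accompanying documentation",
])

tags = parse_bag_info(sample)

# Compare labels case-insensitively, as a lenient choice for this sketch.
formats = [value for label, value in tags if label.lower() == "component-format"]

distribution = {
    "@type": "dcat:Distribution",
    "dcat:mediaType": "application/zip",
    # Hypothetical property from Approach 1; not part of DCAT.
    "dcat:componentFormats": formats,
}

print(json.dumps(distribution, indent=2))
```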

Approach 4: Out of Scope or Deferred

The reasons for deferring advanced content negotiation, or treating it as out of scope for the current DCAT specification, are as follows:

  1. Technical Complexity: Implementing advanced content negotiation mechanisms is a technically complex endeavor. It may surpass the current scope of the DCAT specification, which aims to maintain a balance between functionality and simplicity for broad applicability.

  2. Need for Wider Adoption and Standards: This approach hinges on the widespread adoption of specific technical standards and protocols. Achieving such uniformity across diverse platforms and tools may not be feasible in the short term, making this approach less practical for immediate inclusion in the DCAT specification.

  3. Resource Intensiveness: The development and ongoing maintenance of advanced content negotiation systems require substantial resources. This level of investment might be beyond the scope of what stakeholders can reasonably be expected to commit to, especially considering the varied scale and capabilities of organizations that utilize the DCAT specification.

  4. Awaiting Community Consensus: Proposals with significant implications for implementation and maintenance, like advanced content negotiation, require a broad consensus within the community. Such consensus ensures that the changes are viable, beneficial, and sustainable for the majority of stakeholders. This process of reaching consensus may take time, suggesting a deferral of this approach until there is clearer agreement and readiness within the community.

My personal take would be Approach 4.

TDabolt commented 9 months ago

I concur with @fellahst - while I appreciate the issue, we should flag it as more of a matter for NARA and for further discussion amongst other federal stakeholders.