Closed by sofianef 8 months ago
To address Carrie's issue about documenting multiple formats or media types within single-file distributions (such as ZIP or TAR files) in the DCAT specification, we can consider four approaches:
Implementation Steps:
Define dcat:componentFormats Property: Propose an extension to the dcat:Distribution class by introducing a new property named dcat:componentFormats. This property would hold a list or an array of media types or formats contained within the package.
Schema Update: Modify the DCAT schema to include this new property, defining its type, range, and usage guidelines.
Documentation and Examples: Update the DCAT specification with clear documentation on how to use dcat:componentFormats, providing examples that illustrate the property in action.
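As a rough illustration of the first approach, the sketch below builds a JSON-LD description of a ZIP distribution in Python. It assumes a JSON-LD encoding, IANA media type URIs, and an example.org download URL; the helper name make_distribution is hypothetical. dcat:packageFormat is an existing DCAT property, while dcat:componentFormats is only the property proposed in this thread and is not part of any published DCAT release.

```python
import json


def make_distribution(download_url, package_format, component_formats):
    """Build a JSON-LD description of a packaged distribution (illustrative only)."""
    return {
        "@context": {"dcat": "http://www.w3.org/ns/dcat#"},
        "@type": "dcat:Distribution",
        "dcat:downloadURL": {"@id": download_url},
        # Existing DCAT property: the format of the container itself.
        "dcat:packageFormat": {"@id": package_format},
        # PROPOSED property from this thread, not part of published DCAT.
        # Duplicates are dropped and the list is sorted for stable output.
        "dcat:componentFormats": [
            {"@id": fmt} for fmt in sorted(set(component_formats))
        ],
    }


IANA = "https://www.iana.org/assignments/media-types/"

distribution = make_distribution(
    "https://example.org/data/bundle.zip",  # placeholder URL
    IANA + "application/zip",
    [IANA + "text/csv", IANA + "application/pdf", IANA + "text/csv"],
)

print(json.dumps(distribution, indent=2))
```

Whether the range should be a plain list of media type URIs, as assumed here, or something richer would be settled in the schema update step.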
Implementation Steps:
Introduce BagIt Profile Reference: Develop a standard way to include a reference to a BagIt Profile in DCAT metadata. This could be a new property, e.g., dcat:bagItProfile, which would link to the BagIt Profile.
Profile Specification and Integration: Ensure that the BagIt Profile used is comprehensive enough to specify the behavior of BagIt implementations for the relevant data sets.
Update Specification with BagIt Integration: Include guidelines and examples in the DCAT spec that demonstrate how to integrate and reference a BagIt Profile within the distribution metadata.
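A minimal sketch of what such a reference could look like, again as a JSON-LD-style Python dictionary. The property dcat:bagItProfile is the hypothetical one named above, the helper add_bagit_profile is invented for illustration, and both URLs are example.org placeholders.

```python
def add_bagit_profile(distribution, profile_url):
    """Return a copy of the distribution that links to a BagIt Profile."""
    if not profile_url.startswith(("http://", "https://")):
        raise ValueError("profile_url should be an HTTP(S) URL")
    updated = dict(distribution)  # shallow copy; the input is left unchanged
    # Hypothetical property proposed in this thread, not part of published DCAT.
    updated["dcat:bagItProfile"] = {"@id": profile_url}
    return updated


base = {
    "@type": "dcat:Distribution",
    "dcat:downloadURL": {"@id": "https://example.org/data/bag.tar"},
}

linked = add_bagit_profile(
    base, "https://example.org/profiles/research-bag.json"
)
```

The profile itself stays outside the DCAT record; the distribution only points at it, so consumers would still need to fetch the profile to learn which formats it permits.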
Implementation Steps:
Guidelines for BagIt Metadata Usage: Provide detailed guidelines on how to use BagIt's bag-info.txt for specifying different formats or media types within a bag.
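Custom Tag Incorporation: Encourage the use of custom tags in bag-info.txt to document the various formats contained within a package. Note that Payload-Oxum is a reserved BagIt element that records only the payload's total octet count and file count, so it cannot carry format information; a separate custom tag would be needed.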
Incorporate BagIt Metadata in DCAT: Offer methods to reference or include this BagIt metadata within the DCAT distribution descriptions, ensuring that the presence of multiple formats within a single file is clearly documented.
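To make the third approach concrete, here is a small Python sketch that reads bag-info.txt content and collects the values of a custom tag. The tag name Component-Format is an assumption made up for this example, as is the sample text; BagIt itself only defines the "Label: value" layout, repeated labels, and indented continuation lines that the parser relies on.

```python
from collections import defaultdict


def parse_bag_info(text):
    """Parse bag-info.txt content into {label: [values]}."""
    tags = defaultdict(list)
    label = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0] in " \t" and label is not None:
            # Indented line: continuation of the previous label's value.
            tags[label][-1] += " " + line.strip()
            continue
        label, _, value = line.partition(":")
        label = label.strip()
        tags[label].append(value.strip())
    return dict(tags)


# Sample content; Component-Format is a hypothetical custom tag.
SAMPLE = """\
Bagging-Date: 2024-01-15
Payload-Oxum: 279164409832.1198
Component-Format: text/csv
Component-Format: application/pdf
External-Description: Water quality samples
  with accompanying reports
"""

info = parse_bag_info(SAMPLE)
component_formats = info.get("Component-Format", [])
print(component_formats)
```

The resulting list could then be copied into the DCAT distribution description by whatever mechanism the specification settles on.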
The reasons for deferring or considering the approach of advanced content negotiation as out of scope for the current DCAT specification are as follows:
Technical Complexity: Implementing advanced content negotiation mechanisms is a technically complex endeavor. It may surpass the current scope of the DCAT specification, which aims to maintain a balance between functionality and simplicity for broad applicability.
Need for Wider Adoption and Standards: This approach hinges on the widespread adoption of specific technical standards and protocols. Achieving such uniformity across diverse platforms and tools may not be feasible in the short term, making this approach less practical for immediate inclusion in the DCAT specification.
Resource Intensiveness: The development and ongoing maintenance of advanced content negotiation systems require substantial resources. This level of investment might be beyond the scope of what stakeholders can reasonably be expected to commit to, especially considering the varied scale and capabilities of organizations that utilize the DCAT specification.
Awaiting Community Consensus: Proposals with significant implications for implementation and maintenance, like advanced content negotiation, require a broad consensus within the community. Such consensus ensures that the changes are viable, beneficial, and sustainable for the majority of stakeholders. This process of reaching consensus may take time, suggesting a deferral of this approach until there is clearer agreement and readiness within the community.
My personal take would be Approach 4.
I concur with @fellahst. While I appreciate the issue, we should flag it for NARA and for further discussion among other federal stakeholders.
Name: Carrie Comstock
Affiliation: USEPA Office of Research and Development (ORD), Cummings.Carrie@EPA.gov
Type of issue: Schema, Editorial
Section # / Table #: 5.18.2 Distribution's format property
Issue:
Substantive Comment: There doesn't seem to be a way to indicate multiple underlying formats or media types in cases where files of different formats or media types are packaged in a single file (such as a ZIP or TAR file).
Rationale: Some distributions will be packages that contain files in different formats or media types, and it would be useful to be able to document all of them.
Proposed Disposition: Explanation of how to document multiple underlying formats or media types that are packaged within a single file.
Original email submission: USEPA-ScienceHub-FAIRness.Project.-.DCAT-US-3-AP.-.Comment.Review.Matrix.xlsx