ResearchObject / ro-crate

Research Object Crate
https://w3id.org/ro/crate/
Apache License 2.0

Use Case: In-situ/on-the-fly manifests alongside 'payload' data #14

Open eocarragain opened 5 years ago

eocarragain commented 5 years ago

As a developer/researcher/data-steward, I want to be able to capture the manifest in the root of my git-repo/data-folder as I work on my code/data, so that my existing processes & folders aren't affected while I iteratively enhance the metadata and payload.

See, for example, CodeMeta, Frictionless Data, and the way in which DataCrate distinguishes Working DataCrates from Bagged DataCrates.

This contrasts with the approach where some sort of wrapper folder/container is put around the 'payload', e.g. BagIt.

ptsefton commented 5 years ago

Agreed, but we might also want to make it possible to put the manifest OUTSIDE the data tree altogether in some situations. The simplest approach is to go one level up and, for example, put the data in a /data directory. But you might want to have manifests in sibling directories or elsewhere on the file system; e.g. a tool might store manifests in an application directory until the data is ready to be packaged.
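
To make the two placements discussed so far concrete, here is a minimal Python sketch, not a proposal for the actual format. The file name `manifest.json`, the keys `dataRoot` and `files`, and the helpers `build_manifest` and `write_manifest` are all hypothetical and chosen only for illustration. The idea is that the manifest records a relative pointer back to the payload, so the same code can write it in the root of the data folder or somewhere outside the data tree, without moving or wrapping the payload.

```python
import json
import os
from pathlib import Path


def build_manifest(data_root: Path, manifest_path: Path) -> dict:
    """Describe the payload under data_root without touching it.

    Hypothetical layout: "dataRoot" is the payload directory relative to
    the manifest's own directory, "files" are paths relative to data_root.
    """
    data_root = data_root.resolve()
    manifest_path = manifest_path.resolve()
    files = sorted(
        p.relative_to(data_root).as_posix()
        for p in data_root.rglob("*")
        # Skip the manifest itself when it lives inside the data tree.
        if p.is_file() and p.resolve() != manifest_path
    )
    return {
        "dataRoot": os.path.relpath(data_root, manifest_path.parent),
        "files": files,
    }


def write_manifest(data_root: Path, manifest_path: Path) -> dict:
    """Write (or refresh) the manifest; the payload stays where it is."""
    manifest = build_manifest(data_root, manifest_path)
    manifest_path.parent.mkdir(parents=True, exist_ok=True)
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return manifest
```

Calling `write_manifest(Path("data"), Path("data/manifest.json"))` would give the in-situ case, with `dataRoot` equal to `.`, while `write_manifest(Path("data"), Path("manifests/manifest.json"))` would give the sibling-directory case. Re-running either call as the data changes only rewrites the manifest, which is the iterative workflow the use case asks for.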

eocarragain commented 5 years ago

Just to reinforce this use case, here is the first Guiding Principle of the Psych-DS project:

First, Psych-DS is a technical specification designed for datasets that may be generated and handled by a single researcher. Tools that work with this specification may provide significant added functionality, but should not lose sight of the fact that Psych-DS’s audience is the individual researcher. For very large projects (in psychology and other disciplines), it’s often necessary and appropriate to consult with libraries and archiving organizations to prepare a dataset for archiving after the fact, but as researchers increasingly share their own datasets, they need to be able to manage the dataset preparation on their own, given appropriate software tools. One very helpful way to achieve this aim is to plan for sharing/archiving from the beginning of a project; Psych-DS is designed to be a format that can be used at any point in the research workflow.