UtrechtUniversity / yoda

A system for reliable, long-term storage and archiving of large amounts of research data during all stages of a study.
https://utrechtuniversity.github.io/yoda/
GNU General Public License v3.0

[FEATURE] Metadata of data objects also copied when data package is secured in vault #19

Open acnewton opened 3 years ago

acnewton commented 3 years ago

Issue

When a data package is submitted to the vault, the metadata of the collection is copied over, but the metadata of the individual data objects within the data package is not copied along with the data objects.

Solution

When invoking the copy of a data object, also copy the metadata.
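
To make the proposal concrete, here is a minimal sketch of the intended behaviour. It uses plain in-memory Python structures rather than the real iRODS or Yoda API, so `AVU`, `DataObject`, `copy_to_vault` and the `copy_metadata` flag are all hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import NamedTuple


class AVU(NamedTuple):
    """An iRODS-style attribute/value/unit triple."""
    attribute: str
    value: str
    unit: str = ""


@dataclass
class DataObject:
    """Hypothetical in-memory stand-in for an iRODS data object."""
    path: str
    content: bytes
    avus: list[AVU] = field(default_factory=list)


def copy_to_vault(source: DataObject, vault_path: str,
                  copy_metadata: bool = True) -> DataObject:
    """Copy a data object to vault_path and, optionally, its AVUs as well."""
    target = DataObject(path=vault_path, content=source.content)
    if copy_metadata:
        # Take a new list so later edits to the source do not affect the vault copy.
        target.avus = list(source.avus)
    return target
```

With `copy_metadata=False` the sketch reproduces the current behaviour described above, where only the content of the data object reaches the vault.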

Question

Are there downsides to copying the metadata of individual data objects?

lwesterhof commented 3 years ago

To gain a better understanding of your request we were wondering what kind of metadata you store on the data objects?

We try to follow the OAIS model (ISO 14721:2012). For example, we store the relevant collection metadata in the JSON metadata file to ensure no relevant metadata is lost when someone downloads the data package for use outside of Yoda.

acnewton commented 3 years ago

I don't necessarily have specific metadata for data objects or collections in mind. I would simply expect that any metadata a user has added to a data object or collection, outside of the currently installed metadata form, is still present when the only copy of the data object is in the Vault. This metadata is not captured in the metadata file yoda-metadata.xml, which is what is currently stored.

But to give an example of a metadata model, I could imagine that a user wants to make use of RO-Crate metadata (https://w3id.org/ro/crate), which would be stored in a ro-crate-metadata.json file on the same level as yoda-metadata.xml. In this RO-Crate metadata file you can define metadata per data object or subdirectory, which could also be added as iRODS AVUs (not necessarily automatically, as is done for yoda-metadata.xml). Upon securing in the Vault, the file ro-crate-metadata.json would be copied over, but the iRODS metadata would be 'lost' in the vault. The data objects would then not be findable by querying for RO-Crate metadata.
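
As an illustration of how per-object RO-Crate metadata could end up as AVUs, here is a rough sketch. It only assumes the standard RO-Crate layout (a JSON-LD document with an `@graph` list of entities typed `File` or `Dataset`); the function name `rocrate_to_avus` and the `rocrate:` attribute prefix are made up for this example and are not part of Yoda or iRODS.

```python
from __future__ import annotations

import json


def rocrate_to_avus(crate_json: str, prefix: str = "rocrate") -> dict[str, list[tuple[str, str]]]:
    """Map each File/Dataset entity id to (attribute, value) pairs.

    Only scalar properties are kept; nested objects and lists
    (for example hasPart) are skipped in this sketch.
    """
    graph = json.loads(crate_json).get("@graph", [])
    result: dict[str, list[tuple[str, str]]] = {}
    for entity in graph:
        types = entity.get("@type", [])
        if isinstance(types, str):
            types = [types]
        # Skip the metadata descriptor and contextual entities (people, licenses, ...).
        if "File" not in types and "Dataset" not in types:
            continue
        pairs = []
        for key, value in entity.items():
            if key.startswith("@"):
                continue  # JSON-LD keywords such as @id and @type
            if isinstance(value, (str, int, float)):
                pairs.append((f"{prefix}:{key}", str(value)))
        result[entity["@id"]] = pairs
    return result
```

The returned pairs would still have to be attached to the matching data objects or collections, and that attachment is the part that is lost today when the package is secured.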

tsmeele commented 3 years ago

If RO-Crate metadata needs to be findable, it indeed needs to be part of the archive package envelope, which right now consists of the yoda-metadata.json/xml file(s) and the license.txt file.
Currently, as of Yoda 1.6, the metadata is linked to a governing (JSON) Schema. The schema ensures that metadata can be validated and is meaningful and machine-actionable. We are investigating options to allow for more flexible metadata based on JSON-LD, which might also cover the use case you mention. Standards are emerging to accommodate this, e.g. SHACL.

lwesterhof commented 5 months ago

https://github.com/UtrechtUniversity/yoda/discussions/347#discussioncomment-8379791:

My use case is simple. I have custom iRODS rules that, upon file upload, automatically extract the file metadata (if any) and store it in iRODS. Think of image files with metadata in Exif format, or a simple Jupyter notebook with arbitrary metadata as specified in the Jupyter Notebook Format, etc. Researchers need to access and search such metadata, so it would be great if it were preserved. A possible solution would be to include it both in JSON files and in the iCAT, but I have no special requests on this as long as the metadata is preserved in iRODS. Thanks.
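
For readers unfamiliar with this kind of extraction, a minimal sketch of the Jupyter case follows. It relies only on the notebook file being JSON with a top-level `metadata` object, as the Jupyter Notebook Format specifies; the function names and the `jupyter:` attribute prefix are assumptions for illustration, not the actual rules referred to above.

```python
import json


def flatten(value, key=""):
    """Yield (dotted_key, string_value) pairs for every scalar in nested JSON."""
    if isinstance(value, dict):
        for name, child in value.items():
            yield from flatten(child, f"{key}.{name}" if key else name)
    elif isinstance(value, list):
        # List items share the key of the list, giving repeated attributes.
        for child in value:
            yield from flatten(child, key)
    elif value is not None:
        yield key, str(value)


def notebook_avus(notebook_json, prefix="jupyter"):
    """Turn the notebook-level metadata into (attribute, value) pairs."""
    metadata = json.loads(notebook_json).get("metadata", {})
    return [(f"{prefix}:{key}", value) for key, value in flatten(metadata)]
```

Pairs produced this way exist only in the iCAT unless they are also written to a file in the data package, which is why they currently do not survive securing in the vault.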