ResearchObject / ro-crate

Research Object Crate
https://w3id.org/ro/crate/
Apache License 2.0

Use Case: Archive MIAME compliant RNA-sequencing data #19

Open frederikcoppens opened 5 years ago

frederikcoppens commented 5 years ago

As a data steward responsible for managing and preserving the data generated in my lab, I want to ensure that raw data generated by a sequencing experiment (gzipped FastQ files, typically a few GB per sample) is stored securely in my university RDM system, with all the relevant metadata, so that I can submit the data to ArrayExpress (the designated ELIXIR repository) with the click of a button.

My archiving solution (https://viaa.be/en) by default expects BagIt containers with a bag-info.txt file holding DataCite metadata. I want to add additional metadata (following the MIAME standard used by ArrayExpress).

eocarragain commented 5 years ago

hi @frederikcoppens, do you have an example (even a dummy one) of a) the BagIt format expected by the archiving solution (e.g. where does it expect the DataCite metadata to be, and in what format?), and b) what that would look like with the MIAME metadata added (e.g. is this a MAGE-TAB file, and where does it sit relative to the rest of the "crate"?)

frederikcoppens commented 5 years ago

hi @eocarragain

Below are the file structure and file contents of a dummy example I received. The BagIt container gets a UUID as its filename. We can't have a deeper folder hierarchy in /data (the recommendation is to put a tar file there, but it can also hold multiple files). We need a bag-info.txt file with the metadata.

The archiving solution (or library) does not yet support metadata beyond the bag-info.txt file. This is open for discussion. I like the idea of having additional files next to the 'master metadata file', which then refers to them.

For MIAME: yes, this would currently be MAGE-TAB, as that makes the most sense given it is what ArrayExpress expects and generates. In the future this will likely become JSON.

Example of the file structure in the container:

    bag-info.txt
    bagit.txt
    manifest-md5.txt
    data/
        PROJECT_XYZ_2018_0001_LS.txt
        PROJECT_XYZ_2018_0001_MA.tar
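A minimal Python sketch of how the layout constraints described above (a UUID-named container, the three tag files at the top level, and a flat data/ directory) could be checked before a bag is handed to the archive. The function name `check_bag_layout` and the exact set of checks are illustrative assumptions, not part of the archiving solution or of any BagIt library.

```python
import tempfile
import uuid
from pathlib import Path

# Tag files shown at the top level of the example container (assumed required).
REQUIRED_TAG_FILES = ("bagit.txt", "bag-info.txt", "manifest-md5.txt")


def check_bag_layout(bag_dir: Path) -> list[str]:
    """Hypothetical checker: return a list of problems; empty means the layout looks OK."""
    problems = []
    # The container is expected to be named after a UUID.
    try:
        uuid.UUID(bag_dir.name)
    except ValueError:
        problems.append(f"bag name {bag_dir.name!r} is not a UUID")
    for name in REQUIRED_TAG_FILES:
        if not (bag_dir / name).is_file():
            problems.append(f"missing tag file {name}")
    data_dir = bag_dir / "data"
    if not data_dir.is_dir():
        problems.append("missing data/ directory")
    else:
        # No deeper hierarchy: data/ may only contain plain files.
        for entry in sorted(data_dir.iterdir()):
            if entry.is_dir():
                problems.append(f"nested directory not allowed: data/{entry.name}")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bag = Path(tmp) / str(uuid.uuid4())
        (bag / "data").mkdir(parents=True)
        for name in REQUIRED_TAG_FILES:
            (bag / name).write_text("")
        (bag / "data" / "PROJECT_XYZ_2018_0001_MA.tar").write_bytes(b"")
        print(check_bag_layout(bag))  # → []
```

Returning a list of problems rather than raising on the first one lets a data steward see everything that is wrong with a bag in one pass.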

bagit.txt file

    BagIt-Version: 0.97
    Tag-File-Character-Encoding: UTF-8
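A minimal sketch of writing and re-reading the bag declaration shown above. It assumes the file holds exactly these two `Label: value` lines in this order, as the BagIt specification requires for bagit.txt; `write_bagit_txt` and `read_bagit_txt` are hypothetical helper names.

```python
from pathlib import Path

# The two lines from the example, in the order they must appear.
DECLARATION = (
    ("BagIt-Version", "0.97"),
    ("Tag-File-Character-Encoding", "UTF-8"),
)


def write_bagit_txt(bag_dir: Path) -> None:
    """Write bagit.txt with the version and encoding used in the example."""
    text = "".join(f"{label}: {value}\n" for label, value in DECLARATION)
    (bag_dir / "bagit.txt").write_text(text, encoding="utf-8")


def read_bagit_txt(bag_dir: Path) -> dict[str, str]:
    """Parse bagit.txt and check it has exactly the two expected labels, in order."""
    lines = (bag_dir / "bagit.txt").read_text(encoding="utf-8").splitlines()
    # Split each line at the first colon only and trim surrounding whitespace.
    pairs = [tuple(part.strip() for part in line.split(":", 1)) for line in lines]
    labels = [pair[0] for pair in pairs]
    if labels != [label for label, _ in DECLARATION]:
        raise ValueError(f"unexpected bagit.txt labels: {labels}")
    return dict(pairs)
```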

bag-info.txt

    Payload-Oxum: 412302.3
    Bagging-Date: 2019-02-14
    Bag-Size: 402.7 KB
    DC-Title:
    DC-Description:
    DC-Publisher:
    DC-Relation:
    DC-Subject: <keyword(s): a; b; c; d;>
    DC-Date:
    DC-Identifier:
    DC-Identifier:
    DC-Identifier:
    DC-Type: <Collection|Dataset|Event|Image|InteractiveResource|MovingImage|Physical_Object|Service|Software|Sound|Still_image|Text>
    DC-AccessRights: <closed|open|ugent>
    DC-Available: open|ugent>
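A minimal Python sketch of reading bag-info.txt and recomputing Payload-Oxum. It assumes the `Label: value` format shown above, treats indented lines as continuations of the previous value, and keeps every value per label, because labels such as DC-Identifier occur more than once. Payload-Oxum is the total payload size in bytes, a dot, then the number of payload files. `parse_bag_info` and `payload_oxum` are hypothetical names.

```python
from collections import defaultdict
from pathlib import Path


def parse_bag_info(text: str) -> dict[str, list[str]]:
    """Parse 'Label: value' lines; repeated labels keep all their values in order."""
    info: dict[str, list[str]] = defaultdict(list)
    last_label = None
    for line in text.splitlines():
        if line[:1] in (" ", "\t") and last_label is not None:
            # Indented line: continuation of the previous value.
            info[last_label][-1] += " " + line.strip()
            continue
        # Split at the first colon only, so values may contain colons.
        label, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that are not 'Label: value'
        last_label = label.strip()
        info[last_label].append(value.strip())
    return dict(info)


def payload_oxum(data_dir: Path) -> str:
    """Return '<total bytes>.<file count>' for every file under data_dir."""
    files = [p for p in data_dir.rglob("*") if p.is_file()]
    return f"{sum(p.stat().st_size for p in files)}.{len(files)}"
```

Comparing `payload_oxum(bag / "data")` with the stored Payload-Oxum value is a cheap first check before running the full MD5 comparison on multi-GB payloads.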

manifest-md5.txt

Contains the MD5 checksum of each file in /data.
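A minimal sketch of writing and verifying manifest-md5.txt with Python's standard `hashlib`. It assumes the usual BagIt manifest line format (checksum, whitespace, then the path relative to the bag root, e.g. data/PROJECT_XYZ_2018_0001_MA.tar) and reads files in chunks, since the FastQ payloads can be several GB. `md5_of`, `write_manifest` and `verify_manifest` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 of a file, streamed in 1 MiB chunks to keep memory use flat."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(bag_dir: Path) -> list[str]:
    """Write one '<md5>  data/<file>' line per payload file (data/ is flat)."""
    lines = []
    for path in sorted((bag_dir / "data").iterdir()):
        if path.is_file():
            rel = path.relative_to(bag_dir).as_posix()
            lines.append(f"{md5_of(path)}  {rel}")
    (bag_dir / "manifest-md5.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


def verify_manifest(bag_dir: Path) -> list[str]:
    """Return the relative paths that are missing or whose checksum differs."""
    bad = []
    text = (bag_dir / "manifest-md5.txt").read_text(encoding="utf-8")
    for line in text.splitlines():
        if not line.strip():
            continue
        checksum, rel = line.split(maxsplit=1)
        target = bag_dir / rel
        if not target.is_file() or md5_of(target) != checksum:
            bad.append(rel)
    return bad
```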