UTS-eResearch / datacrate

Bagit-based data packaging specification for dissemination of research data with useful human and machine readable metadata: "Make Data Crate Again!"

DataCrate adoption in a production system #17

Closed andrewjanke closed 5 years ago

andrewjanke commented 6 years ago

I'm considering using DataCrate for all long-term archives in the UQRDM system. For one, it's based on BagIt, so there is some guarantee of things lasting for a while. However, there are a few words in the spec that cause some concern.

Can you provide guidance on the level of maturity of the format, and are any major changes planned for the future? I know it's been in use (0.1) at UTS for a while already.

Any word on backwards compatibility? Tools to migrate old versions? Is this even needed?

Lines in the sand being drawn regarding things that will/won't change?

ptsefton commented 6 years ago

Andrew, the format is still a bit immature and only used at UTS in a limited way so far. We're launching our new end-to-end research data management system in Q3, so for us it would make sense to do a stable release at that point. Given how easy DataCrate JSON-LD is to process, it would be reasonable to commit to accompanying any further iterations of the spec with upgrade scripts that update existing packages (drop in a new context if needed, make any structural changes to the BagIt layout). Would love to take a look at your data and collaborate on any new keys or other extensions you might need.
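As a rough illustration of the "drop in new context" step, an upgrade script could be as small as the sketch below. This is only a sketch under stated assumptions, not an actual DataCrate tool: the catalog file name `CATALOG.json`, the replacement context URL, and the function names are all hypothetical and not taken from this thread.

```python
import json
from pathlib import Path

# Hypothetical values: neither the file name nor the URL comes from this thread.
CATALOG_NAME = "CATALOG.json"
NEW_CONTEXT = "https://example.org/datacrate/context/next.jsonld"


def upgrade_catalog(doc, new_context=NEW_CONTEXT):
    """Return a shallow copy of a JSON-LD document with its @context replaced."""
    upgraded = dict(doc)
    upgraded["@context"] = new_context
    return upgraded


def upgrade_package(package_dir):
    """Rewrite the catalog file inside one package directory in place."""
    path = Path(package_dir) / CATALOG_NAME
    doc = json.loads(path.read_text(encoding="utf-8"))
    upgraded = upgrade_catalog(doc)
    path.write_text(json.dumps(upgraded, indent=2), encoding="utf-8")
    return path
```

Because BagIt manifests record checksums of payload files, any script that rewrites a file inside the bag would also need to regenerate the affected manifest entries afterwards; that step is left out here.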

marcpbailey commented 6 years ago

Hi Andrew. Cloudstor Collections implements the original incarnation of the BagIt format as originally proposed in PeterS and PeterB’s work on Cr8it. That is, the implementations at Newcastle, Western and now in AARNet are all compatible. Collections uses open moustache and XSLT templates to define its metadata formats and is extremely flexible. There are some new ideas in this latest DataCrate spec that, at first glance, we think are sound and readily implementable. We are keen to explore options to adopt the latest ideas into Collections in the future and very hopeful that we may entice UTS to finally join the fold. UQ is equally welcome to participate (i.e. co-invest with AARNet, Intersect, WSU and LaTrobe) in shaping Collections moving forward, and, as another thought, the rdsOS initiative could also adopt this format in relation to your National Data Library proposal. Disclaimer: we are currently analysing the latest spec, so this is not yet an engineering statement or commercial commitment.

andrewjanke commented 6 years ago

Thanks Marc,

This clarifies a number of things. Good to see that others are also following along closely, even if that only means following the ideals of DataCrate. We digress too, but we would be keen to see something like DataCrate adopted in the National Data Library. I say "like" because I haven't seen any other well-described archiving proposals akin to DataCrate that aim to solve many of the issues we'd like solved in both UQRDM and the National Data Library.

ptsefton commented 5 years ago

We're using this at UTS now. Closing the issue.