gesistsa / rio

🐟 A Swiss-Army Knife for Data I/O
http://gesistsa.github.io/rio/

metadata export to e.g. JSON-LD, DDI #177

Open rubenarslan opened 6 years ago

rubenarslan commented 6 years ago

Hi. I had forgotten about rio, but will now use it (and its wonderfully consistent interface for importing files with metadata) for the webapp of https://github.com/rubenarslan/codebook. I was wondering whether you have any plans for exporting to metadata-only formats. I wrote a very basic first attempt at exporting to JSON-LD, but I was also asked about DDI.

Since you write about making a FOSS replacement for Stat/Transfer and Sledgehammer, I was wondering whether you consider exporting to metadata-only formats within scope. I'd love to be able to embed metadata in various formats in my codebooks, but I don't think I'll tackle writing to DDI in R on my own. State of my own research: ipumsr is a package that currently reads DDI, and r2ddi is the only attempt at writing DDI that I know of (and the developer has abandoned it).

leeper commented 6 years ago

I've wanted to support DDI (briefly mentioned it here: https://github.com/leeper/rio/issues/12) but don't have the ambition to write a full DDI package. It's unfortunately too complex to bootstrap and there don't seem to be any existing tools that we could draw on directly: https://www.ddialliance.org/resources/tools

My preference would be for a separate DDI package that handles import/export that we could then use here rather than incorporating it directly into rio.

PS - fantastic re: codebook!

rubenarslan commented 6 years ago

I think the DDI spec is good at crushing ambitions 😄 I asked @gergness (ipumsr imports DDI) what IPUMS uses; he'll ask.

gergness commented 6 years ago

Sadly, I don't think we'll be able to make our tools for DDI writing available any time soon. If helpful, ipumsr has code to read the DDIs generated by our site. However, the spec is huge and I only implemented the minimal set of features that allowed me to read our extracts, so I'm not sure it will be helpful.

rubenarslan commented 6 years ago

@gergness In your honest opinion, do you think DDI has the potential to spread to the FOSS world? The FOSS ecosystem seems so limited, and it was so much easier to get going with JSON-LD...

gergness commented 6 years ago

Ha, I can only think of this: https://xkcd.com/927/

I don't really think there's anything that special about DDI. If you can get a data import/export round trip working for JSON-LD faster than with DDI, I'd go with that.

rubenarslan commented 6 years ago

@gergness I guess the killer app for research dataset metadata is good search. I don't know of anything that uses DDI for search across platforms. Do you know if anything is in the works, or did I miss something? Because JSON-LD for dataset search is also just a promise without a timeline, I suppose.

gergness commented 6 years ago

Nope, I'm not aware of anything either.

leeper commented 6 years ago

I think the main issue with DDI is that there's no low-level library that implements it fully, so anyone who wants to use it has to start from scratch (see, for example, the Dataverse implementation: https://github.com/IQSS/dataverse/blob/3c7d647cbb2b5cf33b9c40276c4a69de73da16a5/src/main/java/edu/harvard/iq/dataverse/export/ddi/DdiExportUtil.java). That ultimately undermines its utility, as it's too complex to really justify investing time in for a niche project.

rubenarslan commented 5 years ago

See https://github.com/dusadrian/DDIwR/ btw.