Creating a public dataset

adunning commented 4 years ago

We had a request for a CSV version of our manuscripts list, and it strikes me that we could make a useful public dataset out of it. Any suggestions for improving on the following fields (I'm thinking of what can be exported reliably and is fairly normalized)?

Shelfmark
SC number
msName
date (3–4 columns? one each for the text, notBefore, notAfter, and any precise date given)
place of origin (ID only? ID+text?)
head, or a summary of contents if there is no head (as generated online)
languages (as ISO codes?)
IDs + text for works in msItem
IDs + names for former owners
support material
dimensions
second folio
types from decoNote
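As a rough sketch of how some of these fields might be pulled out reliably, here is a standard-library Python example. It assumes the records are TEI P5 XML (msName, msItem and decoNote are TEI manuscript-description elements) and covers only a subset of the list: shelfmark, msName, the date columns, languages, works and support material. TEI's `origDate` carries `notBefore`, `notAfter` and `when` attributes alongside its display text, which maps naturally onto the 3–4 date columns. The sample record, the `record_to_row` helper, the `|` separator for multi-valued cells, and the use of `@key` as the work ID are illustrative assumptions, not the catalogue's actual export.

```python
import xml.etree.ElementTree as ET

# Standard TEI P5 namespace
NS = {"tei": "http://www.tei-c.org/ns/1.0"}
SEP = "|"  # assumed separator for multi-valued cells

def text_of(el):
    """Whitespace-normalized text of an element and its descendants, or '' if absent."""
    if el is None:
        return ""
    return " ".join("".join(el.itertext()).split())

def attr_of(el, name):
    """Value of attribute `name`, or '' if the element or the attribute is absent."""
    return el.get(name, "") if el is not None else ""

def record_to_row(msdesc):
    """Flatten one TEI <msDesc> element into a dict of CSV-ready strings (a subset of the fields)."""
    date = msdesc.find("tei:history/tei:origin/tei:origDate", NS)
    lang = msdesc.find("tei:msContents/tei:textLang", NS)
    langs = [attr_of(lang, "mainLang")] + attr_of(lang, "otherLangs").split()
    works = msdesc.findall(".//tei:msItem/tei:title", NS)
    return {
        "shelfmark": text_of(msdesc.find("tei:msIdentifier/tei:idno", NS)),
        "msName": text_of(msdesc.find("tei:msIdentifier/tei:msName", NS)),
        # Display text plus the machine-readable attributes of <origDate>
        "date_text": text_of(date),
        "date_notBefore": attr_of(date, "notBefore"),
        "date_notAfter": attr_of(date, "notAfter"),
        "date_when": attr_of(date, "when"),
        # textLang codes are BCP 47 tags (plain ISO 639 codes in simple cases)
        "languages": SEP.join(code for code in langs if code),
        # "ID: title" pairs; using @key as the work ID is an assumption
        "works": SEP.join(f"{attr_of(t, 'key')}: {text_of(t)}" for t in works),
        "material": attr_of(msdesc.find("tei:physDesc/tei:objectDesc/tei:supportDesc", NS), "material"),
    }

# Hypothetical, heavily trimmed record for illustration only
SAMPLE = """<msDesc xmlns="http://www.tei-c.org/ns/1.0">
  <msIdentifier>
    <idno>MS. Example 1</idno>
    <msName>Example Psalter</msName>
  </msIdentifier>
  <msContents>
    <textLang mainLang="la" otherLangs="en fr"/>
    <msItem><title key="work_1">Psalter</title></msItem>
    <msItem><title key="work_2">Litany</title></msItem>
  </msContents>
  <physDesc><objectDesc><supportDesc material="perg"/></objectDesc></physDesc>
  <history>
    <origin><origDate notBefore="1250" notAfter="1299">s. xiii, second half</origDate></origin>
  </history>
</msDesc>"""

row = record_to_row(ET.fromstring(SAMPLE))
```

Keeping the machine-readable date attributes in their own columns would let a spreadsheet user sort or filter by `date_notBefore` without parsing the free-text date.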

@andrew-morrison has suggested creating a separate row for each part of a composite manuscript.
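A minimal sketch of the one-row-per-part idea, again assuming TEI P5, where a composite manuscript's parts are `<msPart>` children of `<msDesc>`. The `part_rows` function, its column names and the sample record are hypothetical. It treats only direct `<msPart>` children as rows (items in any nested parts are counted under their parent), and it repeats the shelfmark on every row so each row still identifies its manuscript.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

def norm(s):
    """Collapse runs of whitespace; treat None as ''."""
    return " ".join((s or "").split())

def part_rows(msdesc):
    """Return one row per direct <msPart> child, or a single row if the manuscript has no parts."""
    shelfmark = norm(msdesc.findtext("tei:msIdentifier/tei:idno", namespaces=NS))
    parts = msdesc.findall("tei:msPart", NS)
    rows = []
    # A non-composite manuscript is treated as one unnumbered "part": the msDesc itself
    for n, part in enumerate(parts or [msdesc], start=1):
        titles = [norm("".join(t.itertext())) for t in part.findall(".//tei:msItem/tei:title", NS)]
        rows.append({
            "shelfmark": shelfmark,  # repeated so every row identifies its manuscript
            "part": str(n) if parts else "",
            "part_idno": norm(part.findtext("tei:msIdentifier/tei:idno", namespaces=NS)) if parts else "",
            "works": "|".join(titles),
        })
    return rows

# Hypothetical composite record for illustration only
COMPOSITE = """<msDesc xmlns="http://www.tei-c.org/ns/1.0">
  <msIdentifier><idno>MS. Example 2</idno></msIdentifier>
  <msPart>
    <msIdentifier><idno>fols. 1-40</idno></msIdentifier>
    <msContents><msItem><title>Sermons</title></msItem></msContents>
  </msPart>
  <msPart>
    <msIdentifier><idno>fols. 41-90</idno></msIdentifier>
    <msContents>
      <msItem><title>Chronicle</title></msItem>
      <msItem><title>Annals</title></msItem>
    </msContents>
  </msPart>
</msDesc>"""

rows = part_rows(ET.fromstring(COMPOSITE))
```

The trade-off is that whole-manuscript fields (such as the shelfmark) are duplicated across rows, while part-level fields like date and contents stay unambiguous.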

andrew-morrison commented 4 years ago

A lot of the same fields (but not all) were mapped to the CIDOC-CRM and FRBRoo ontologies and exported as RDF for the Mapping Manuscript Migrations (MMM) project. It would probably be possible to formulate a SPARQL query to run against their triple store, although that data is a snapshot from last year. I believe the plan is for us to eventually publish our own linked data, updated regularly, which would cater for more advanced researchers. But a simpler CSV export could also be useful for those who just want something they can copy and paste into a spreadsheet.
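For the copy-and-paste-into-a-spreadsheet audience, one practical detail is encoding: shelfmarks, titles and names will contain non-ASCII characters. The sketch below (hypothetical column list and sample row) writes rows with Python's `csv.DictWriter` and encodes the output as `utf-8-sig`, which prepends a byte-order mark; Excel uses that mark to recognize a CSV file as UTF-8 rather than guessing a legacy code page. The `csv` module also quotes any cell containing a comma, so multi-word titles survive the round trip.

```python
import csv
import io

# Hypothetical subset of the proposed columns
FIELDS = ["shelfmark", "msName", "date_notBefore", "date_notAfter", "languages"]

def write_rows(rows, fh):
    """Write a header plus one line per row dict; unknown keys are dropped, missing ones left blank."""
    writer = csv.DictWriter(fh, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)

def export_csv(rows, path):
    """Write rows to a file that spreadsheet software should open as UTF-8."""
    # newline="" lets the csv module control line endings, as the csv docs recommend
    with open(path, "w", encoding="utf-8-sig", newline="") as fh:
        write_rows(rows, fh)

def to_csv_bytes(rows):
    """In-memory equivalent of export_csv, handy for testing."""
    buf = io.StringIO()
    write_rows(rows, buf)
    return buf.getvalue().encode("utf-8-sig")

# Illustrative sample row (not real catalogue data); the comma forces quoting
rows = [{
    "shelfmark": "MS. Example 1",
    "msName": "Psalter of Saint-Étienne, example",
    "date_notBefore": "1250",
    "date_notAfter": "1299",
    "languages": "la|fr",
}]
data = to_csv_bytes(rows)
```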

tobyburrows commented 4 years ago

The MMM project's RDF version of the Bodleian data can be downloaded from Zenodo: https://zenodo.org/record/3667486. The Bodleian data is in its own Turtle file.

bodleian / medieval-mss

Creating a public dataset #309