culturesofknowledge / emplaces

Early Modern Places
MIT License

Discussion: Structure and file formats for bulk import/export #20

Open kintopp opened 6 years ago

kintopp commented 6 years ago

Placeholder for discussion on how to structure the Excel spreadsheet used by data contributors providing bulk data to EM Places.

gklyne commented 6 years ago

If using the spreadsheet import route, maybe consider RightField?

http://www.rightfield.org.uk

Last time I looked it was quite limited, but still maybe of some use to ease the import process.

If something more complex is required, there's a tool I wrote a while ago that might help:

https://github.com/wf4ever/ro-manager/tree/develop/src/checklist

It would need some hacking, but is essentially a generic design capable of quite sophisticated conversions from CSV to RDF.
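As a rough illustration of what a CSV-to-RDF conversion involves, here is a minimal sketch. It is not the ro-manager checklist code: the column names, the `example.org` namespace, and the column-to-predicate mapping are all assumptions made up for this example, and it emits plain N-Triples strings using only the Python standard library.

```python
import csv
import io

# Hypothetical namespace and column-to-predicate mapping, for illustration only.
BASE = "http://example.org/place/"
PREDICATES = {
    "name": "http://www.w3.org/2000/01/rdf-schema#label",
    "country": "http://example.org/vocab/country",
}


def escape_literal(value):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def csv_to_ntriples(text):
    """Turn each CSV row into one N-Triples line per mapped, non-empty column."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}{row['id']}>"
        for column, predicate in PREDICATES.items():
            value = (row.get(column) or "").strip()
            if value:  # empty cells produce no triple
                lines.append(f'{subject} <{predicate}> "{escape_literal(value)}" .')
    return lines


sample = "id,name,country\n1,Opole,Poland\n2,Oxford,\n"
for line in csv_to_ntriples(sample):
    print(line)
```

A real import would need more than this (datatypes, language tags, links between resources), which is where a more generic mapping tool earns its keep.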

kintopp commented 6 years ago

We have to think carefully about how to structure this: what kind of extended (i.e. non-core) data do we anticipate getting in bulk? Will the work required to conform data to our spreadsheet outweigh the benefit of uploading it in bulk? Should we start with a means to upload contributions that create core data plus a limited set of extended metadata only? In other words, just enough to allow a related gazetteer to import its records into EM Places and link back to it. As a point of comparison, see http://whgazetteer.org/2018/06/06/contributing-to-world-historical-gazetteer-a-preview/

kintopp commented 6 years ago

Discussed with Marnix on 5 July the possibility of working backwards towards this from the Export format, i.e. use the complete RDF sample to generate Export formats from Timbuctoo, then reuse these as Import formats where applicable (for example, multiple-worksheet Excel spreadsheets).

gklyne commented 6 years ago

My current plan is to populate as much as possible from GeoNames as a separate (and presumably initial) step in creating a new place record, and then to generate additional data via a spreadsheet or other means.

What isn't yet clear in my mind is how we handle the data merging. I understand Timbuctoo has (or will have) an option to add or replace data, but we might end up needing something a little more subtle.
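To make the add-versus-replace distinction concrete, here is a small sketch. It does not describe Timbuctoo's actual behaviour; the record shape (a dict mapping each property to a set of values), the `merge_record` function, and the property names are all hypothetical.

```python
def merge_record(existing, incoming, mode="add"):
    """Merge two records given as {property: set of values}.

    mode="add":     keep existing values and add the incoming ones.
    mode="replace": for each property present in `incoming`, discard the
                    existing values and use the incoming ones instead;
                    properties absent from `incoming` are left alone.
    """
    if mode not in ("add", "replace"):
        raise ValueError(f"unknown merge mode: {mode!r}")
    # Copy so that neither input record is modified.
    merged = {prop: set(values) for prop, values in existing.items()}
    for prop, values in incoming.items():
        if mode == "replace":
            merged[prop] = set(values)
        else:
            merged.setdefault(prop, set()).update(values)
    return merged


# Hypothetical example records.
existing = {"label": {"Opole"}, "altName": {"Oppeln"}}
incoming = {"label": {"Opole (city)"}, "source": {"GeoNames"}}

print(merge_record(existing, incoming, mode="add"))
print(merge_record(existing, incoming, mode="replace"))
```

Neither mode covers cases like "replace only the values that came from the same source", which is the kind of subtlety we might end up needing.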

The approach of export -> edit -> import seems plausible to me. Of course, there will be details...

In choosing the import/export format, I think we should take care to avoid mixing elements of data merging logic with the import/export capability in Timbuctoo. E.g., if the format is tabular, how do we handle the deeper graph structures used for place relations and map resources?
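One possible answer to the tabular question, sketched below, is to split nested structures across several sheets linked by an identifier column, much as a multiple-worksheet spreadsheet would. This is only an illustration under assumed field names (`id`, `name`, `relations`, `type`, `target`); it is not the EM Places data model or Timbuctoo's export format.

```python
def flatten_places(places):
    """Split nested place records into two flat tables.

    Returns (place_rows, relation_rows): one row per place, and one row per
    relation, with `place_id` linking each relation back to its place.
    """
    place_rows = []
    relation_rows = []
    for place in places:
        place_rows.append({"place_id": place["id"], "name": place["name"]})
        for relation in place.get("relations", []):
            relation_rows.append(
                {
                    "place_id": place["id"],
                    "relation": relation["type"],
                    "target_id": relation["target"],
                }
            )
    return place_rows, relation_rows


# Hypothetical nested input.
places = [
    {"id": "p1", "name": "Opole", "relations": [{"type": "partOf", "target": "p2"}]},
    {"id": "p2", "name": "Silesia"},
]

place_rows, relation_rows = flatten_places(places)
print(place_rows)
print(relation_rows)
```

The cost is that contributors have to keep identifiers consistent across sheets, and anything nested more than one level deep needs yet another sheet.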

kintopp commented 6 years ago

Current plan is to work with Graham's Annalist tool as a temporary solution until Timbuctoo's editor is in place at the end of September. Timbuctoo's current export spreadsheet format (from sample shared by Marnix) looks well suited for export, less so as a template for import.