Princeton-CDH / pemm-scripts

scripts & tools for the Princeton Ethiopian Miracles of Mary project
Apache License 2.0

Generate a data export to share with Hamburg BetaMasaheft project #53

Closed kmcelwee closed 4 years ago

kmcelwee commented 4 years ago

Let's try to keep this conversation here, so the information needed is in one place. Here are the preliminary questions I have:

  1. What formats are required? Providing only XML is sufficient, correct?
  2. Are there specific columns / sheets that should be excluded while exporting?
  3. Is there any special cleaning that needs to be done before sending it over?
  4. Are there any resources you can point me to? You've done this before, correct? Please provide the before / after on what you've done manually, and I'll do my best to replicate it with a program.
elambrinaki commented 4 years ago

@kmcelwee

1. I'm fairly sure the answer is yes, but I'll ask them to confirm.
2. & 3. We have temporary fields and sheets that will be deleted after we finish cataloging. Another question is whether we should send the full dataset (once completed) and let them choose what they need, or do the cleaning beforehand. I have an idea of what information they collect, and I can select fields they definitely don't need (e.g. "Macomber Incipit" on the Story Instance sheet). At the same time, they may want to do it themselves, and/or be able to do it better themselves. Should I ask them about their preferences?
4. Their app is called BetaMasaheft, and it supports SPARQL search: https://betamasaheft.eu/sparql. Some things are already in human-readable form, e.g. the correspondence between their IDs and Macomber IDs: https://betamasaheft.eu/bibliography?pointer=bm:MacomberMiracles (in this case I didn't use the SPARQL interface, but copied the list into Excel). What I did manually was find their IDs for manuscripts. Neither we nor Hamburg use Macomber's manuscript abbreviations as IDs. I knew which catalogues Macomber had taken the manuscripts from, and I searched the list of catalogues on Beta Masaheft to find the ones we needed: https://betamasaheft.eu/catalogues/list

   Their GitHub: manuscript records https://github.com/BetaMasaheft/Manuscripts, miracle records https://github.com/BetaMasaheft/Works
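The manual ID-matching step described above could in principle be scripted once the Macomber-to-BetaMasaheft correspondences are collected. A minimal sketch, assuming a hand-built lookup table and a hypothetical `macomber_id` column (the abbreviations, column names, and ID pairings below are illustrative placeholders, not the actual PEMM or BetaMasaheft data):

```python
import csv
import io

# Hand-built lookup from Macomber manuscript abbreviations to BetaMasaheft IDs,
# e.g. compiled by hand from https://betamasaheft.eu/catalogues/list.
# These pairings are placeholders for illustration only.
MACOMBER_TO_BM = {
    "EMML 2058": "EMML2058",
    "Vat. et. 50": "BAVet50",
}

def add_bm_ids(rows):
    """Attach a BetaMasaheft ID to each manuscript row, blank when unknown."""
    for row in rows:
        row["betamasaheft_id"] = MACOMBER_TO_BM.get(row["macomber_id"], "")
    return rows

# A small CSV snippet standing in for the real manuscripts sheet.
sample = "macomber_id\nEMML 2058\nUnknown MS\n"
rows = add_bm_ids(list(csv.DictReader(io.StringIO(sample))))
```

Rows whose abbreviation is missing from the lookup get an empty ID, so unmatched manuscripts are easy to spot and resolve by hand.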

rlskoeser commented 4 years ago

I think only a selection of PEMM data makes sense for them.

@elambrinaki it would be great if you want to ask them what their preference is. I would love it if there's a way for them to do any transformation work on their side. As we already have the data in structured form on GitHub, maybe they could just work from that if we work with them to determine what files and fields match their existing data?

elambrinaki commented 4 years ago

@kmcelwee @rlskoeser Pietro from Hamburg says that having CSV files is enough for them. So we will keep working on the database, and once it is ready, Pietro will import the data. Should I close the issue?

rlskoeser commented 4 years ago

That's great @elambrinaki, glad to hear it. Yes, works for me to close this issue. I'm going to mark it as a "won't fix" since we're not actually implementing anything.

@elambrinaki has Pietro looked at the CSV files yet? It would be good to know sooner rather than later if he needs any additional information about the structure of the files.

elambrinaki commented 4 years ago

@rlskoeser I think so. He knows that we match our manuscript IDs to theirs, and that is the only thing he considers crucial right now. Do you think there might be technical questions about the structure of the files that Wendy and I won't be able to answer?

rlskoeser commented 4 years ago

@elambrinaki no, just want to confirm we have the fields that he needs in a format he can use. The CSV files seem pretty straightforward to me and I think we've documented them fairly well, but if there is any work needed from my team I'd like to know now.

elambrinaki commented 4 years ago

@rlskoeser I think we are good.