Princeton-CDH / pemm-scripts

scripts & tools for the Princeton Ethiopian Miracles of Mary project
Apache License 2.0

macomber-miracles.txt to json #62

Closed kmcelwee closed 4 years ago

kmcelwee commented 4 years ago

I thought this might be useful in the future. The data was cleaned slightly where necessary (e.g. "inc1 pit" -> "Incipit: ").

rlskoeser commented 4 years ago

I'm not sure how useful this is. Once the keywords are migrated into the Google Sheets document, there is no data left in the macomber text file that we care about, and I believe a substantial amount of cleanup has been done since that migration.

rlskoeser commented 4 years ago

No harm in keeping this, and it will be nice to have a more structured version if we need to refer to the data again. @kmcelwee I noticed some of the keys include the colon (e.g. PEth: or Text:). Do you want to clean those up before we merge?

kmcelwee commented 4 years ago

Oh I didn't know what "PEth" was, I'm happy to combine the two to whichever is preferred

rlskoeser commented 4 years ago

@kmcelwee sorry, I was unclear: those should be separate fields, but in some cases the : is included in the label and in other cases (I think most cases) it is not. The field name isn't useful if it isn't consistent, right?

Actually, I see now that this is occurring everywhere you have a field with no content. I'd also be fine with you just removing the empty entries.

FWIW, PEth indicates Princeton Ethiopic manuscripts.
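
For illustration, the key cleanup described above (strip the stray colon from labels, drop fields with no content) could look roughly like the following. This is a minimal sketch, not the code from the notebook: the `clean_record` helper and the sample records are hypothetical, and it assumes each parsed miracle is a flat dict mapping a label to its value.

```python
import json


def clean_record(record):
    """Return a copy of record with normalized keys and no empty fields.

    Hypothetical helper: assumes a flat dict of label -> value.
    """
    cleaned = {}
    for key, value in record.items():
        # Normalize the label: "PEth:" and "PEth" should be the same key.
        key = key.strip().rstrip(":").strip()
        if isinstance(value, str):
            value = value.strip()
        # Skip fields that carry no content.
        if value in ("", None, [], {}):
            continue
        cleaned[key] = value
    return cleaned


# Made-up sample data, shaped like the problem described in this thread:
# empty fields keep the colon in the label, populated ones do not.
records = [
    {"Incipit": "example incipit text", "PEth:": "", "Text:": ""},
    {"Incipit": "another incipit", "PEth": "example value"},
]

cleaned = [clean_record(r) for r in records]
print(json.dumps(cleaned, indent=2))
```

Doing both steps in one pass means the result is the same whether the colon only shows up on empty fields or also on populated ones.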

kmcelwee commented 4 years ago

Oof it looks like I checked keys too late in my cleaning process. I introduced that bug when fixing something else.

And it's not the most readable code, but the notebook I put together is at https://github.com/kmcelwee/pemm-jupyter/blob/master/Parse%20Macomber.ipynb