kmcelwee closed this 4 years ago.
@kmcelwee updated the description with some notes. I'll be interested to see how similar (or not) the Frictionless Data schema is to the internal JSON schema file we used to generate the spreadsheet.
Sorry, we should have had the conversation here instead of in https://github.com/Princeton-CDH/pemm-data/pull/2. To summarize that conversation:
I've committed and pushed new changes, but I've left the PR as a draft. I think it's safe to say the schema is not yet in a publishable state. I did run a small script to note the differences between pemm-scripts/src/schema.json and what we have right now.
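A minimal sketch of that kind of comparison, assuming both schemas have first been reduced to a mapping from sheet (CSV file) name to its list of column headers. The real layout of pemm-scripts/src/schema.json may differ, `diff_schemas` is a hypothetical helper, and the sample data is made up.

```python
from typing import Dict, List, Tuple

Schema = Dict[str, List[str]]  # assumed shape: sheet name -> column headers


def diff_schemas(old: Schema, new: Schema) -> Tuple[List[str], Dict[str, List[str]]]:
    """Report sheets, and columns within existing sheets, that appear only in `new`."""
    added_sheets = [name for name in new if name not in old]
    added_columns: Dict[str, List[str]] = {}
    for name, columns in new.items():
        if name not in old:
            continue  # whole sheet is new; already reported in added_sheets
        known = set(old[name])
        extra = [col for col in columns if col not in known]
        if extra:
            added_columns[name] = extra
    return added_sheets, added_columns


# Toy input: "existing column" stands in for whatever the old schema already had.
old = {"canonical_story.csv": ["existing column"], "story_origin.csv": ["existing column"]}
new = {
    "canonical_story.csv": ["existing column", "Clavis ID"],
    "story_origin.csv": ["existing column", "Town/Country"],
    "sheet2.csv": ["existing column"],
}
sheets, columns = diff_schemas(old, new)
print(sheets)   # ['sheet2.csv']
print(columns)  # {'canonical_story.csv': ['Clavis ID'], 'story_origin.csv': ['Town/Country']}
```

Preserving header order (rather than diffing plain sets) keeps the report in the same order as the spreadsheet columns.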
These sheets have been added:
macomber_incipits.csv
sheet2.csv
And here's a summary of the columns that have been added, grouped by file (the bare numbers are blank, unnamed columns):
canonical_story.csv
Macomber Keywords
CSM Number
Clavis ID
Translation of Story into English
Translations; formerly English Translation
field14
2
3
4
5
6
7
8
9
10
11
Macomber ID Number
Macomber ID Letter
manuscript.csv
vHMML permalink (pending on 04/25/2020)
Columns per page
Lines per column
Characters per line
Hamburg MS ID
Latitude
Longitude
Place Recorded/Purchased
Title from catalog
Total Stories according to catalog
Number of Paintings according to catalog
Link to catalog
Catalog has miracles records
Can be used for sequence (miracles folio range matches with catalog)
Mss rebound in disorder or there are breaks in the sequence of TM
story_instance.csv
Best Incipit Tool Match
Story Incomplete
Blank TM folios
Ethiopic Story Number
Story Variation
High Confidence Not IT
Princeton Catalog Folios
Princeton Catalog Titles
Body of story start folio & line
Macomber Incipit
(test on whether there are two incipits in the ITool on the same folio)
Test for whether the incipit is not unique
New mss (column for sorting)
Miracles sequence number
Folio Start Number
Folio Start Letter
Temporary English Translation for TGS 1994, to be moved when ID'd
story_origin.csv
field4
Town/Country
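Since the bare numbers above stand for columns with no header (and the `field14`/`field4` entries may be the same thing), a rough stdlib sketch for flagging such columns in a CSV, and checking whether they also hold no data, could help decide which ones to drop. `find_blank_columns` is a hypothetical helper, not part of any library, and the sample rows are made up.

```python
import csv
import io
from typing import List, TextIO, Tuple


def find_blank_columns(stream: TextIO) -> List[Tuple[int, bool]]:
    """Return (column index, is_entirely_empty) for every column with a blank header."""
    reader = csv.reader(stream)
    header = next(reader, [])
    blank = [i for i, name in enumerate(header) if not name.strip()]
    has_data = {i: False for i in blank}
    for row in reader:
        for i in blank:
            # Short rows simply lack the cell, which counts as empty.
            if i < len(row) and row[i].strip():
                has_data[i] = True
    return [(i, not has_data[i]) for i in blank]


sample = io.StringIO("Clavis ID,,Town/Country,\n1,,x,\n2,,,y\n")
print(find_blank_columns(sample))  # [(1, True), (3, False)]
```

A column flagged `True` is safe to delete; one flagged `False` has data under a missing header and needs a name instead.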
I think this was closed by https://github.com/Princeton-CDH/pemm-data/pull/4
Agreed.
https://frictionlessdata.io/
Start with the Frictionless Data Python datapackage library to infer schemas for the existing CSV files we care about in the data repository, then see how much cleanup is needed.
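With the datapackage library, inference is roughly `Package()` followed by `package.infer("*.csv")` and reading `package.descriptor` (check the library docs for the exact glob behaviour). To preview what inference has to cope with before installing anything, here is a dependency-free stand-in that guesses a Table Schema style type for each column. It is a simplification for illustration, not the library's algorithm; `guess_field_types` is hypothetical and the sample values are made up.

```python
import csv
import io
from typing import Dict, List, TextIO


def _guess_type(values: List[str]) -> str:
    """Pick the narrowest of integer, number, string that fits every non-empty value."""
    present = [v.strip() for v in values if v.strip()]
    if not present:
        return "string"  # nothing to go on; treat as text
    for name, caster in (("integer", int), ("number", float)):
        try:
            for v in present:
                caster(v)
        except ValueError:
            continue  # some value doesn't fit; try the next, wider type
        return name
    return "string"


def guess_field_types(stream: TextIO) -> List[Dict[str, str]]:
    """Return a Table Schema style list of {"name", "type"} field descriptors."""
    rows = list(csv.reader(stream))
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [
        {"name": name, "type": _guess_type([row[i] for row in body if i < len(row)])}
        for i, name in enumerate(header)
    ]


sample = io.StringIO("Latitude,Columns per page,Title from catalog\n1.5,2,x\n2.25,2,y\n")
print(guess_field_types(sample))
# [{'name': 'Latitude', 'type': 'number'}, {'name': 'Columns per page', 'type': 'integer'},
#  {'name': 'Title from catalog', 'type': 'string'}]
```

Columns that come back as `string` when they should be numeric (for example a folio column with stray letters) are good candidates for the cleanup step.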