Princeton-CDH / pemm-scripts

scripts & tools for the Princeton Ethiopian Miracles of Mary project
Apache License 2.0

Macomber incipits missing from incipit tool #41

Closed elambrinaki closed 4 years ago

elambrinaki commented 4 years ago

Macomber's incipits for ID 124 and ID 159 don't show up in the Incipit tool search results:

ID 124, EMML (HMML) 3872 (the source of Macomber incipit), ወሀሎ፡ አሐዱ፡ ሊቀ፡ ጳጳሳት፡ ዘስሙ፡ ባስልዮስ፡ ወውእቱ፡ ይጌሥጾሙ፡ ለአብዕልት፡ ወይቤሎሙ፡ ሀቡ፡ እምንዋይክሙ፡ ለነዳያን

ID 159, EMML (HMML) 7089, ወሀሎ፡ ደሴት፡ መንገለ፡ ባሕረ፡ ኢያሪኮ፡ ወሀለዋ፡ ህየ፡ ብዙኃት፡ መነኮሳይያት። ወእምኔሆን፡ አሐቲ፡ መበለት፡ ፈራሂተ፡ እግዚአብሔር፡ ወትትቀነይ፡ ለቤተ፡ ክርስቲያነ፡ እግዝእትነ፡…. ወትትለአክ፡ በእንተ፡ ምጽዋት፡ ወቍርባን፡ ወወይን፡ ወዕጣን፡ ወጽንሓሕ

I checked this CSV file, which I believe is the source of incipits for the Incipit Search Tool, and the incipits are there.

I came across the problem with these two IDs by chance; I didn't check all the IDs. I am wondering whether something specific happened to the incipits of these two IDs, whether there is a more general problem and other incipits might also be missing, or whether I am misunderstanding something.

elambrinaki commented 4 years ago

Trying ID 124: [image: 124]

Trying ID 159: [image: 159]

Both IDs are on the Story Instance sheet, with the Macomber Incipit column checked, High Confidence score:

[image: 124_sheet]

[image: 159_sheet]

rlskoeser commented 4 years ago

@WendyLBelcher @elambrinaki Sorry it's taken so long to get to this; I keep starting to look at it and then get interrupted by other work.

@elambrinaki the CSV file you linked to was only used as an import source into the Google Sheets; the Story Instance sheet is the source for the incipit tool.

It finally occurred to me what could be causing this, and I just confirmed what the problem is. Hoping you two can help come up with a solution.

For indexing, I have to have a unique id for each item that gets indexed. The current implementation uses the manuscript identifier and folio start for that, but obviously that isn't unique enough! Macomber 123 is also on EMML (HMML) 3872 21v, and 160 is also on EMML (HMML) 7089 32r, so each gets the same id as one of the missing incipits, which is why the others aren't showing up.

Any suggestions to make this guaranteed unique? If I add the Macomber number to manuscript and folio start, will that be unique enough?
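
To make the collision concrete, here is a minimal Python sketch. It is not the project's indexing code: the field names (`macomber_id`, `manuscript`, `folio_start`), the helper functions, and the "later row overwrites earlier" behavior are all assumptions for illustration, and the folio values for 124 and 159 are inferred from the description above.

```python
# Hypothetical rows; field names and row order are illustrative assumptions,
# not the actual Story Instance sheet schema.
rows = [
    {"macomber_id": "124", "manuscript": "EMML (HMML) 3872", "folio_start": "21v"},
    {"macomber_id": "123", "manuscript": "EMML (HMML) 3872", "folio_start": "21v"},
    {"macomber_id": "159", "manuscript": "EMML (HMML) 7089", "folio_start": "32r"},
    {"macomber_id": "160", "manuscript": "EMML (HMML) 7089", "folio_start": "32r"},
]


def old_key(row):
    """Id built from manuscript and folio start only (collides)."""
    return f"{row['manuscript']}|{row['folio_start']}"


def new_key(row):
    """Id that also includes the Macomber number (the proposed fix)."""
    return f"{row['macomber_id']}|{row['manuscript']}|{row['folio_start']}"


def build_index(rows, key_func):
    """Toy index: a dict keyed by id, so a repeated id replaces the earlier row."""
    index = {}
    for row in rows:
        index[key_func(row)] = row
    return index


old_index = build_index(rows, old_key)
new_index = build_index(rows, new_key)

print(len(old_index), "items indexed with the old id")
print(len(new_index), "items indexed with the new id")
```

In this toy version, which of two colliding rows survives depends only on row order; a real search index may resolve duplicate ids differently.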

elambrinaki commented 4 years ago

@rlskoeser @WendyLBelcher Hi Rebecca, I decided to make a note here in addition to our talk in Slack. Yes, adding a story ID (Macomber ID) to the manuscript number and folio start number will be enough. You found a perfect solution, thank you!

rlskoeser commented 4 years ago

The test incipit search tool and test indexing have been linked to the new test spreadsheet and indexing has been updated.

Please confirm that the missing incipits are findable now that we've changed the unique id to ensure it's actually unique.
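
Beyond spot-checking these two IDs, one way to look for a more general problem is to group rows by the id fields and list any group with more than one row. This is a rough sketch under assumptions: `find_collisions` is a hypothetical helper, and the field names and sample rows are made up for illustration rather than taken from the spreadsheet.

```python
from collections import defaultdict


def find_collisions(rows, key_fields):
    """Return {key: [rows]} for every key shared by more than one row."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        groups[key].append(row)
    return {key: group for key, group in groups.items() if len(group) > 1}


# Hypothetical sample data; the real input would be the Story Instance sheet.
sample_rows = [
    {"macomber_id": "123", "manuscript": "EMML (HMML) 3872", "folio_start": "21v"},
    {"macomber_id": "124", "manuscript": "EMML (HMML) 3872", "folio_start": "21v"},
    {"macomber_id": "159", "manuscript": "EMML (HMML) 7089", "folio_start": "32r"},
    {"macomber_id": "160", "manuscript": "EMML (HMML) 7089", "folio_start": "32r"},
]

old_collisions = find_collisions(sample_rows, ["manuscript", "folio_start"])
new_collisions = find_collisions(
    sample_rows, ["macomber_id", "manuscript", "folio_start"]
)

for key, group in old_collisions.items():
    ids = ", ".join(row["macomber_id"] for row in group)
    print(f"{key}: shared by Macomber IDs {ids}")
```

An empty result for the new field combination would suggest that no rows in the checked data still share an id.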

elambrinaki commented 4 years ago

I can confirm that.