Closed kmcelwee closed 3 years ago
@richmanrachel I'm revising the document import to map languages from the spreadsheet to language+script based on display names, and have made the lookup case-insensitive as @kmcelwee recommended. That's resolved a large number of the language mapping problems. Here are the few that are left, where we need help from you:
@rlskoeser - a lot of this stuff turns out to be somewhat complicated:
Syriac: shows up frequently, but there is no matching display name. Which Syriac Language+Script combination should we map these to?
- I think this may have to be done by a human with Syriac language training as I don't know which of the script styles are more common for this time/place. If it's easy for you to create a proper list of the ones that need to be checked, I'll try to find someone who can do this task.
Missing or not mapping properly: PGPID: 32605, Language: Amharic
- I added Amharic to the language spreadsheet, but am waiting to hear back from an expert to know what to call the script.
PGPID: 30977, Language: Turkish
- Added to spreadsheet (as it's in Latin script unlike other current Turkish entries), but we need to make a decision about whether to call this "Turkish" or "Modern Turkish"
PGPID: 29377, Language: Christian Palestinian Aramaic
- Sent this to a friend who might help me figure out if this is a new language or counts as one of the versions of Syriac.
PGPID: 11135, Language: Sanskrit (I remember seeing some discussion of this! But don't know if it was resolved)
- The resolution was that Sanskrit should not be a proper language for this document (it is only referenced in the description, as it's on a separate piece of paper, not the original manuscript). I'll ask Abigail to change the metadata sheet.
Coptic numerals: I remember Marina said it was complicated and we should discuss. Can this go on the meeting agenda?
- Yes, I added it to the agenda.
Two cases where the language is one or another; could these be edited so they will be pulled in as probable languages? PGPID: 31083, Language: Hebrew or Judaeo-Arabic
- It's almost impossible for me to read. Asking Marina now, but marking them as probable languages and listing both will likely make sense.
PGPID: 31232, Language: Greek or Coptic
- Just sent out an inquiry.
@richmanrachel thanks for looking into all of these! Here are the documents where the Language field includes Syriac:
PGPID | Shelfmark - Current | Type | Tags | Description | Language (optional) |
---|---|---|---|---|---|
31461 | CUL Or.1080 14.71 | Literary | #Arabic literary #Syriac | 19 small scraps, more or less neatly cut from a literary work. Many of them have Arabic on one side and Syriac (Garshuni script) on the other, with approximately the same line spacing and occasional use of red ink, suggesting that they belong to the same work or at least were written by the same scribe. The Arabic text on f.14v, one of the larger pieces, includes the sentence, "O Sergius, o most beloved of friends. I am abashed of. . . ." Needs further examination. | Arabic; Syriac |
31467 | CUL Or.1081 2.75.1 | Unknown | #Latin script | Mysterious page with various jottings in Hebrew script in a late hand, along with a few Latin characters (m, ma, mua). | Syriac |
31391 | CUL Or.1081 2.75.10 | List or table | nan | Small fragment of an account in Western Arabic numerals and what may be Ladino. | Syriac |
31472 | CUL Or.1081 2.75.12 | Literary | #Syriac | "The Or.1081 2.75 material contains schooling exercises: practices of the alphabet and ligatures, repeated phrases from liturgical hymns, and snippets of Psalm readings. The carelessness in writing is simply due to the fact that we are looking at a pupil’s hand. While our pupil(s) had yet to master the esthetics of calligraphy, they seem to have been thrown into writing longer texts as part of their schooling." See George Kiraz, "A Young Syriac Pupil in the Cairo Genizah: Or.1081 2.75.30," Fragment of the Month, August 2018. | Syriac |
31473 | CUL Or.1081 2.75.13 | Literary | #Syriac | "The Or.1081 2.75 material contains schooling exercises: practices of the alphabet and ligatures, repeated phrases from liturgical hymns, and snippets of Psalm readings. The carelessness in writing is simply due to the fact that we are looking at a pupil’s hand. While our pupil(s) had yet to master the esthetics of calligraphy, they seem to have been thrown into writing longer texts as part of their schooling." See George Kiraz, "A Young Syriac Pupil in the Cairo Genizah: Or.1081 2.75.30," Fragment of the Month, August 2018. | Syriac |
31475 | CUL Or.1081 2.75.16 | Literary | #Syriac | "The Or.1081 2.75 material contains schooling exercises: practices of the alphabet and ligatures, repeated phrases from liturgical hymns, and snippets of Psalm readings. The carelessness in writing is simply due to the fact that we are looking at a pupil’s hand. While our pupil(s) had yet to master the esthetics of calligraphy, they seem to have been thrown into writing longer texts as part of their schooling." See George Kiraz, "A Young Syriac Pupil in the Cairo Genizah: Or.1081 2.75.30," Fragment of the Month, August 2018. | Syriac |
31477 | CUL Or.1081 2.75.19 | Literary | #Syriac | "The Or.1081 2.75 material contains schooling exercises: practices of the alphabet and ligatures, repeated phrases from liturgical hymns, and snippets of Psalm readings. The carelessness in writing is simply due to the fact that we are looking at a pupil’s hand. While our pupil(s) had yet to master the esthetics of calligraphy, they seem to have been thrown into writing longer texts as part of their schooling." See George Kiraz, "A Young Syriac Pupil in the Cairo Genizah: Or.1081 2.75.30," Fragment of the Month, August 2018. | Syriac |
31468 | CUL Or.1081 2.75.2 | Literary | #Syriac | "The Or.1081 2.75 material contains schooling exercises: practices of the alphabet and ligatures, repeated phrases from liturgical hymns, and snippets of Psalm readings. The carelessness in writing is simply due to the fact that we are looking at a pupil’s hand. While our pupil(s) had yet to master the esthetics of calligraphy, they seem to have been thrown into writing longer texts as part of their schooling." See George Kiraz, "A Young Syriac Pupil in the Cairo Genizah: Or.1081 2.75.30," Fragment of the Month, August 2018. | Syriac |
31478 | CUL Or.1081 2.75.20 | Literary | #Syriac | "The Or.1081 2.75 material contains schooling exercises: practices of the alphabet and ligatures, repeated phrases from liturgical hymns, and snippets of Psalm readings. The carelessness in writing is simply due to the fact that we are looking at a pupil’s hand. While our pupil(s) had yet to master the esthetics of calligraphy, they seem to have been thrown into writing longer texts as part of their schooling." See George Kiraz, "A Young Syriac Pupil in the Cairo Genizah: Or.1081 2.75.30," Fragment of the Month, August 2018. | Syriac |
31479 | CUL Or.1081 2.75.21 | Literary | #Syriac | "The Or.1081 2.75 material contains schooling exercises: practices of the alphabet and ligatures, repeated phrases from liturgical hymns, and snippets of Psalm readings. The carelessness in writing is simply due to the fact that we are looking at a pupil’s hand. While our pupil(s) had yet to master the esthetics of calligraphy, they seem to have been thrown into writing longer texts as part of their schooling." See George Kiraz, "A Young Syriac Pupil in the Cairo Genizah: Or.1081 2.75.30," Fragment of the Month, August 2018. | Syriac |
31480 | CUL Or.1081 2.75.23 | Literary | #Syriac | "The Or.1081 2.75 material contains schooling exercises: practices of the alphabet and ligatures, repeated phrases from liturgical hymns, and snippets of Psalm readings. The carelessness in writing is simply due to the fact that we are looking at a pupil’s hand. While our pupil(s) had yet to master the esthetics of calligraphy, they seem to have been thrown into writing longer texts as part of their schooling." See George Kiraz, "A Young Syriac Pupil in the Cairo Genizah: Or.1081 2.75.30," Fragment of the Month, August 2018. | Syriac |
31481 | CUL Or.1081 2.75.24 | Literary | #Syriac | "The Or.1081 2.75 material contains schooling exercises: practices of the alphabet and ligatures, repeated phrases from liturgical hymns, and snippets of Psalm readings. The carelessness in writing is simply due to the fact that we are looking at a pupil’s hand. While our pupil(s) had yet to master the esthetics of calligraphy, they seem to have been thrown into writing longer texts as part of their schooling." See George Kiraz, "A Young Syriac Pupil in the Cairo Genizah: Or.1081 2.75.30," Fragment of the Month, August 2018. | Syriac |
31482 | CUL Or.1081 2.75.26 | Literary | #Syriac | "The Or.1081 2.75 material contains schooling exercises: practices of the alphabet and ligatures, repeated phrases from liturgical hymns, and snippets of Psalm readings. The carelessness in writing is simply due to the fact that we are looking at a pupil’s hand. While our pupil(s) had yet to master the esthetics of calligraphy, they seem to have been thrown into writing longer texts as part of their schooling." See George Kiraz, "A Young Syriac Pupil in the Cairo Genizah: Or.1081 2.75.30," Fragment of the Month, August 2018. | Syriac |
31483 | CUL Or.1081 2.75.27 | Literary | #Syriac | "The Or.1081 2.75 material contains schooling exercises: practices of the alphabet and ligatures, repeated phrases from liturgical hymns, and snippets of Psalm readings. The carelessness in writing is simply due to the fact that we are looking at a pupil’s hand. While our pupil(s) had yet to master the esthetics of calligraphy, they seem to have been thrown into writing longer texts as part of their schooling." See George Kiraz, "A Young Syriac Pupil in the Cairo Genizah: Or.1081 2.75.30," Fragment of the Month, August 2018. | Syriac |
31484 | CUL Or.1081 2.75.28 | Literary | #Syriac | "The Or.1081 2.75 material contains schooling exercises: practices of the alphabet and ligatures, repeated phrases from liturgical hymns, and snippets of Psalm readings. The carelessness in writing is simply due to the fact that we are looking at a pupil’s hand. While our pupil(s) had yet to master the esthetics of calligraphy, they seem to have been thrown into writing longer texts as part of their schooling." See George Kiraz, "A Young Syriac Pupil in the Cairo Genizah: Or.1081 2.75.30," Fragment of the Month, August 2018. | Syriac |
31469 | CUL Or.1081 2.75.3 | Literary | #Syriac | "The Or.1081 2.75 material contains schooling exercises: practices of the alphabet and ligatures, repeated phrases from liturgical hymns, and snippets of Psalm readings. The carelessness in writing is simply due to the fact that we are looking at a pupil’s hand. While our pupil(s) had yet to master the esthetics of calligraphy, they seem to have been thrown into writing longer texts as part of their schooling." See George Kiraz, "A Young Syriac Pupil in the Cairo Genizah: Or.1081 2.75.30," Fragment of the Month, August 2018. | Syriac |
31411 | CUL Or.1081 2.75.30 | Literary | #Syriac #Garshuni | Fragment containing "the text of the Makherzonutho or Proclamation that a deacon chants prior to the reading of the Gospel. . . [from] the Book of Anaphora, the priest's manual rather than the Tekso deacon manual." See George Kiraz, "A Young Syriac Pupil in the Cairo Genizah: Or.1081 2.75.30," Fragment of the Month, August 2018. | Syriac |
31486 | CUL Or.1081 2.75.31 | Literary | #Syriac | "The Or.1081 2.75 material contains schooling exercises: practices of the alphabet and ligatures, repeated phrases from liturgical hymns, and snippets of Psalm readings. The carelessness in writing is simply due to the fact that we are looking at a pupil’s hand. While our pupil(s) had yet to master the esthetics of calligraphy, they seem to have been thrown into writing longer texts as part of their schooling." See George Kiraz, "A Young Syriac Pupil in the Cairo Genizah: Or.1081 2.75.30," Fragment of the Month, August 2018. | Syriac |
31487 | CUL Or.1081 2.75.35 | Literary | #Syriac | "The Or.1081 2.75 material contains schooling exercises: practices of the alphabet and ligatures, repeated phrases from liturgical hymns, and snippets of Psalm readings. The carelessness in writing is simply due to the fact that we are looking at a pupil’s hand. While our pupil(s) had yet to master the esthetics of calligraphy, they seem to have been thrown into writing longer texts as part of their schooling." See George Kiraz, "A Young Syriac Pupil in the Cairo Genizah: Or.1081 2.75.30," Fragment of the Month, August 2018. | Syriac |
31470 | CUL Or.1081 2.75.6 | Literary | #Syriac | "The Or.1081 2.75 material contains schooling exercises: practices of the alphabet and ligatures, repeated phrases from liturgical hymns, and snippets of Psalm readings. The carelessness in writing is simply due to the fact that we are looking at a pupil’s hand. While our pupil(s) had yet to master the esthetics of calligraphy, they seem to have been thrown into writing longer texts as part of their schooling." See George Kiraz, "A Young Syriac Pupil in the Cairo Genizah: Or.1081 2.75.30," Fragment of the Month, August 2018. | Syriac |
31471 | CUL Or.1081 2.75.9 | List or table | nan | Fragment of an account in western Arabic numerals; no words. | Syriac |
32091 | T-S AS 204.351–56 | Literary | #Syriac | Liturgical text, Nestorian. In Syriac. See Sebastian P. Brock, “East Syrian Liturgical Fragments from the Cairo Genizah,” Oriens Christianus 68 (1984) pp. 58-79. Idem, “Some Further East Syrian Liturgical Fragments from the Cairo Genizah,” Oriens Christianus 74 (1990) pp. 44-61. Information from FGP. | Syriac |
29376 | T-S 16.319 | Literary | #CUDL #palimpsest #Syriac | Palimpsest consisting of the Palestinian Talmud, Peʾa 18d and 20b-c, written over a Syriac text, The Life of St Anthony by Athanasius of Alexandria. Edited in Lewis (1902: 146-149) as text XXXV. (Information from CUDL) | Hebrew; Aramaic; Syriac |
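A listing like the table above can be produced by splitting the Language column on semicolons and matching one name case-insensitively. This is only a minimal sketch, not the actual import or reporting code: the helper names `split_languages` and `rows_with_language` are hypothetical, and the sample rows reuse a few PGPID and Language values quoted in this thread.

```python
import csv
import io


def split_languages(value):
    """Split a semicolon-delimited Language cell into clean names."""
    return [name.strip() for name in value.split(";") if name.strip()]


def rows_with_language(rows, language, column="Language (optional)"):
    """Return the rows whose Language cell lists the given language."""
    wanted = language.casefold()
    matches = []
    for row in rows:
        names = split_languages(row.get(column) or "")
        if wanted in (name.casefold() for name in names):
            matches.append(row)
    return matches


# Small sample in the shape of the metadata spreadsheet (assumed CSV export).
SAMPLE = """PGPID,Language (optional)
31461,Arabic; Syriac
29376,Hebrew; Aramaic; Syriac
30977,Turkish
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
syriac_ids = [row["PGPID"] for row in rows_with_language(rows, "Syriac")]
print(syriac_ids)
```

Matching on the split names rather than on a substring keeps a value such as "Christian Palestinian Aramaic" from being counted as "Aramaic".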
@rlskoeser - I think everything except the Syriac texts without scripts is now resolved based on updates to the language spreadsheet.
@richmanrachel finally circling back to this and wanted to confirm what I think we decided (based on looking back at meeting notes, searching Slack, and my memory) before I implement any changes.
@rlskoeser - thank you for the followup!
Coptic numerals in the spreadsheet should be mapped to Greek/Coptic Numerals
- Correct.
You proposed adding Unknown language + Hebrew script, which would be for the "Hebrew or Judaeo-Arabic" document, but I don't see it in the Language+Script spreadsheet; is this still the plan?
- Sorry about that - yes, it's done now!
I think we proposed adding Syriac (Unknown script) for mapping the Syriac documents in the spreadsheet, but that hasn't been added to the Language+Script spreadsheet; is this still the plan? (I couldn't find where we discussed/decided this)
- Alan talked to a Syriac professor and got some pointers on how to identify the script. Can we give him a couple weeks to just go through the current documents? Or should I add a Syriac (unknown script) for now?
Should the Turkish document be mapped to Modern Turkish?
- Yes.
Update: Alan would prefer that we have a Syriac (Unknown Script) category anyway for future researchers, so I'll add that now.
Reran data import in QA with revised language mapping based on display name and (for a few cases) the new spreadsheet_name column I added to help with the import.
The only document that's still reporting a problem related to language is the "Greek or Coptic" one, which I know we're working on resolving. (I don't expect it to change language import logic.)
script output:
Imported 35 collections
Imported 49 languages
skipping PGPID 27264 (demerge)
... [lots more skipped] ...
ERROR language not found. PGPID: 31232, Language: Greek or Coptic
... [more skipped] ...
skipping PGPID 28089 (demerge)
Imported 29834 documents, 926 with joins; skipped 179
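For reference, the mapping described in this comment (case-insensitive match on display name, with the spreadsheet_name column as a fallback key) could look roughly like the sketch below. It is an illustration under assumptions, not the real import script: the records are plain dicts rather than database models, the function names are hypothetical, and the two sample entries are made up from names mentioned in this thread.

```python
def build_language_lookup(language_scripts):
    """Index language+script records by display name and spreadsheet name.

    Keys are casefolded so that lookups are case-insensitive.
    """
    lookup = {}
    for record in language_scripts:
        for key in (record.get("display_name"), record.get("spreadsheet_name")):
            if key:
                lookup[key.strip().casefold()] = record
    return lookup


def find_language(lookup, name):
    """Return the matching record, or None when the name is not known."""
    return lookup.get(name.strip().casefold())


def not_found_message(pgpid, name):
    """Format an error line in the style of the script output above."""
    return f"ERROR language not found. PGPID: {pgpid}, Language: {name}"


# Illustrative records only; the real values live in the Language+Script spreadsheet.
LANGUAGE_SCRIPTS = [
    {"language": "Turkish", "script": "Latin",
     "display_name": "Modern Turkish", "spreadsheet_name": "Turkish"},
    {"language": "Syriac", "script": "Unknown",
     "display_name": "Syriac (Unknown script)", "spreadsheet_name": "Syriac"},
]

lookup = build_language_lookup(LANGUAGE_SCRIPTS)
for pgpid, name in [(30977, "turkish"), (31232, "Greek or Coptic")]:
    if find_language(lookup, name) is None:
        print(not_found_message(pgpid, name))
```

Keeping spreadsheet_name as a second key means the spreadsheet can keep its existing values while the display names are revised.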
@rlskoeser - MR has a question: why are languages the only things that are clickable (rather than display name or script as well)?
MR has a question: why are languages the only things that are clickable (rather than display name or script as well)?
By default, Django admin uses the first value as the one that is clickable to go into the edit form. We can change the order or set others to be clickable as well. I wondered about using display name, but that's an optional field. We could use the auto-generated display name for records without a custom one set, but I wasn't sure if that would be confusing!
@rlskoeser - That's really helpful. I think the only language we have without a display name now is Unknown in Hebrew characters, for which we can create a display name.
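To make the fallback idea concrete, here is a small sketch of a label that prefers the custom display name and otherwise generates one. It uses a plain dataclass instead of the real Django model so it can run on its own; the class name, the `label` method, and the "Language (Script)" format for the generated name are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LanguageScript:
    """Stand-in for a language+script record (hypothetical, not the real model)."""

    language: str
    script: str
    display_name: Optional[str] = None  # optional custom name

    def label(self) -> str:
        """Use the custom display name if set, else an auto-generated one."""
        # Assumed format for the generated name: "Language (Script)".
        return self.display_name or f"{self.language} ({self.script})"


print(LanguageScript("Turkish", "Latin", "Modern Turkish").label())
print(LanguageScript("Unknown", "Hebrew").label())
```

In the admin itself, a method like this could be named in the ModelAdmin's `list_display` and `list_display_links` so that it becomes the clickable column, since Django requires linked columns to also appear in `list_display`.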
Decision: Let's make display name the default and the only clickable one.
@rlskoeser - I assume we'll need to retest once this is the case, so we shouldn't bother doing the rest of this testing now?
Do you still want language & script displayed on the list view? Or only display name?
Actually, language+script admin display is not part of this story! Please test whether the import is working properly and assigning the correct languages.
You can open a new issue to request adjusting the language+script list display
"Hebrew or Judaeo-Arabic" was officially changed to "Unknown: Hebrew script" but it all works!
testing notes
Check that imported documents have the correct languages associated, based on what is in the spreadsheet. We recommend checking a variety of documents, starting with normal cases, including:
Also be sure to test the odder cases and outliers, including:
dev notes