Review metadata for new Canons 6 corpora

amir-zeldes commented 6 years ago

The new corpora need URNs and a general metadata review:

[ ] shenoute.throne
[ ] shenoute.obliged
[ ] ~~sahidic.ot (CoptOT base text version)~~ see #21

Note that CoptOT translation has been switched to Brenton.

ctschroeder commented 6 years ago

It is so great to see all this new Coptic! I have looked only at shenoute.throne so far. For @somiyagawa and @amir-zeldes: I have two big questions and then need information for some metadata fields. Probably you will want to answer the questions first, since metadata depends on the answers:

Why are the documents divided this way (pp. 1-9, 10-19, 20-28)? This puts the recto of one manuscript folio in a different document file than its verso. Coptic SCRIPTORIUM's practice is to digitize according to the manuscript's fragmentation. (Otherwise the metadata is a mess.) We keep consecutive folios of a manuscript together only when they are consecutive in their original page order with no lacunae AND held at the same repository now AND have consecutive catalog numbers. (For example: if pp. 4-8 of an ancient codex are in the British Library and pp. 9-12 are at the Paris BN, then pp. 4-8 are a separate document. If pp. 9-12 are at the BN but pp. 9-10 are in book 130/2 and pp. 11-12 are in 131/4, then pp. 9-12 are two separate documents.) I will need the documents reorganized so that one document = one fragment. Edited to add: if there are any manuscript parallels in "I Am Not Obliged" or "He Who Sits on His Throne", then they must be accounted for in the document structure as well.
Does Heike Behlmer have a versification system in mind for this text? I know she has not published her edition yet, but she may have chapters or verses in her notes. Wherever possible we want to use the editor's versification scheme. For example, David Brakke has not yet published the Coptic of his Discourses, but he had chapter breaks in his transcription, and we used those for the chapters in the texts of his that we've published.
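
To make the "one document = one fragment" rule from the first question concrete, here is a minimal sketch of it in Python. It is not part of any Coptic SCRIPTORIUM tooling. The `Page` record, and especially the integer `catalog_no` standing in for real call numbers such as 130/2, are simplifying assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Page:
    number: int       # page number in the original codex
    repository: str   # where the page is held today
    catalog_no: int   # ASSUMPTION: simplified integer position in the repository's catalog


def same_fragment(prev: Page, cur: Page) -> bool:
    """True only if all three conditions from the comment above hold."""
    return (
        cur.number == prev.number + 1                    # consecutive pages, no lacuna
        and cur.repository == prev.repository            # same repository now
        and 0 <= cur.catalog_no - prev.catalog_no <= 1   # same or consecutive catalog number
    )


def split_into_documents(pages: List[Page]) -> List[List[Page]]:
    """Group pages into documents, starting a new one whenever the rule breaks."""
    documents: List[List[Page]] = []
    for page in sorted(pages, key=lambda p: p.number):
        if documents and same_fragment(documents[-1][-1], page):
            documents[-1].append(page)
        else:
            documents.append([page])
    return documents


# The example from the comment: pp. 4-8 in one repository, pp. 9-12 in another,
# with pp. 9-10 and pp. 11-12 under non-consecutive (hypothetical) catalog numbers.
example = (
    [Page(n, "British Library", 1) for n in range(4, 9)]
    + [Page(9, "Paris BN", 10), Page(10, "Paris BN", 10)]
    + [Page(11, "Paris BN", 12), Page(12, "Paris BN", 12)]
)
for doc in split_into_documents(example):
    print(doc[0].number, "-", doc[-1].number)
```

With these inputs the sketch yields three documents (pp. 4-8, 9-10, 11-12). Manuscript parallels are not modeled here and would need separate handling.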

Then once those questions are answered, I need info for each of the following metadata fields. Please add them to each file in GitDox. For most of the items, I do not know the information, so I cannot do it myself. An explanation of all the fields is on our wiki here (scroll down):

[ ] Was Heike Behlmer involved in the transcription, annotation, etc.? We have metadata fields "source" and "source_info". "source" usually lists the editors and others who transcribed/digitized and provided the text; "source_info" links to the source if it was already online. Please list in the "source" metadatum all the editors who worked on the text BEFORE it came into Coptic SCRIPTORIUM; if there is a website, please add a link in "source_info".
[ ] Coptic_edition
[ ] pages_from and pages_to (this will depend on the document structure -- see above)
[ ] note please add here any information about the process that we need, especially about how you came up with the translation
[ ] repository
[ ] collection
[ ] idno
[ ] Trismegistos if there is a TN; otherwise skip or "none"
[ ] placeName usually Atripe for WM manuscripts
[ ] origPlace probably White Monastery
[ ] origDate
[ ] origDate_precision
[ ] origDate_notBefore
[ ] origDate_notAfter
[ ] witness
[ ] redundant (I'm guessing "no" for all of these since they look to be from the same codex)
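
As a rough aid for working through this checklist, the sketch below reports which of the fields are still empty for a document. The field names are the ones listed above; the plain-dict representation of a document's metadata and the `missing_fields` helper are assumptions for illustration, not GitDox's actual data model or API.

```python
# Fields taken from the checklist above. "Trismegistos" is left out of the
# required list because the checklist allows skipping it or entering "none".
REQUIRED_FIELDS = [
    "source", "source_info", "Coptic_edition",
    "pages_from", "pages_to", "note",
    "repository", "collection", "idno",
    "placeName", "origPlace",
    "origDate", "origDate_precision",
    "origDate_notBefore", "origDate_notAfter",
    "witness", "redundant",
]


def missing_fields(meta: dict) -> list:
    """Return the required fields that are absent or blank in `meta`.

    `meta` is a hypothetical key/value export of one document's metadata.
    """
    missing = []
    for field in REQUIRED_FIELDS:
        value = meta.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Usage: a document with only a few fields filled in so far.
draft = {"origPlace": "White Monastery", "placeName": "Atripe", "redundant": "no"}
print(missing_fields(draft))
```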

Many thanks!

ctschroeder commented 6 years ago

Again really looking forward to seeing this all published, and very grateful to @somiyagawa, Julien, and the other folks in Goettingen!!!

amir-zeldes commented 6 years ago

The document splits are my fault. As you know, the documents need to be smaller than the entire contiguous section, since a single document of that size would cause bad latency in ANNIS. I just split them into NBFB-sized chunks (that has always seemed to work; I think we did the same for Eagerness), and the split happened to cut across recto/verso.

So just to verify before I change them: would it be OK to do:

1-10
11-20
21-28
47

?
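
For reference, a split like the one proposed above can be sketched as a size-capped chunking that never ends a chunk on a recto while its verso is still to come. This is only an illustration: it assumes recto pages are odd-numbered and verso pages even-numbered, and the `chunk_pages` helper and its `max_pages` cap are hypothetical, not an existing script.

```python
from typing import List, Tuple


def chunk_pages(first: int, last: int, max_pages: int = 10) -> List[Tuple[int, int]]:
    """Split pages first..last into chunks of at most `max_pages` pages.

    ASSUMPTION: odd pages are rectos and even pages are versos, so a chunk
    that is not the final one must end on an even page.
    """
    if max_pages < 2:
        raise ValueError("max_pages must be at least 2 to hold a recto/verso pair")
    chunks = []
    start = first
    while start <= last:
        end = min(start + max_pages - 1, last)
        # Pull the boundary back one page if it would separate a recto from its verso.
        if end < last and end % 2 == 1:
            end -= 1
        chunks.append((start, end))
        start = end + 1
    return chunks


print(chunk_pages(1, 28, max_pages=10))  # [(1, 10), (11, 20), (21, 28)]
print(chunk_pages(1, 28, max_pages=9))   # [(1, 8), (9, 16), (17, 24), (25, 28)]
```

This only handles the processing-size concern within one contiguous run of pages; it does not account for repository or call-number boundaries.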

@somiyagawa: this means I'll have to ask you to look at the translation boundaries for the first two documents again. Sorry for the mix-up!

ctschroeder commented 6 years ago

Thanks for looking at this Amir! There are a few things going on, so please don’t break the docs up quite yet.

As I mentioned upthread, we also need to break them based on contiguity in the modern repository. Someone needs to check and see where these mss live, document their call numbers, etc., and then break the documents based on how the mss are fragmented.

If this is all one continuous fragment sitting on a shelf in one place with one call #, then we could do what you propose. But first So or Julien need to look up this information. Then we can break the text into the proper document sizes and add the correct metadata.

Thanks! We are very close. Just a few last details.

amir-zeldes commented 6 years ago

OK, sure, just let me know what splits you want and I can implement those, with the caveat that I may need to make some added splits due to processing concerns (but I'll never need to merge two things that don't belong together).

ctschroeder commented 6 years ago

Ok. We need to hear from So.

ctschroeder commented 6 years ago

Hey folks, I'm separating out the OT into a separate thread: #21

ctschroeder commented 6 years ago

I have looked at shenoute.obliged. The same metadata and document division questions from shenoute.throne apply to shenoute.obliged.

ctschroeder commented 6 years ago

Also if there are any manuscript parallels in shenoute.obliged, those need to be accounted for in the document structure.

CopticScriptorium / corpora

Review metadata for new Canons 6 corpora #20