DCLP / dclpxsltbox

Sandbox for development, testing, and review of XSLT for DCLP
http://dclp.github.io/dclpxsltbox/

missing diacriticals in multiple texts #282

Closed paregorios closed 7 years ago

paregorios commented 7 years ago

Papyrologists: what is the right way to do this?

This issue is a blocker for resolution of #130.

In sorting through issue #130, I believe I have discovered that encoding practice for dealing with lenis, asper, etc. in the original witness is inconsistent across the corpus. Take, for example, lenis. In DDB we find the following examples in 10 texts:

In most of these cases, the proper Unicode character(s) for lenis and the vowel are encoded, and the ancient use of the lenis is indicated with the <hi> element. But note P.Prag 1.37, which omits the Unicode lenis, and P.Ness 3, which is internally inconsistent.

A similar lack of consistency can be found in the DCLP content:

@rla2118 @jcowey @HolgerEssler @rogerbagnall @jds15 @jlougovaya

jds15 commented 7 years ago

I might not follow entirely, but… Do not expect consistency between Unicode and <hi>. <hi rend="lenis"> represents what the scribe wrote in antiquity on the papyrus. The Unicode chars, whether they agree with what the scribe wrote or not, represent what the modern editor printed in the edition.

this help?

-- Associate Professor in Classical Studies & History, Duke University | Duke Collaboratory for Classics Computing | Greek, Roman and Byzantine Studies | Duke Data Bank of Documentary Papyri | papyri.info | people.duke.edu/~jds15

rla2118 commented 7 years ago

Josh is right. Spot-checking some of the TM numbers that were said to have inconsistent elements didn't turn up any real inconsistencies. They were cases where <hi rend="lenis"> and the Unicode lenis did not coincide.
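
For anyone repeating that spot-check, here is a minimal Python sketch of the distinction. It is not part of the DCLP XSLT; the sample fragment, the function names, and the choice to ignore namespaces are all assumptions made for illustration. It lists each <hi rend="lenis"> and reports whether the editor's text inside it also carries a Unicode smooth breathing. A `False` result is not an encoding error: it only means the scribe's mark and the editor's printed text do not coincide.

```python
import unicodedata
import xml.etree.ElementTree as ET

# After NFD decomposition, a smooth breathing (lenis) is U+0313 COMBINING COMMA ABOVE.
COMBINING_PSILI = "\u0313"


def has_unicode_lenis(text):
    """True if the editor's text carries a Unicode smooth breathing."""
    return COMBINING_PSILI in unicodedata.normalize("NFD", text)


def classify_lenis_marks(xml_fragment):
    """Return (text, has_unicode_lenis) for every hi[@rend='lenis'] element.

    Namespaces are ignored by comparing local names only (an assumption,
    so the sketch works with or without the TEI namespace).
    """
    root = ET.fromstring(xml_fragment)
    results = []
    for el in root.iter():
        local_name = el.tag.rsplit("}", 1)[-1]
        if local_name == "hi" and el.get("rend") == "lenis":
            text = "".join(el.itertext())
            results.append((text, has_unicode_lenis(text)))
    return results


# Hypothetical fragment: the first <hi> agrees with the Unicode text,
# the second records a scribal lenis the editor did not print.
SAMPLE = (
    '<ab><hi rend="lenis">\u1f00</hi>\u03c0\u03bf '
    '<hi rend="lenis">\u03b1</hi>\u03bb\u03bb\u03b1</ab>'
)

if __name__ == "__main__":
    for text, agrees in classify_lenis_marks(SAMPLE):
        print(text, "Unicode lenis present" if agrees else "Unicode lenis absent")
```

Both kinds of result are legitimate under the practice Josh describes, so a report like this is useful for review but should not be treated as a list of files to fix.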

paregorios commented 7 years ago

Thanks both for explaining and checking. Your answers will help us narrow down what the real issue is for #130.

jds15 commented 7 years ago

At #130 I see:

= = =

Leiden+: <:<# Α( ᾿)σμβ=1242#>|subst|<# Α( ᾿)σμγ=1243#>:>

Displayed: corr. ex Α(ἀ)σμγ : ἀσμβ papyrus

Expected to see: corr. ex Ασμγ : ἀσμγ papyrus : ἀσμβ papyrus

= = =

For last I think I expect to see: Text: Aσμβ app: corr. ex Ασμγ ; Α᾿σμγ papyrus

Really I don’t expect to see ancient lenis on numbers at all. But if there were an analogous example with words rather than numbers, this is how I’d expect it to render.

And in fact I’d prefer Text: Aσμβ app: corr. ex Α᾿σμγ …but we handle ancient diacriticals as a separate app entry, which is crap, but (we decided) less crap than what we would have had to do to display the ancient diacriticals up in the text (or as necessary embedded in app entries).

josh

paregorios commented 7 years ago

@rla2118 and @jcowey to follow up with response/resolution of issues raised above by @jds15

jcowey commented 7 years ago

When Greek letters are used to indicate numbers I would prefer to see the use in DCLP of capital Greek letters to indicate that we are dealing with multiples of thousands (as we do in papyri.info for the documentary texts). It would then be consistent practice across ddbdp and dclp. If that decision were taken we would have to clean up a few files in DCLP and part of this problem would disappear.

rogerbagnall commented 7 years ago

Agreed. Always use caps for thousands.
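
To make the convention concrete, here is a rough Python sketch of how a numeral entered this way can be read. It is an illustration only, not anything in the papyri.info or DCLP code: the function name and letter tables are assumptions, and it handles only bare letters (no breathings, keraia, or other marks). A capital letter is taken to mean the letter's value times 1000.

```python
# Standard Greek alphabetic numeral values (lowercase forms), including
# stigma (6), koppa (90) and sampi (900).
UNITS = dict(zip("αβγδεϛζηθ", range(1, 10)))
TENS = dict(zip("ικλμνξοπϙ", range(10, 100, 10)))
HUNDREDS = dict(zip("ρστυφχψωϡ", range(100, 1000, 100)))
VALUES = {**UNITS, **TENS, **HUNDREDS}


def greek_numeral_value(numeral):
    """Sum the letter values, treating a capital letter as a multiple of 1000.

    Hypothetical helper: raises ValueError on anything that is not a bare
    numeral letter.
    """
    total = 0
    for ch in numeral:
        lower = ch.lower()
        if lower not in VALUES:
            raise ValueError(f"not a Greek numeral letter: {ch!r}")
        value = VALUES[lower]
        # A capital differs from its lowercase form; capitals mark thousands.
        total += value * 1000 if ch != lower else value
    return total


if __name__ == "__main__":
    # The two numerals from the Leiden+ example quoted earlier in this thread.
    print(greek_numeral_value("Ασμβ"))
    print(greek_numeral_value("Ασμγ"))
```

With the two numerals from the Leiden+ example this gives 1242 and 1243, matching the values recorded there. An ancient lenis entered on the capital would make the helper raise, which is in line with the point that such marks are not expected on numerals.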

paregorios commented 7 years ago

@jcowey please clarify in two ways:

  1. Do you want the text ENTERED with capital Greek letters (for thousands), or entered in either case but ALWAYS OUTPUT BY XSLT in capitals?
  2. I do not see that you have addressed whether you agree otherwise with the recommendation made by @jds15 at all.

Thanks

paregorios commented 7 years ago

/me to add clarification notes and assign for resolution.

paregorios commented 7 years ago

Per @jcowey there is conflation in discussion above between technical requirements and papyri.info/dclp data entry conventions. To wit:

Consequently, no "ancient lenis" should have been entered with the numerals in TM 63423 in the first place. So, this is a "no action."