byztxt / byzantine-majority-text

Byzantine Majority Greek New Testament text edited by Robinson and Pierpont, with morphological parsing tags and Strong's numbers
The Unlicense

Fix bug in Unicode converter #4

Closed by normansimonr 3 years ago

normansimonr commented 3 years ago

#3 introduced a Beta code to Unicode converter. Some adscript/subscript iotas were not being converted and were left as Beta code in the output files. The cause is that Dr. Robinson's code follows a different character precedence order than conventional Beta code. The same mismatch also left some breathings and accents detached from their letters, particularly on uppercase letters. This PR fixes the bug and updates the Unicode CSV files to make them correct.

Example of wrong conversion of iotas: Matthew 11:23 (wrong: ʽ|Αδου, correct: ᾍδου)

Example of wrong conversion of breathings: 1 Corinthians 1:1 (wrong: ʼΙησοῦ, correct: Ἰησοῦ)
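Both failures leave characters in the output that are not Greek letters: a literal `|`, or a spacing breathing mark in front of a bare capital. A rough way to spot them could look like the sketch below. The function name `stray_chars` is hypothetical and not part of `converter.py`, and the sketch assumes it is given a single Greek word with no punctuation or whitespace.

```python
def stray_chars(word):
    """Return the characters of `word` that fall outside the Greek blocks.

    Hypothetical helper for illustration only. Assumes `word` is a single
    Greek word without punctuation or whitespace.
    """
    allowed = (
        (0x0300, 0x036F),  # Combining Diacritical Marks
        (0x0370, 0x03FF),  # Greek and Coptic
        (0x1F00, 0x1FFF),  # Greek Extended (precomposed polytonic letters)
    )
    return [
        ch for ch in word
        if not any(lo <= ord(ch) <= hi for lo, hi in allowed)
    ]
```

A correctly converted word yields an empty list, while a word with a leftover `|` or a detached breathing yields the offending characters, so the check can be run over every word of the CSV files to find remaining cases.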

The changes to the converter.py script enforce the precedence order below. Only uppercase letters are touched in this PR, since conversion of lowercase letters seems to be working fine:

  1. asterisk
  2. breathing
  3. accent
  4. letter
  5. iota subscript

Example: *(=W|
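The reordering can be illustrated with a minimal sketch. This is not the code in `converter.py`: the names `normalize_uppercase`, `_CLUSTER` and `_reorder` are hypothetical, and the sketch assumes that the asterisk already comes first and that the marks sit directly before or after the capital letter. It handles only breathings (`)`, `(`), accents (`/`, `\`, `=`) and the iota subscript (`|`).

```python
import re

BREATHINGS = ")("
ACCENTS = "/\\="
IOTA = "|"

# An asterisk, then exactly one letter with marks on either side of it.
_CLUSTER = re.compile(r"\*([()/\\=|]*)([A-Za-z])([()/\\=|]*)")


def _reorder(match):
    """Rebuild one uppercase cluster as: asterisk, breathing, accent, letter, iota."""
    marks = match.group(1) + match.group(3)
    letter = match.group(2)
    breathing = "".join(c for c in marks if c in BREATHINGS)
    accent = "".join(c for c in marks if c in ACCENTS)
    iota = "".join(c for c in marks if c in IOTA)
    return "*" + breathing + accent + letter + iota


def normalize_uppercase(beta):
    """Reorder every uppercase cluster in a Beta code string.

    Lowercase letters (those without an asterisk) are left untouched.
    """
    return _CLUSTER.sub(_reorder, beta)
```

For instance, `normalize_uppercase("*W(=|")` returns `"*(=W|"`, and a string that is already in the required order comes back unchanged. The input orderings used here are made up for illustration; the actual orderings found in Dr. Robinson's files may differ.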

This precedence order is required by the beta-code library (see here and here - the second website is linked in the first one), which is the library that we used in #3 to do the conversion. @emg I hope that no more major issues are found in the Unicode files after this fix!