Closed DavidHaslam closed 6 years ago
The screenshot from Xiphos 4.0.6a (Windows) illustrates a footnote in 2 Kings 12:4.
Apart from the undesirable left alignment in the preview pane, the footnote is displayed correctly for reading from right to left.
Here's the note element in the OSIS source:
<note placement="foot"><reference type="annotateRef" osisRef="2Kgs.12.4">12:4</reference> <catchWord>مردم شماری کے ٹیکس: </catchWord>دیکھئے <seg type="x-nested"><reference osisRef="Exod.11.16-Exod.11.30">خروج 11:16-30</reference></seg> </note>
Here's the same Unicode text converted to NCR.
<note placement="foot"><reference type="annotateRef" osisRef="2Kgs.12.4">‏12‏:4</reference> <catchWord>مردم شماری کے ٹیکس: </catchWord>دیکھئے <seg type="x-nested"><reference osisRef="Exod.11.16-Exod.11.30">خروج 11‏:16‏-30</reference></seg> </note>
Observe the four instances of &#x200F; (U+200F, RIGHT-TO-LEFT MARK), which is the RLM that I described above.
With regard to converting USFM to OSIS, I'm not sure that the presence of these marks matters.
The only place where they would be problematic is when creating osisRef attributes in references. (And I can strip them out easily enough in orefs.) Additionally, are all references in right-to-left languages going to be written like this:
laterverse-verse:chapter bookname
If so, I will need to make adjustments in orefs to accommodate this format.
The key to understanding this is the exact placement of the RLM.
But yes, it's in your orefs.py where the possibility may be encountered.
I've been associated with CrossWire for over eight years. In all that time, AFAIK, nobody has made such observations in writing.
We don't have a huge number of RtoL Bibles, and even some of these have no notes with caller references or cross-reference notes.
I guess it's something that even the ParaTExt team may not have ever considered in detail.
Arabic has no COLON even though it has a COMMA and a FULL-STOP, as well as a SEMICOLON.
btw. One of the 354 notes in UrduGeo has the Arabic Comma as the verse,verse separator.
Even for RtoL scripts, the references are stored in the usual logical order (chapter first, then verse), but the insertion of the RLMs makes the numerical parts look "back to front" to you and me.
BabelPad, the Unicode text editor for Windows developed by Andrew West, has features that help users see the order of the codepoints.
And of course, the Arabic Comma has no need for a RLM before it, as it's already a RtoL character.
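For readers without BabelPad, the stored order of the codepoints can also be inspected with a few lines of Python. This is only a minimal sketch: `dump_codepoints` is a hypothetical helper name, not part of orefs.py, and the sample string is the caller reference from the note quoted earlier.

```python
import unicodedata


def dump_codepoints(text):
    """Return one 'U+XXXX NAME' string per character, in logical (storage) order."""
    return [
        "U+{:04X} {}".format(ord(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


if __name__ == "__main__":
    # Caller reference from the footnote: RLM, "12", RLM, ":", "4"
    for line in dump_codepoints("\u200F12\u200F:4"):
        print(line)
```

Printing the list makes the otherwise invisible RLMs show up by name, in the order they are stored rather than the order they are rendered.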
One possible way to deal with (e.g.) this Urdu Bible, would be to define two of the punctuation variables to include the RLM.
SEPM = "\u061B" # separates multiple references (Arabic semicolon)
SEPC = "\u200F:" # separates chapter from verse (RLM + colon)
SEPP = "\u060C" # separates multiple verses or verse ranges (Arabic comma)
SEPR = "\u200F-" # separates verse ranges (RLM + hyphen/minus)
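As a rough illustration of how separators defined this way could be matched, here is a small sketch. It is not how orefs.py actually parses references; `REF_RE` and `parse_numbers` are hypothetical names, and only the SEPC and SEPR values are taken from the settings above.

```python
import re

SEPC = "\u200F:"  # separates chapter from verse (RLM + colon)
SEPR = "\u200F-"  # separates verse ranges (RLM + hyphen/minus)

# Hypothetical pattern: chapter, SEPC, verse, then an optional SEPR + end verse.
REF_RE = re.compile(
    r"(?P<chapter>\d+)" + re.escape(SEPC) + r"(?P<verse>\d+)"
    + r"(?:" + re.escape(SEPR) + r"(?P<end>\d+))?"
)


def parse_numbers(text):
    """Return (chapter, verse, end_verse_or_None) for the first match, else None."""
    match = REF_RE.search(text)
    if match is None:
        return None
    end = match.group("end")
    return (
        int(match.group("chapter")),
        int(match.group("verse")),
        int(end) if end else None,
    )
```

With these definitions a reference written with a plain colon and no RLM would not match, which is the trade-off of baking the RLM into the separator itself.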
This still leaves the observation that any annotateRef references would have an RLM before the chapter number, assuming the translator[s] managed to get things done right.
NB. As it happens, this particular project (as yet) has no need for SEPM.
Thus to some extent, some of the adaptation can be readily done by the user.
This would leave the task of removing the RLM when an original reference is converted to the osisRef value for such annotateRef reference types.
It may be worth considering a further variable that carries this property:
SEPA = "\u200F" # defines the start of an annotate reference (RLM)
cf. For LtoR scripts this would just be the null string.
No additional variables are needed. If the references are always written left to right regardless of language direction, then the only thing I would need to do is filter out the Unicode directional formatting characters when generating the osisRef attribute for the references. That is easy to do.
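A minimal sketch of such a filter follows. The actual change made in orefs.py is not shown in this thread, so the names `BIDI_CONTROLS` and `strip_bidi_controls` are assumptions for illustration, as is the choice to remove every Unicode directional formatting character rather than the RLM alone.

```python
# Unicode directional formatting characters: ALM, LRM, RLM,
# the embedding/override controls (U+202A..U+202E) and the
# isolate controls (U+2066..U+2069).
BIDI_CONTROLS = dict.fromkeys(
    map(ord, "\u061C\u200E\u200F\u202A\u202B\u202C\u202D\u202E\u2066\u2067\u2068\u2069")
)


def strip_bidi_controls(text):
    """Remove directional formatting characters before building an osisRef."""
    # str.translate deletes every character whose table entry is None.
    return text.translate(BIDI_CONTROLS)
```

Applied to the human-readable text of a reference, this leaves the digits and punctuation untouched, so the usual chapter:verse parsing can run on the result while the displayed text keeps its RLMs.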
Great. Easy to do quite soon?
Done.
Working with the translator of an Urdu Bible, some important points have emerged about Scripture references.
In order to correctly display the human-readable references, judicious use of the RLM (U+200F) is required. See Marking references in Right to Left scripts.