MarjorieBurghart / VulgateGlaire

A TEI XML version of the French translation of the Vulgate (Latin Bible) by Abbé Glaire (†1879)

Italicised parentheses and their potential further consequences #19

Open DavidHaslam opened 6 years ago

DavidHaslam commented 6 years ago

While doing some experimenting with a view to creating a SWORD module with a more readable variant, I discovered that in my modified OSIS XML file, there are 65 hits to the regexp <hi .*?>.+?<seg .+?>(.+?)</seg>.*?</hi>. This construction is not valid XML syntax.

cf. In the unmodified OSIS XML file, there are 65 matches to the regexp <hi .+>.*\([^\(\)]+\).*</hi>

These are places where text with italics markup contains a pair of parentheses.
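A minimal Python sketch of the same search, for anyone who wants to reproduce the count outside a text editor. It assumes that italics are always written as non-nested hi elements on a single line; the names `HI`, `PAREN` and `italic_spans_with_parens` are illustrative only, and the first sample string is invented. Matching each hi element non-greedily is a choice made here so that a match cannot run across two separate hi elements.

```python
import re

# Assumption: italics are <hi ...>...</hi>, never nested, never split across lines.
HI = re.compile(r'<hi\b[^>]*>(.*?)</hi>')
# A pair of parentheses with no further parentheses inside it.
PAREN = re.compile(r'\([^()]+\)')

def italic_spans_with_parens(line):
    """Return every hi element on the line whose content holds a (...) pair."""
    return [m.group(0) for m in HI.finditer(line) if PAREN.search(m.group(1))]

# Invented sample: the parentheses sit inside the italics markup.
inside = '<hi type="italic">Psaume de David (lui-même).</hi> Conservez-moi, Seigneur.'
# From Ps.15.1 in this thread: the parentheses sit outside the italics markup.
outside = ('<hi type="italic">Inscription du titre, de David </hi>'
           '(<hi type="italic">lui-même</hi>). Conservez-moi, Seigneur.')

print(len(italic_spans_with_parens(inside)))
print(len(italic_spans_with_parens(outside)))
```

Only the first sample is reported, since in the second the parentheses are not part of any hi content.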

Italicised (parentheses) don't really look nice either, compared to non-italicised (parentheses) within italicised text. [except possibly when the first parenthesised character is the letter p]

Irrespective of the aesthetics, the simplest solution to prevent the resulting XML syntax errors would be for me to avoid having italicised parentheses in my XML source text for the experimental module with variants. i.e. By splitting the italics markup into three, excluding the parentheses, thus: <hi .+>.+</hi><seg .+>(<hi .+>.+</hi>)</seg><hi .+>.+</hi>
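The three-way split described above could be sketched in Python roughly as follows. This is not the filter actually used for the module; it assumes non-nested `<hi type="italic">` elements, parentheses that are balanced within a single hi element, and the seg attributes `type="x-variant" subType="x-2"`. The function name `split_italics` and the sample title are invented for illustration.

```python
import re

HI = re.compile(r'<hi type="italic">(.*?)</hi>')
PAREN = re.compile(r'\(([^()]+)\)')
SEG_OPEN = '<seg type="x-variant" subType="x-2">'

def italic(text):
    """Wrap text in italics markup, or return nothing for an empty string."""
    return f'<hi type="italic">{text}</hi>' if text else ''

def split_italics(line):
    """Move parentheses out of italics and wrap each (...) in a seg element."""
    def rewrite(match):
        content = match.group(1)
        parts, pos = [], 0
        for p in PAREN.finditer(content):
            parts.append(italic(content[pos:p.start()]))        # italics before
            parts.append(f'{SEG_OPEN}({italic(p.group(1))})</seg>')  # the variant
            pos = p.end()
        parts.append(italic(content[pos:]))                      # italics after
        return ''.join(parts)
    return HI.sub(rewrite, line)

# Invented sample title with one parenthesised alternative.
sample = '<hi type="italic">Psaume (ou cantique) de David.</hi>'
print(split_italics(sample))
```

A hi element with an unmatched parenthesis is left untouched by this sketch, which is where manual inspection would still be needed.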

No immediate action required for the TEI XML files. I just thought it would be of interest to share this observation.

DavidHaslam commented 6 years ago

FIO. Refer to: https://crosswire.org/wiki/OSIS_Bibles#Marking_variants

Although the term variant is usually used in the context of the original languages source text, it can still be a useful technique to consider for marking alternative renderings in a translation.

Not all front-end apps have UI support for variants. Xiphos does, but (e.g.) PocketSword doesn't.

The default for the SWORD app Xiphos is to display variant 1. To hide variant 2 (for instance) the user merely selects variant 1 in the UI for the module options. There's no immediate need for me to mark the text for the hypothetical "variant 1" in any way. This could be done, but would require a detailed examination of the context for each location (q.v.)

UPDATE: 2017-10-12 I have now made a module with GlobalOptionFilter=OSISVariants in the .conf file. This provides a neat method to show/hide the parenthesised text marked as variant 2, using the OSIS XML syntax <seg type="x-variant" subType="x-2">(...) </seg>

Aside: It also made sense for me to move any space occurring immediately after a closing </seg> to just before the </seg>. This avoids having a spurious space still showing when the variant is not displayed.
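A small Python sketch of that space adjustment, under the assumption that it is a plain textual substitution. The helper `hide_variant_2` only mimics the effect of hiding variant 2 for the purpose of the demonstration; it is not how the SWORD engine implements the filter. The sample words come from Ps.144.1, with the seg markup added here as an assumption.

```python
import re

def pull_space_into_seg(text):
    """Move a space that follows </seg> to just before the </seg>."""
    return re.sub(r'</seg> ', ' </seg>', text)

def hide_variant_2(text):
    """Illustration only: drop every variant 2 seg element and its content."""
    return re.sub(r'<seg type="x-variant" subType="x-2">.*?</seg>', '', text)

before = 'ô <seg type="x-variant" subType="x-2">(mon)</seg> Dieu mon roi'
after = pull_space_into_seg(before)

print(repr(hide_variant_2(before)))  # spurious double space remains
print(repr(hide_variant_2(after)))   # single space once the variant is hidden
```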

btw. I think Glaire assumed that the reader would discern how much (if any) of the text preceding each alternative rendering must be mentally suppressed for the alternative to still be good French grammar and make adequate sense.

DavidHaslam commented 6 years ago

btw. My XML syntax errors were all within the italicised Psalm titles found in the first verse of many Psalms.

Variant 2 XML syntax errors.zip

DavidHaslam commented 6 years ago

Moreover, there are already 2 instances where the parentheses are outside the italics markup:

<verse sID="Ps.15.1" osisID="Ps.15.1" n="1"/><hi type="italic">Inscription du titre, de David </hi>(<hi type="italic">lui-même</hi>). Conservez-moi, Seigneur, car j’ai espéré en vous.
<verse sID="Ps.27.1" osisID="Ps.27.1" n="1"/>(<hi type="italic">Psaume de David lui-même.</hi> note) Je crierai vers vous, Seigneur ; mon Dieu, ne gardez pas le silence à mon égard, de peur que, si vous ne me répondez pas, je ne sois semblable à ceux qui descendent dans la fosse.

This indicates at least that the aesthetic aspect had already been recognised, albeit only barely.

DavidHaslam commented 6 years ago

My current difficulty is due to the fact that there are more ) than ( within italics markup.

This implies that at least 4 parentheses (whether left or right) are misplaced with respect to italics markup.

This results in XML syntax errors because of bad nesting when attempting to exclude all parentheses from italics.
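One way to locate such imbalances is to count the opening and closing parentheses that fall inside italics markup, verse by verse. The Python sketch below assumes non-nested `<hi type="italic">` elements; `paren_counts` is a name invented here. The sample is the Ps.100.1 title from this thread, without the temporary hash mark.

```python
import re

HI = re.compile(r'<hi type="italic">(.*?)</hi>')

def paren_counts(line):
    """Count '(' and ')' that occur inside italics markup on one line."""
    opens = closes = 0
    for m in HI.finditer(line):
        opens += m.group(1).count('(')
        closes += m.group(1).count(')')
    return opens, closes

ps_100_title = ('<hi type="italic">Psaume à David </hi>'
                '(ou<hi type="italic">de David lui-même).</hi>')

opens, closes = paren_counts(ps_100_title)
if opens != closes:
    print(f'unbalanced within italics: {opens} opening, {closes} closing')
```

Any line where the two counts differ is a candidate for manual inspection.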

These are the locations of the 4 misplaced ) – temporarily marked with a hash symbol.

<verse sID="Ps.99.1" osisID="Ps.99.1" n="1"/><hi type="italic">Psaume pour la ((de) louange </hi>(ou<hi type="italic">d’actions de grâces#).</hi>
<verse sID="Ps.100.1" osisID="Ps.100.1" n="1"/><hi type="italic">Psaume à David </hi>(ou<hi type="italic">de David lui-même#).</hi> Je chanterai, Seigneur, devant vous votre miséricorde et votre justice. Je les chanterai au son des instruments (sur le psaltérion),
<verse sID="Ps.137.1" osisID="Ps.137.1" n="1"/><hi type="italic">De ((A) David </hi>(ou<hi type="italic">de David lui-même#).</hi> Je vous célébrerai (glorifierai), Seigneur, de tout mon cœur, parce que vous avez écouté les paroles de ma bouche. Je vous chanterai des hymnes en présence des anges ;
<verse sID="Ps.144.1" osisID="Ps.144.1" n="1"/><hi type="italic">Louange de ((à) David </hi>(ou <hi type="italic">de David lui-même#).</hi> Je vous exalterai, ô (mon) Dieu mon roi, et je bénirai votre nom à jamais et dans les siècles des siècles.

DavidHaslam commented 6 years ago

Just edited the file Ps.xml in the Editing3 branch of my fork. This fixes only the mismatched italicised parentheses in Psalms 99, 100, 137, 144.

Pull request made. @MarjorieBurghart

Not addressed the wider issue of italicised parentheses in general.

DavidHaslam commented 6 years ago

Making the above changes fixed the reported XML syntax errors in my modified OSIS file.
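For a quick check of individual fragments without opening the whole file, a well-formedness test can be sketched with Python's standard library. This is only an illustration: it wraps a fragment in a dummy root element (named `root` here as an assumption) and says nothing about validity against the OSIS schema. The badly nested sample is invented to show the kind of error that mismatched parentheses produce.

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment):
    """Return True if the fragment parses as XML inside a dummy root element."""
    try:
        ET.fromstring(f'<root>{fragment}</root>')
        return True
    except ET.ParseError:
        return False

# Invented example of bad nesting: </hi> closes while <seg> is still open.
bad = ('<hi type="italic">Psaume <seg type="x-variant" subType="x-2">(ou</hi>'
       ' cantique)</seg>')
# Properly nested markup of the kind used elsewhere in this thread.
good = ('<hi type="italic">Inscription du titre, de David </hi>'
        '<seg type="x-variant" subType="x-2"><hi type="italic">(lui-même)</hi></seg>.')

print(is_well_formed(bad))
print(is_well_formed(good))
```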

After a minor adjustment to the filter that adds all the seg elements with type="variant", the file also validated against the OSIS 2.1.1 schema.

My modified OSIS XML file now has 21413 seg elements. If this were to be used to make a SWORD module, these alternate renderings could be toggled to display or hide.

This leaves 31 items of parenthesised text unchanged. And there were (after all) 3 places where the parenthesised section spans at least two verses. Those are somewhere in the books 1 Chronicles, Job and Hebrews.

DavidHaslam commented 6 years ago

Even if we chose not to release such an enhanced module, this exercise has been valuable in identifying the further minor problems caused by leaving some parentheses with unmatched italics.

And there's nothing to prevent us from fixing all the italicised parentheses that we still have in the TEI files.

MarjorieBurghart commented 6 years ago

I've merged your pull request (with just the correction of the 4 psalm titles).

But generally speaking, I'm a bit reluctant to integrate your un-italicised parentheses around italicised text: there might be a difference between the English and French typographic rules here. I think we (French) are supposed to italicise the parentheses if the first and last word within are in italics. I need to check that.

DavidHaslam commented 6 years ago

It should be understood that having a note element within a hi element is just as likely to be invalid to the TEI schema as it was for the OSIS schema.

This is more an XML issue than one of French typography.

If we do go ahead with either TEI inline notes or OSIS variants, the problem has to be faced somehow.

DavidHaslam commented 6 years ago

I should also point out that I normally validate XML files (such as OSIS) using the XML Tools plugin for Notepad++.

Currently this does not support the RELAX NG compact syntax, so if and when I attempt to insert note elements in the TEI files, I would not be in an immediate position to validate them against the schema referenced by xml-model href="KJV_1611.rnc".

Furthermore, I'm not even certain that XML Tools supports xmlns:xi="http://www.w3.org/2001/XInclude".

DavidHaslam commented 6 years ago

Experimental SWORD module just made with GlobalOptionFilter=OSISVariants.

Screenshot of the module in Xiphos with Variant 2 displayed. (Image: screenshot 2017-10-12 21 10 05)

Screenshot of the same passage with Variant 1 displayed. (Image: screenshot 2017-10-12 21 10 21)

This is the "less distracting to read" view. It's also the default option after installing the module. At any time, the reader can select variant 2 and see the parenthesised text.

These options are accessed via the module context menu – {right click} – in the main window.

Aside: The OSIS markup requires some further minor adjustments to avoid things like the space before the full-stop at the end of verse 3 in this chapter. i.e.

The latter is to avoid joining two words that should remain separate when variant 1 is selected. cf. I'd already moved a significant number of trailing spaces into the seg element.

Nevertheless, the implementation has been convincingly demonstrated.

DavidHaslam commented 6 years ago

IMHO, this is superior to using footnotes, though the fact that some apps don't yet have UI support for variants counts against it a little.

Even so, there's nothing in theory to prevent a module being made in which the parenthesised text is accessible both through particular footnotes and by variant 2 in general.

That would be the best of both worlds, though it would increase module size somewhat.

Footnote tags can be toggled to show/hide in all our apps.

DavidHaslam commented 6 years ago

Apps that don't yet have UI support for selecting a variant do not default to displaying all (both) variants.

btw. Currently, the SWORD engine only supports two variants. It defaults to displaying only the Primary Reading in a module that features variants.

Aside: In a module designed to have the parenthesised text items marked up both as footnotes and as variant 2, it's worth considering placing the note elements within seg elements for variant 1, such that when variant 2 is selected, the footnote tags are hidden (when already selected to show).

DavidHaslam commented 6 years ago

It may still be feasible to italicise the parentheses after all. @MarjorieBurghart

As long as they are normal text style when the OSIS markup for variants is being added, the XML validation errors can be avoided.

Even so, it should still be feasible afterwards to move the parentheses back into the italics markup.

Having them temporarily outside the italics markup provides the extra markup that's required to avoid invalid XML.

The same concept would apply to the proposed TEI note elements (in theory).

DavidHaslam commented 6 years ago

Implementation details: Moving parentheses back into italics markup. After fixing 92 matched pairs, left behind were 1 ( and 4 ) that were normal text style. I inspected each of these 5 locations and determined that the 1 ( should be fixed and the 4 ) left alone.

NB. This sub-filter can be readily disabled if necessary.
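A rough Python sketch of what such a sub-filter might look like, assuming it only handles matched pairs where both parentheses immediately surround a single italics element. The pattern and the name `italicise_parens` are illustrative, not the actual filter. Unmatched parentheses are deliberately left alone, which corresponds to the handful of cases that needed manual inspection.

```python
import re

# "(" + one italics element without inner tags + ")"
OUTSIDE = re.compile(r'\(<hi type="italic">([^<]*)</hi>\)')

def italicise_parens(text):
    """Move a matched pair of parentheses back inside the italics markup."""
    return OUTSIDE.sub(r'<hi type="italic">(\1)</hi>', text)

matched = '<seg type="x-variant" subType="x-2">(<hi type="italic">lui-même</hi>)</seg>'
# Ps.100.1 title from this thread: the ")" is already inside the italics.
unmatched = '(ou<hi type="italic">de David lui-même).</hi>'

print(italicise_parens(matched))
print(italicise_parens(unmatched))
```

Keeping this step as a separate function makes it easy to switch off without touching the rest of the processing.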

DavidHaslam commented 6 years ago

Even Ps.15.1 and Ps.27.1 now have the parentheses in the Psalm titles marked as italics in my OSIS XML file used to build the SWORD module.

<verse sID="Ps.15.1" osisID="Ps.15.1" n="1"/><hi type="italic">Inscription du titre, de David </hi><seg type="x-variant" subType="x-2"><hi type="italic">(lui-même)</hi></seg>. Conservez-moi, Seigneur, car j’ai espéré en vous.<verse eID="Ps.15.1"/>
<verse sID="Ps.27.1" osisID="Ps.27.1" n="1"/><seg type="x-variant" subType="x-2"><hi type="italic">(Psaume de David lui-même.)</hi> </seg>Je crierai vers vous, Seigneur ; mon Dieu, ne gardez pas le silence à mon égard, de peur que, si vous ne me répondez pas, je ne sois semblable à ceux qui descendent dans la fosse.<verse eID="Ps.27.1"/>