MarjorieBurghart / VulgateGlaire

A TEI XML version of the French translation of the Vulgate (Latin Bible) by Abbé Glaire (†1879)

Mismatched number of parentheses #6

Closed DavidHaslam closed 6 years ago

DavidHaslam commented 7 years ago

The character frequency count for my concatenated USFM file includes:

U+0028  (   21,440  LEFT PARENTHESIS
U+0029  )   21,446  RIGHT PARENTHESIS

This indicates that there are at least 6 unmatched right parentheses, i.e. at least 6 places where a left parenthesis is missing.
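A count like the one above can be reproduced with Python's collections.Counter. The sketch below is illustrative only, and the helper names are hypothetical. It also makes the "at least" explicit: the difference between the two counts is only a lower bound, because a missing ( in one verse can be cancelled out by a missing ) in another.

```python
from collections import Counter


def paren_counts(text: str) -> tuple[int, int]:
    """Return the (left, right) counts of U+0028 and U+0029 in text."""
    freq = Counter(text)
    return freq["("], freq[")"]


def min_unmatched(text: str) -> int:
    """Lower bound on the number of unpaired parentheses.

    Equal counts do not prove the text is balanced: a missing "(" in
    one place can be offset by a missing ")" somewhere else.
    """
    left, right = paren_counts(text)
    return abs(left - right)


sample = "a (b) c) d)"
print(paren_counts(sample))   # (1, 3)
print(min_unmatched(sample))  # 2
```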

Locating them is not difficult. Each matching pair of parentheses can be temporarily replaced by another suitable pair of characters; a search for any remaining ( or ) will then find the unpaired ones.
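The pair-replacement method can be sketched in Python as follows. This is a minimal illustration, not the tool used in this thread, and the helper names are hypothetical. As an assumption of the sketch, the pattern uses the class [^()], which excludes both parentheses, so each substitution only touches an innermost pair; repeating it until nothing changes resolves nesting from the inside out. Because ( and ) are each replaced by a single code point, the offsets reported for the leftovers are also offsets in the original text.

```python
import re

# An innermost pair: "(" then no parenthesis of either kind, then ")".
INNER_PAIR = re.compile(r"\(([^()]*)\)")


def mark_pairs(text: str) -> str:
    """Replace every matched ( ) pair by the double parentheses U+2E28/U+2E29.

    Each pass marks the innermost pairs; the loop stops when a pass
    makes no replacement, so nested pairs are marked too.
    """
    while True:
        text, n = INNER_PAIR.subn("\u2e28\\1\u2e29", text)
        if n == 0:
            return text


def unpaired_positions(text: str) -> list[int]:
    """Offsets of the parentheses left over once all pairs are marked."""
    marked = mark_pairs(text)
    return [i for i, ch in enumerate(marked) if ch in "()"]


print(mark_pairs("a (b (c) d) e"))               # a ⸨b ⸨c⸩ d⸩ e
print(unpaired_positions("x (a) b) (c (d) e"))  # [7, 9]
```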

DavidHaslam commented 7 years ago

My hunch in stating "at least" was correct.

Search results for ( are:

\v 30 Mais ceux qui étaient de la famille sacerdotale (princes des prêtres composaient les parfums de plusieurs aromates.
\v 18 (car la compassion a grandi avec moi dès mon enfance, et est sortie avec moi du sein de ma mère ; ⸨.⸩
\v 30 Hanania, fils de Sélémias, et Hanun, sixième fils de Séleph, bâtirent auprès de (après lui un double ⸨second⸩ espace ; et auprès de lui Mosollam, fils de Barachias, bâtit le mur vis-à-vis de ses chambres. Melchias, fils d’un ⸨de l’⸩ orfèvre, bâtit auprès de ⸨après⸩ lui jusqu’à la maison des Nathinéens et des merciers ⸨ceux qui vendaient des hardes⸩, vers la porte des juges et jusqu’à la chambre de l’angle.

Search results for ) are:

\v 28 Si pourtant tu prends une femme, tu ne pèches pas ⸨; et si une vierge se marie, elle ne pèche pas.⸩. Mais ces personnes éprouveront les tribulations de la chair ; et je voudrais vous les épargner ⸨pardonne, note⸩).
\v 20 David revint aussi chez lui pour bénir sa maison. Et Michol, fille de Saül, étant venue au-devant de David, lui dit : Que le roi d’Israël a eu de gloire aujourd’hui, en se découvrant devant les servantes de ses sujets, et paraissant nu comme ferait un de ses, note) bouffon⸨s⸩ !
\v 14 Michée lui répondit : Vive le Seigneur ‘vit !), je ne dirai que ce que le Seigneur m’aura dit ⸨tout ce que m’aura dit le Seigneur, c’est ce que je dirai⸩.
\v 2 Pourquoi employez-vous votre argent à ce qui ne peut nourrir n’est pas du pain), et votre travail à ce qui ne peut rassasier ? Ecoutez-moi bien, et mangez ce qui est bon, et votre âme se délectera de mets savoureux.
\v 3 Car les ⸨des⸩ jours viennent, dit le Seigneur, où je ferai revenir certainement) les captifs de mon peuple d’Israël et de Juda, dit le Seigneur, et je les ramènerai dans le pays que j’ai donné à leurs pères, et ils le posséderont.
\v 5 Auprès d’eux bâtirent les gens de Thécua. Mais les principaux d’entre eux ne voulurent pas s’abaisser au service dans l’ouvrage) de leur Seigneur.
\v 9 Pendant le jour le Seigneur a envoyé ⸨commandé à⸩ sa miséricorde de m’environner), et la nuit son ⸨un⸩ cantique ⸨à sa louange a été dans ma bouche⸩. Au dedans de moi est une prière pour le Dieu de ma vie.
\v 28 Ils se consacrèrent à Béelphegor, et mangèrent des sacrifices offerts à des dieux sans vie des sacrifices des morts).
\v 12 que les jeunes gens hommes) et les jeunes filles ⸨vierges⸩, les vieillards et les enfants ⸨ceux qui sont plus jeunes⸩ louent le nom du Seigneur,

As you see, the temporary replacements I chose are double parentheses, ⸨ and ⸩ (U+2E28 and U+2E29).

NB. Some Unicode fonts do not include these two code points.

Replacements were done in two stages:

  1. Search pattern 1 does not span lines.
  2. Search pattern 2 does span lines, up to a maximum match size of 4096.

Log file results:

2017-10-07 15:04:11,Info,Replace matched left and right parentheses by double parentheses
2017-10-07 15:04:11,Info,21413 replace(s) performed for pattern match [\(([^\(]+)\)]
2017-10-07 15:04:11,Info,24 replace(s) performed for pattern match [\(([^\(]+)\)]

cf. Doubling the max size to 8192 did not change the results.
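The two-stage run can be approximated the same way: a first pass that only marks pairs opening and closing on the same line, then a second pass whose contents may include line breaks, capped at a maximum length. This is a sketch under assumptions: the function name is hypothetical, the max_span parameter only imitates the editor's size limit, and the class [^()] is the stricter variant used above rather than the exact pattern in the log.

```python
import re


def two_stage_mark(text: str, max_span: int = 4096) -> tuple[str, int, int]:
    """Mark ( ) pairs as double parentheses in two passes.

    Returns the marked text plus the number of replacements made by
    each pass. Stage 1 excludes newlines inside the pair; stage 2
    allows them, up to max_span characters between the parentheses.
    """
    same_line = re.compile(r"\(([^()\n]*)\)")
    multi_line = re.compile(r"\(([^()]{0,%d})\)" % max_span)

    counts = []
    for pattern in (same_line, multi_line):
        total = 0
        while True:  # repeat so nested pairs are marked too
            text, n = pattern.subn("\u2e28\\1\u2e29", text)
            total += n
            if n == 0:
                break
        counts.append(total)
    return text, counts[0], counts[1]


print(two_stage_mark("a (b) c (d\ne) f"))  # one pair in each stage
```

In a real run the second count corresponds to the parenthesised items that span lines; raising max_span only matters if some item is longer than the cap.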

DavidHaslam commented 7 years ago

Locations of the 3 unpaired ( are:

Locations of the 6 unpaired ) are:

NB. I am using the more familiar book names in this comment, rather than the Vulgate book names. These locations may not be in the same order as my earlier reported search results.

FYI. The attached Excel workbook tabulates the Vulgate book names against those defined for USFM: Vulgate booknames.xlsx

DavidHaslam commented 7 years ago

These mismatched parentheses are hindering progress towards adding suitable USFM markup for all the textual notes. A few of the lines containing note) have no opening (.

The attached text file, merged.notes.txt, contains a list of the 510 extracted note items. NB. Where the ( is missing, the item runs from the start of the same line.

A few of the note items are rather odd to say the least! There are 5 which are simply (note) with no explanatory text.

MarjorieBurghart commented 6 years ago

Thanks for this too! I'm going to correct the missing parentheses. I've just tried the first one, and it turns out to be in 1 Chronicles 9:30 (not 1:30).

About the odd-sounding notes: generally, the text between parentheses seems to be a more literal translation of the text just before it, which would explain that. But I have no idea why some parentheses include the word "note" and others do not.

DavidHaslam commented 6 years ago

Reference corrected in earlier comment. Thanks.

DavidHaslam commented 6 years ago

Still pondering how best to mark the parenthesised text in USFM and, later, in OSIS XML. We're not obliged to, of course, but in Bible software a user option to toggle features on and off can be very useful.

The USFM footnote markup \f ...\f* includes this special marker for an alternate translation:

\fqa

cf. The SWORD API supports toggling footnotes off and on.
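As a hedged illustration of the \fqa idea, the sketch below rewrites each innermost parenthesised item as an auto-numbered USFM footnote (\f + ...\f*) containing \fqa, attached directly to the preceding word. It assumes every innermost item is an alternative rendering, which does not hold for all of Glaire's parentheses; it also omits the \fr reference marker, and the function name is hypothetical.

```python
import re

# An innermost parenthesised item plus any whitespace before it, so the
# footnote can attach directly to the preceding word.
PAREN_ITEM = re.compile(r"\s*\(([^()]*)\)")


def parens_to_fqa(verse: str) -> str:
    r"""Rewrite each innermost (...) item as a USFM \fqa footnote.

    Assumption: every such item is an alternative, more literal
    rendering. Parentheses that belong to the Bible text itself
    would need to be excluded before using this for real.
    """
    # In the replacement template, "\\" yields a literal backslash.
    return PAREN_ITEM.sub(r"\\f + \\fqa \1\\f*", verse)


print(parens_to_fqa("un double (second) espace"))
# un double\f + \fqa second\f* espace
```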

MarjorieBurghart commented 6 years ago

I've now corrected parentheses in the 12 verses you had spotted. About toggling the alternative, more literal translation on and off: I agree that it would be an interesting feature, but you should be careful: some of the parenthesised text is not a translation note, but the actual text of the Bible, between parentheses. I'm assuming that Glaire's source text was the Clementine version of the Vulgate, which is available online here: http://vulsearch.sourceforge.net/ If you look up the expression (\(|\)) (looking up all verses containing either a left parenthesis or a right parenthesis) you obtain 228 hits. You could assume that only parentheses in those verses where the Latin Vulgate also contains parentheses should be left as they are. But it's difficult to know whether Glaire might have decided to add parentheses of his own in the translation.

DavidHaslam commented 6 years ago

Five different editions of the Latin Vulgate are available as SWORD modules through CrossWire.

Suitable apps for smart phones and tablets include

Most other platforms have apps too.

DavidHaslam commented 6 years ago

Updated counts after your commits:

U+0028  (   21,446  LEFT PARENTHESIS
U+0029  )   21,447  RIGHT PARENTHESIS

Still one to fix in 2 Samuel 6:20, which reads (after using the same method):

\v 20 David revint aussi chez lui pour bénir sa maison. Et Michol, fille de Saül, étant venue au-devant de David, lui dit : Que le roi d’Israël a eu de gloire aujourd’hui, en se découvrant devant les servantes de ses sujets, et paraissant nu comme ferait un de ses, note) bouffon⸨s⸩ !

MarjorieBurghart commented 6 years ago

Thanks! Committed the corrected file.

DavidHaslam commented 6 years ago

cf. In the text of the SWORD module VulgClementine, search results for regexp \x28.+?\x29 gave

i.e. 6 of the parenthesised items must span lines.

That's still 2 short of the 228 you observed yesterday.

Module details:

As the project states,

Work to maintain the text and correct errors that are found is ongoing: the latest update was on Oct 04 2017.

then I think our module needs updating. We have probably had nobody watching for changes since 2013.

MarjorieBurghart commented 6 years ago

Hmm, it's a bit odd... On the Vulsearch website, when I search for verses containing an opening parenthesis, I get 222 hits. When I search for verses containing a closing parenthesis, I get the exact same number of hits (but possibly different ones, if the parenthesized text spans over different verses). Things get weird when I search for verses containing either an opening OR a closing parenthesis: then I get 228 hits, which puzzles me.
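The three Vulsearch counts are consistent with each other by inclusion-exclusion. If 222 verses contain (, 222 contain ), and 228 contain either, then 222 + 222 - 228 = 216 verses contain both, leaving 6 verses with only an opening parenthesis and 6 with only a closing one. Those are plausibly parenthesised passages that continue across a verse boundary. The sketch below computes the same four numbers for any list of verse strings; the function name is hypothetical.

```python
def paren_verse_counts(verses: list[str]) -> dict[str, int]:
    """Count verses containing "(", ")", either, and both."""
    has_open = {i for i, v in enumerate(verses) if "(" in v}
    has_close = {i for i, v in enumerate(verses) if ")" in v}
    return {
        "open": len(has_open),
        "close": len(has_close),
        "either": len(has_open | has_close),  # union
        "both": len(has_open & has_close),    # intersection
    }


# Verse 1 opens a parenthesis that verse 2 closes.
print(paren_verse_counts(["a (b) c", "d (e", "f) g", "h"]))
# {'open': 2, 'close': 2, 'either': 3, 'both': 1}
```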

DavidHaslam commented 6 years ago

Here's a regex to consider: \([^\(]+\)

This ensures that there is no other ( between the matched ( and ).
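The difference between these regexes shows up on a nested verse such as Genesis 6:12. The comparison below uses Python's re module (the search engine behind SWORD may behave differently) and is an illustration, not part of the original analysis. \x28.+?\x29 stops at the first ) and so swallows the inner (; since . does not match a newline, it also cannot span lines. \([^\(]+\) rejects the outer (, but because its class still admits ), the greedy match runs on to the last ) in the verse. Excluding both characters, as in \([^()]+\), isolates the innermost pair.

```python
import re

# Genesis 6:12 as quoted in this thread (nested parentheses).
verse = (
    "Dieu voyant donc cette corruption de la terre (car la vie que tous "
    "les hommes y menaient était toute (car toute chair avait corrompu "
    "sa voie sur la terre) corrompue),"
)

lazy_any = re.compile(r"\x28.+?\x29")      # shortest run of non-newline chars
no_inner_open = re.compile(r"\([^\(]+\)")  # no "(" inside, but ")" allowed
innermost = re.compile(r"\([^()]+\)")      # neither "(" nor ")" inside

for name, pattern in [("lazy_any", lazy_any),
                      ("no_inner_open", no_inner_open),
                      ("innermost", innermost)]:
    print(name, pattern.findall(verse))
```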

DavidHaslam commented 6 years ago

btw. I downloaded the latest edition of the Clementine Vulgate from the Bitbucket repo.

I'm in the process of converting the source text to USFM format.

This involved taking note of the markup description in the top level README.md and carefully translating each feature to the most suitable USFM tags.

As and when I convert these USFM files to OSIS XML, all these features will be preserved. The derived SWORD module update will then contain features such as poetry lines, prologs (introductions), acrostic headings, speakers, etc.

This should be a real improvement.

DavidHaslam commented 6 years ago

After the merge, we have:

U+0028  (   21,447  LEFT PARENTHESIS
U+0029  )   21,447  RIGHT PARENTHESIS

The total is made up of two types as follows:

Here's an example of nested parentheses in Genesis 6:12:

\v 12 Dieu voyant donc cette corruption de la terre (car la vie que tous les hommes y menaient était toute (car toute chair avait corrompu sa voie sur la terre) corrompue),

Evidently, then, parentheses serve more than one function in the Glaire Vulgate.
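Nesting can also be measured directly with a single left-to-right scan that keeps a stack of the offsets of open parentheses. The sketch below (hypothetical function name) reports the maximum nesting depth and the offsets of any unpaired ( or ). It checks balance only and cannot tell a translation note from parenthesised Bible text.

```python
def nesting_report(text: str) -> tuple[int, list[int], list[int]]:
    """Return (max_depth, unmatched_close, unmatched_open) for text.

    The two lists hold character offsets. A ")" with no open "(" on
    the stack is unmatched; any "(" still on the stack at the end is
    unmatched too.
    """
    stack: list[int] = []
    unmatched_close: list[int] = []
    max_depth = 0
    for i, ch in enumerate(text):
        if ch == "(":
            stack.append(i)
            max_depth = max(max_depth, len(stack))
        elif ch == ")":
            if stack:
                stack.pop()
            else:
                unmatched_close.append(i)
    return max_depth, unmatched_close, stack


print(nesting_report("a (b (c) d) e"))  # (2, [], [])
print(nesting_report("x) (y"))          # (1, [1], [3])
```

For the Genesis 6:12 verse above, this reports a depth of 2 with no unpaired parentheses.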

FYI. Complete analysis of my concatenated USFM files: merged.usfm.character.frequency.txt

MarjorieBurghart commented 6 years ago

So can we consider the issue is closed now?

DavidHaslam commented 6 years ago

Yes - I think we can leave alone the minor matter of spaces.

DavidHaslam commented 6 years ago

See also my further analysis and the fixes in #19