DavidHaslam opened 7 years ago
Of course, some of the unexpected "words" will turn out to be artefacts of the use of parentheses for alternate renderings. I expect this will turn out to be the case for the last item in the list:
00001 z-vous
On the other hand, some items will turn out to be real typos.
FYI. The next file is just a character frequency count of the counted words list.
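A character frequency count of this kind can be sketched with `collections.Counter`. This is a minimal sketch, not necessarily how the attached file was produced: it assumes each line of the counted words list has the form `count<TAB>word`, and it counts each listed word's characters once, ignoring how often the word occurs. The function name `char_frequencies` is hypothetical.

```python
from collections import Counter


def char_frequencies(lines):
    """Count characters across the words of a 'count<TAB>word' list.

    Each distinct word contributes its characters once; the word's
    corpus frequency (the first field) is deliberately ignored.
    """
    freq = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        _count, word = line.split("\t", 1)
        freq.update(word)  # updating with a str counts each character
    return freq
```

Calling `char_frequencies(open("merged.words.count.txt", encoding="utf-8"))` and printing `.most_common()` would list the characters from most to least frequent; rare characters near the bottom are good candidates for encoding or typing errors.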
FYI. Here are the 52 search results for the regexp [A-Z]\w+(-[A-Z]\w+)+ from the counted words list.
00001 Asason-Thamar
00001 Assieds-Toi
00001 Astaroth-Carnaïm
00002 Ataroth-Addar
00001 Azanoth-Thabor
00001 Baalath-Béer
00002 Baal-Hermon
00002 Baal-Pharasim
00001 Baal-Salisa
00002 Beth-Araba
00003 Beth-Hagla
00002 Beth-Maacha
00003 Cariath-Arbé
00001 Cariath-Baal
00004 Cariath-Sépher
00001 Carioth-Hesron
00001 d’Asor-Haddan
00002 Esprit-Saint
00001 Es-Tu
00002 Etes-Vous
00001 Evil-Mérodach
00001 Grande-Ourse
00001 Hammoth-Dor
00002 Havoth-Jaïr
00001 Homme-Beau
00219 Jésus-Christ
00007 Jabès-Galaad
00015 Jean-Baptiste
00079 l’Esprit-Saint
00002 L’Esprit-Saint
00001 Lésem-Dan
00007 Marie-Madeleine
00001 Néphat-Dor
00001 Nathan-Mélech
00030 Notre-Seigneur
00006 Phahath-Moab
00001 Rabbath-Ammon
00001 Ramathaïm-Sophim
00016 Ramoth-Galaad
00018 Saint-Esprit
00001 Savé-Cariathaïm
00018 Simon-Pierre
00001 Sochoth-Bénoth
00002 Suis-Moi
00001 Suivez-Moi
00003 Théglath-Phalasar
00001 Thamnath-Saraa
00003 Thelgath-Phalnasar
00001 Tob-Adonias
00045 Tout-Puissant
00001 Très-Fort
00091 Très-Haut
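The same search can be run over the counted words list in Python. This sketch assumes the `count<TAB>word` line format and uses a hypothetical helper `hyphenated_caps`. Two details of the pattern explain parts of the results: `re.search` is unanchored, so entries such as d’Asor-Haddan and l’Esprit-Saint match on their capitalised tail; and `[A-Z]` is ASCII-only, so a part beginning with an accented capital such as É would not count as a capitalised part (in Python 3, `\w` is Unicode-aware, so accented letters after the first one do match).

```python
import re

# Pattern from the search above: a capitalised word followed by one or
# more "-Capitalised" parts. [A-Z] is ASCII-only; \w matches Unicode
# word characters (é, ï, ...) in Python 3 str patterns.
HYPHENATED_CAPS = re.compile(r"[A-Z]\w+(-[A-Z]\w+)+")


def hyphenated_caps(lines):
    """Yield (count, word) for listed words containing the pattern.

    re.search is unanchored, so a prefix such as d’ or l’ does not
    prevent a match.
    """
    for line in lines:
        count, word = line.rstrip("\n").split("\t", 1)
        if HYPHENATED_CAPS.search(word):
            yield int(count), word
```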
Most of these are hyphenated proper names.
Some hyphenated proper names have been translated rather than transliterated from the Hebrew.
A notable example is the name with three hyphens in Isaiah 8:3:
\v 3 et je m’approchai de la prophétesse, et elle conçut et enfanta un fils. Alors le Seigneur me dit : Donne-lui pour nom : Hâtez-vous (Hâte-toi) de saisir les dépouilles, pille(z) promptement ;
Cf. many Bibles, which have Maher-shalal-hash-baz here, with the meaning of the name given in a footnote.
Aside: It goes almost without saying that producing the counted words list was made much simpler by the fix for issue #4, because I did not have to treat \x27 (the ASCII apostrophe) as a special case.
The attached file is a tab-delimited text file that counts all the words found in the verse text of the VulgateGlaire.
merged.words.count.txt
The output file is automatically sorted on the word field, though the collation order probably does not match that used for French.
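To see how far a code-point sort is from a French-friendly order, here is a rough accent-insensitive sort key. This is an approximation only, not French collation as defined by the Unicode Collation Algorithm and CLDR: it strips combining accents after NFD decomposition, casefolds, and falls back to the original word to break ties. The name `french_sort_key` is hypothetical.

```python
import unicodedata


def french_sort_key(word):
    """Rough, accent- and case-insensitive sort key.

    Approximates only the primary level of collation: accents and case
    are ignored except as a final tie-breaker (the raw word).
    """
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (base.casefold(), word)
```

A plain `sorted()` puts all capitalised words before lowercase ones and accented letters such as é after z; with `key=french_sort_key`, été sorts next to Etes-Vous instead. For a faithful French order, an ICU-based collator (for example via PyICU) would be the better choice.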
This list is provided to assist with proofreading. It's a powerful method for detecting typos and spelling mistakes.
Punctuation marks other than hyphen/minus and the right single quotation mark (used as the typographical apostrophe) were removed.
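The counting step described above can be sketched as follows. This is an illustration under stated assumptions, not the method actually used: it assumes the verse text has already been extracted from USFM with markers such as \v removed, and it deletes unwanted punctuation rather than replacing it with a space. That choice matters here: deleting parentheses turns pille(z) into pillez, while replacing them with spaces would yield fragments like a lone z, which is the kind of artefact mentioned at the top of this thread. The helper names are hypothetical.

```python
import re
from collections import Counter

# Keep word characters (\w), whitespace, the hyphen-minus (-) and
# U+2019 RIGHT SINGLE QUOTATION MARK (’); delete everything else.
UNWANTED = re.compile(r"[^\w\s\-\u2019]")


def count_words(verses):
    """Count whitespace-separated words across verse-text strings."""
    counts = Counter()
    for text in verses:
        cleaned = UNWANTED.sub("", text)  # e.g. "pille(z)" -> "pillez"
        counts.update(cleaned.split())
    return counts


def format_counts(counts):
    """Render zero-padded 'count<TAB>word' lines sorted on the word.

    The sort is by Unicode code point, not French collation.
    """
    return [f"{n:05d}\t{w}" for w, n in sorted(counts.items())]
```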
When browsing through the list, take particular note of the hapax legomena (words that occur only once), of which there are 12564. That's a staggering 40% of the total number of listed words.
Although every Bible has many words that occur only once, some of these instances may be errors.
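Extracting the hapax legomena for review, and their share of the list, takes only a few lines. This is a sketch assuming the `count<TAB>word` format; the share is computed against the number of distinct words listed (as in the 40% figure above), not against the total number of word tokens. The function name `hapax_stats` is hypothetical.

```python
def hapax_stats(lines):
    """Return (hapax legomena, share of listed words) for a count list.

    A hapax legomenon is a word whose count is exactly 1.
    """
    hapaxes = []
    total = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        count, word = line.split("\t", 1)
        total += 1
        if int(count) == 1:
            hapaxes.append(word)
    share = len(hapaxes) / total if total else 0.0
    return hapaxes, share
```

Writing the returned hapaxes to their own file gives a shorter list to proofread, since a typo in a common word usually shows up as a rare variant spelling.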