Character counts in the concatenated USFM files for the 73 Bible books #17

opened 5 years ago

DavidHaslam commented 5 years ago

The attached tab-delimited text file may be a useful analysis:


Observe the difference in counts for characters that are usually in pairs.

U+0028  (   5,803   LEFT PARENTHESIS
U+0029  )   5,800   RIGHT PARENTHESIS



This indicates that there may be some unpaired characters, which is often worth checking.

The right single quotation mark is also used as the typographical apostrophe, which helps explain the large difference between the counts for the left and right single quotation marks.
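A check of this kind could be scripted roughly as follows. This is only a sketch: the function name `pair_imbalances` and the `PAIRS` list are my own assumptions, not part of any existing tool. Single quotation marks are deliberately left out, because U+2019 doubles as the apostrophe and would always show up as unbalanced.

```python
from collections import Counter

# Pairs to compare. This list is an assumption; extend it as needed.
# Single quotation marks are omitted because U+2019 is also the apostrophe.
PAIRS = [("(", ")"), ("[", "]"), ("\u201C", "\u201D")]

def pair_imbalances(text):
    """Return {(open, close): (open_count, close_count)} for pairs whose counts differ."""
    counts = Counter(text)
    return {
        (o, c): (counts[o], counts[c])
        for o, c in PAIRS
        if counts[o] != counts[c]
    }
```

An empty result only says the totals match; it does not prove that every opening character is closed in the right place.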

DavidHaslam commented 5 years ago

Two of the nine instances of NO-BREAK SPACE (U+00A0) are artefacts and may be replaced by an ordinary space.

They are found in these two verses:

\v 2 ¶ But the earth was empty and unoccupied, and darknesses were over the face of the abyss; and so the Spirit of God was brought over the waters.\f + \fr 1:2 \ft After earth was created, it was empty and unoccupied. Darkness is plural in the Latin. This could symbolize fallen angels, with the abyss symbolizing Hell. The word ‘darknesses’ can also refer to the absences of so many good things, so that God had to continue creating. The Spirit of God was brought or was carried over the waters, passive tense.\fl (Conte)\f*
\v 4 And God saw the light, that it was good; and so he divided the light from the darknesses.\f + \fr 1:4 \ft God divided light from darkness. He also divided Heaven from Hell, once Hell was created (or became necessary due to the angels that fell from grace). Notice that God chooses to create Heaven and Earth (Universe), but Hell comes about as a result of sin. God creates Good, but Evil comes about because of sin. God does not directly create evil or darkness.\fl (Conte)\f*

The other 7 instances are rightly used between single and double quotation marks.
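For anyone wanting to repeat this check, here is a rough sketch that separates the two cases. The name `stray_nbsp_positions` is hypothetical, and it assumes that a NO-BREAK SPACE is only legitimate when it has a curly quotation mark immediately on both sides.

```python
NBSP = "\u00A0"
# Curly single and double quotation marks, left and right.
QUOTES = frozenset("\u2018\u2019\u201C\u201D")

def stray_nbsp_positions(text):
    """Return offsets of NO-BREAK SPACE characters that are not
    flanked by quotation marks on both sides."""
    stray = []
    for i, ch in enumerate(text):
        if ch != NBSP:
            continue
        before = text[i - 1] if i > 0 else ""
        after = text[i + 1] if i + 1 < len(text) else ""
        if not (before in QUOTES and after in QUOTES):
            stray.append(i)
    return stray
```

The returned offsets can then be inspected by hand before anything is replaced.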

By contrast, there are 7 places where an ordinary space is used between left double and left single quotation marks.

\v 17 And he told Joseph that he should command his brothers, saying: “ ‘Burden your beasts, and go into the land of Canaan,
\v 3 “ ‘Prepare the heavy and the light shield, and advance to war!\f + \fr 46:3 \ft In other words, prepare the heavy and light weapons of war.\fl (Conte)\f*
\v 1 ¶ “ ‘For this reason, the Lord our God has fulfilled his word, which he has spoken to us, and to our judges, who have judged Israel, and to our kings, and to our leaders, and to all Israel and Judah.
\v 1 ¶ “ ‘And now, O Lord Almighty, the God of Israel, the soul in anguish and the troubled spirit cry out to you.
\v 1 ¶ “ ‘This is the book of the commandments of God and of the law, which exists in eternity. All those who keep it will attain to life, but those who have forsaken it, to death.
\v 1 ¶ “ ‘Take off, O Jerusalem, the garment of your sorrow and troubles, and put on your beauty and the honor of that eternal glory, which you have from God.
\v 37 Jesus said to him: “ ‘You shall love the Lord your God from all your heart, and with all your soul and with all your mind.’

There are also 205 places where an ordinary space is used between right single and right double quotation marks, which is too many to list here. To find them, search for the regexp \x{2019} \x{201D}

These inconsistencies should be fixed.
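One possible way to make the fix, sketched in Python. It assumes the intended result is a NO-BREAK SPACE between adjacent quotation marks, in line with the 7 instances described as rightly used; the function name `normalize_quote_spacing` is hypothetical.

```python
import re

NBSP = "\u00A0"

# An ordinary space between two left marks or between two right marks.
PATTERN = re.compile(
    "(?<=[\u2018\u201C]) (?=[\u2018\u201C])"
    "|(?<=[\u2019\u201D]) (?=[\u2019\u201D])"
)

def normalize_quote_spacing(text):
    """Replace the matched ordinary spaces with NO-BREAK SPACE.
    Returns (new_text, number_of_replacements)."""
    return PATTERN.subn(NBSP, text)
```

Since U+2019 is also the apostrophe, a word-final apostrophe followed by a space and a closing quotation mark would match as well, so the replacement count is worth comparing against the 7 and 205 reported here.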