BibleCorps / ENG-B1-Anderson1864-pd-USFM

Henry T. Anderson's 1864 "Civil War" New Testament
https://archive.org/details/thenewtestament00andeuoft

Counted words list - looking for more outstanding corrections #15

Closed DavidHaslam closed 5 years ago

DavidHaslam commented 5 years ago

I have generated a counted words list for the Anderson NT and several more errors have come to light.

Here's a list of all the places where a hyphen occurs:

00004   -
00001   -and
00001   -churches
00001   -days
00001   -doing
00001   -go
00001   -Lord
00001   -their
00001   -up
00001   -whom
00001   -you
00001   ate-
00001   banquet-room
00001   Bar-jesus
00001   bed-chamber
00003   bride-chamber
00001   bride-groom
00001   burial-place
00003   burnt-offerings
00001   co-workers
00002   corner-stone
00001   covenant-breakers
00001   cross-ways
00003   custom-house
00003   daughter-in-law
00003   daughter-in-law
00002   day-time
00003   door-keeper
00002   double-minded
00001   double-tongued
00002   dwelling-place
00001   eighty-four
00002   empty-handed
00003   evil-doer
00006   evil-doers
00001   evil-speakers
00001   eye-salve
00002   eye-service
00002   eye-witnesses
00001   faint-hearted
00001   father-in-law
00001   father-in-law
00001   fault-finders
00001   fellow-
00001   fellow-citizens
00001   fellow-disciples
00001   fellow-elder
00001   fellow-heirs
00001   fellow-helpers
00001   fellow-insurgents
00001   fellow-laborer
00004   fellow-laborers
00002   fellow-prisoner
00001   fellow-prisoners
00005   fellow-servant
00004   fellow-servants
00002   fellow-soldier
00001   fellow-traveler
00001   fellow-travelers
00001   fellow-worker
00001   fellow-workers
00002   fellow-workman
00001   fiery-red
00001   fifty-three
00016   fig-tree
00001   first-begotten
00009   first-born
00001   first-fruit
00004   first-fruits
00004   forty-four
00002   forty-two
00003   four-footed
00001   free-
00001   free-will
00001   free-woman
00001   full-grown
00002   garden-plants
00001   gave-
00001   goat-skins
00001   good-will
00001   grand-children
00001   grave-clothes
00003   hell-fire
00001   high-minded
00003   house-top
00002   house-tops
00001   humble-minded
00002   hundred-fold
00001   hyssop-stalk
00001   it-
00001   jasper-stone
00001   joint-heirs
00010   judgment-seat
00001   judgment-seats
00001   kind-hearted
00004   lamp-stand
00001   law-giver
00001   law-suits
00001   life-giving
00001   like-
00012   long-suffering
00001   Lord-
00001   luke-warm
00006   maid-servant
00001   maid-servants
00001   man-slayers
00001   market-place
00001   market-places
00002   marriage-supper
00001   master-builder
00001   men-servants
00001   men-stealers
00001   mercy-seat
00003   mid-heaven
00004   money-changers
00001   money-loving
00001   moth-eaten
00006   mother-in-law
00006   mother-in-law
00001   mustard-seed
00001   new-born
00004   ninety-nine
00001   north-west
00001   of-
00001   olive-tree
00001   only-begotten
00001   palm-trees
00001   party-spirit
00001   pre-eminence
00001   preparation-day
00001   race-course
00001   re-examined
00022   sabbath-day
00001   Sabbath-day
00003   sabbath-days
00003   Sabbath-days
00001   sabbath-state
00001   sardine-stone
00001   sea-coast
00001   sea-shore
00001   self-condemned
00001   self-control
00001   self-willed
00001   seventy-five
00001   seventy-six
00001   sheep-skins
00003   sin-offering
00001   sixty-six
00005   sober-minded
00001   south-west
00001   speaks-
00001   stiff-necked
00001   stood-
00005   stumbling-block
00002   swathing-clothes
00001   sycamine-tree
00001   sycamore-tree
00001   tent-makers
00001   that-the
00002   the-
00001   thirty-eight
00001   thorn-branches
00002   thrashing-floor
00001   three-
00013   to-day
00006   To-day
00009   to-morrow
00001   To-morrow
00001   town-clerk
00001   tribute-money
00001   twenty-five
00006   twenty-four
00001   twenty-one
00001   twenty-three
00004   two-edged
00001   vain-glorious
00002   vine-dresser
00016   vine-dressers
00001   water-pot
00002   water-pots
00001   weak-minded
00002   wedding-robe
00001   well-doing
00001   well-known
00001   well-pleasing
00001   which-
00001   will-worship
00006   wine-press
00001   winnowing-shovel
00001   yoke-fellow

The words that start or end with a hyphen need checking & correcting, as do a few words that are not spelled correctly.
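
A counted words list like the one above could be produced along these lines. This is a minimal sketch, not the tool actually used for this issue: the tokenizer pattern and the names `counted_words`, `hyphenated` and `format_counted` are assumptions made for illustration.

```python
import re
from collections import Counter

# Assumption: a "word" is any run of letters and hyphens, so that
# dangling forms such as "fellow-" or "-and" survive as tokens.
TOKEN_RE = re.compile(r"(?:[^\W\d_]|-)+")


def counted_words(text):
    """Count every token in the text."""
    return Counter(TOKEN_RE.findall(text))


def hyphenated(counts):
    """Keep only the tokens that contain a hyphen."""
    return {word: n for word, n in counts.items() if "-" in word}


def format_counted(counts):
    """Render 'count   word' lines, sorted case-insensitively."""
    ordered = sorted(counts.items(), key=lambda kv: kv[0].lower())
    return [f"{n:05d}   {word}" for word, n in ordered]


sample = "the fig-tree and the fig-tree; fellow- servants -and To-day"
for line in format_counted(hyphenated(counted_words(sample))):
    print(line)
```

Keeping the hyphen inside the token is the design choice that matters here: a tokenizer that treats the hyphen as a separator would hide exactly the fragments this list is meant to expose.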

DavidHaslam commented 5 years ago

From the above, here are the words that either start or end with a hyphen:

00004   -
00001   -and
00001   -churches
00001   -days
00001   -doing
00001   -go
00001   -Lord
00001   -their
00001   -up
00001   -whom
00001   -you
00001   ate-
00001   fellow-
00001   free-
00001   gave-
00001   it-
00001   like-
00001   Lord-
00001   of-
00001   speaks-
00001   stood-
00002   the-
00001   three-
00001   which-
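
To check each of these against the PDF it helps to know where they occur. The following is a rough sketch, assuming the text is available as a list of lines; `dangling_hyphens` is a hypothetical helper, not part of any existing tool.

```python
import re

# Assumption: tokens are runs of letters and hyphens.
TOKEN_RE = re.compile(r"(?:[^\W\d_]|-)+")


def dangling_hyphens(lines):
    """Return (line number, token) pairs for tokens that start or end with a hyphen."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        for token in TOKEN_RE.findall(line):
            if token.startswith("-") or token.endswith("-"):
                hits.append((lineno, token))
    return hits


sample = ["and he stood- up", "a fig-tree -and more", "one - two"]
for lineno, token in dangling_hyphens(sample):
    print(lineno, token)
```

Ordinary compounds such as fig-tree are left alone, so only the candidates for OCR damage are reported.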

Status: These OCR errors have been fixed in the local SFM files (unless I've inadvertently missed any).

Aside: None of the corrections required a quotation dash.

As it happens, I did miss these (fresh results after reconverting to OSIS):

00003   -
00001   -whom
00001   Lord-

DavidHaslam commented 5 years ago

The inconsistent use of a hyphen needs checking for some words.

Example:

00001   fellowservant
00005   fellow-servant
00004   fellow-servants

Clearly, the word fellowservant must be a mistake.

Status: Fixed hyphen in Col.1.7
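
Pairs like this can be found mechanically by comparing each word with its hyphen-free form. A minimal sketch, assuming the counted words are held in a dict; the name `hyphen_variants` is hypothetical.

```python
def hyphen_variants(counts):
    """Group words that differ only in where (or whether) a hyphen occurs.

    Words that differ only in letter case (sabbath-day / Sabbath-day)
    are not reported on their own.
    """
    groups = {}
    for word in counts:
        key = word.replace("-", "").lower()
        groups.setdefault(key, []).append(word)
    return {
        key: sorted(words)
        for key, words in groups.items()
        if len({w.lower() for w in words}) > 1
    }


counts = {
    "fellowservant": 1,
    "fellow-servant": 5,
    "fellow-servants": 4,
    "sabbath-day": 22,
    "Sabbath-day": 1,
}
print(hyphen_variants(counts))
```

Each reported group still needs checking against the printed page, since a hyphen at a line end in the print may or may not belong to the word.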

DavidHaslam commented 5 years ago

And I missed a correction involving the ligature æ:

00001   Hymenæus
00001   Hymenaeus
<verse sID="2Tim.2.17" osisID="2Tim.2.17" n="17" />and their word will eat as a gangrene: of whom are Hymenaeus and Philetus,<verse eID="2Tim.2.17" />

Status: Fixed in local SFM file.
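
Other ligature mismatches of this kind could be caught by expanding the ligatures and grouping the results. This is only a sketch: the mapping covers just æ and œ, and `ligature_variants` is a hypothetical name. Which spelling is correct in each case still has to be read from the PDF.

```python
# Assumption: only the ae and oe ligatures matter for this text.
LIGATURES = str.maketrans({"æ": "ae", "Æ": "Ae", "œ": "oe", "Œ": "Oe"})


def ligature_variants(words):
    """Group spellings that become identical once ligatures are expanded."""
    groups = {}
    for word in set(words):
        groups.setdefault(word.translate(LIGATURES), set()).add(word)
    return {key: sorted(forms) for key, forms in groups.items() if len(forms) > 1}


print(ligature_variants(["Hymenæus", "Hymenaeus", "Cæsar", "Philetus"]))
```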

DavidHaslam commented 5 years ago

Some word prefixes seem to have been left isolated:

00003   ac
00002   ex

Found in:

<verse sID="Gal.4.29" osisID="Gal.4.29" n="29" />But as then, he that was born according to the flesh persecuted him that was born ac cording to the Spirit, so even now.<verse eID="Gal.4.29" />
<verse sID="Col.2.8" osisID="Col.2.8" n="8" />See that no one make you the victims of imposture by means of philosophy and vain deceit, according to the tradition of men, according to the rudiments of the world, and not ac cording to Christ:<verse eID="Col.2.8" />
<verse sID="Heb.4.13" osisID="Heb.4.13" n="13" />And there is no creature which is not manifest in his sight: but all things are naked, and exposed to the eyes of him to whom we must give an ac count.</p><p><verse eID="Heb.4.13" />
<verse sID="Luke.23.5" osisID="Luke.23.5" n="5" />But they became the more urgent, and said: He ex cites the people, teaching throughout the whole of Judea, beginning from Galilee to this place.<verse eID="Luke.23.5" />
<verse sID="Acts.20.35" osisID="Acts.20.35" n="35" />In all things I have taught you by ex ample, that by thus laboring, you ought to support the weak, and to remember the words of the Lord Jesus; for he himself said, It is more blessed to give, than to receive.</p><p><verse eID="Acts.20.35" />

Status: Fixed these 5 instances in the local SFM files.
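
A search for split words of this kind might look like the sketch below. It is a heuristic under stated assumptions: the input is plain text without markup, a fragment is at most three letters long and rare, and the joined form occurs somewhere else in the same text. The function name and both thresholds are hypothetical.

```python
import re
from collections import Counter

WORD_RE = re.compile(r"[^\W\d_]+")


def split_word_candidates(text, max_fragment=3, max_count=5):
    """Find 'ac cording'-style pairs whose joined form is a known word."""
    words = WORD_RE.findall(text)
    vocab = Counter(w.lower() for w in words)
    hits = []
    for left, right in zip(words, words[1:]):
        joined = (left + right).lower()
        if (
            len(left) <= max_fragment
            and vocab[left.lower()] <= max_count
            and joined in vocab
        ):
            hits.append((left, right, joined))
    return hits


sample = (
    "born according to the flesh persecuted him "
    "that was born ac cording to the Spirit"
)
print(split_word_candidates(sample))
```

A split word whose joined form never occurs intact in the text would be missed, so a counted list of very short words remains a useful cross-check.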

DavidHaslam commented 5 years ago

There are these OCR errors:

00005   ho

located here as mistakes for he or no:

<verse sID="Matt.17.15" osisID="Matt.17.15" n="15" />and said: Lord, have mercy on my son; for ho is a lunatic, and suffers grievously; for he often falls into the fire, and often into the water.
<verse sID="Mark.12.1" osisID="Mark.12.1" n="1" />And ho began to speak to them in parables: A man planted a vineyard, and set a hedge around it, and digged a wine-press, and built a tower, and let it out to vine-dressers, and went into another country.
<verse sID="Mark.13.27" osisID="Mark.13.27" n="27" />And then will ho send his angels, and gather his elect from the four winds, from the most distant part of earth to the most distant part of heaven.
<verse sID="Luke.20.11" osisID="Luke.20.11" n="11" />And ho then sent another servant. But they scourged him also, and treated him shamefully, and sent him away empty-handed.
<verse sID="John.4.11" osisID="John.4.11" n="11" />The woman said to him: Sir, you have ho vessel with which you can draw, and the well is deep; whence have you that living water?

Status: Fixed these 5 instances in the local SFM files.

DavidHaslam commented 5 years ago

Again:

00001   ID

OCR mistake for in

<verse sID="John.17.23" osisID="John.17.23" n="23" />I in them, and thou ID me, that they may be made perfect in one, that the world may know that thou hast sent me, and hast loved them, as thou hast loved me.

Status: Fixed in the local SFM file.

DavidHaslam commented 5 years ago

I rather suspect that one of the apparently missing verses is due to an OCR error for a verse tag:

<verse sID="2Cor.4.17" osisID="2Cor.4.17" n="17" />For our present light affliction works out for us an eternal fullness of glory, excelling all excellence, ls while we look not at the things that are seen, but at the things that are not seen: for the things seen are temporal; but the things not seen are eternal.
<verse eID="2Cor.4.17" />

The verse tag for 18 was misread as ls.

Status: Fixed my downstream local copy of the SFM file.
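
Missing verses can be listed by scanning the sID milestones in the OSIS file. The sketch below assumes milestones of the form shown in the excerpts above; `missing_verses` and the optional `last_verse` table are hypothetical, with the table supplied by the caller. Because 2Cor.4.18 is the last verse of its chapter, a gap check alone would not reveal it; the expected verse count is needed.

```python
import re

# Matches e.g. <verse sID="2Cor.4.17" ... /> and captures book, chapter, verse.
SID_RE = re.compile(r'<verse\s+sID="([^".]+)\.(\d+)\.(\d+)"')


def missing_verses(osis_text, last_verse=None):
    """List verse IDs absent from each chapter.

    last_verse optionally maps (book, chapter) to the expected final
    verse number; without it, only gaps below the highest verse seen
    in a chapter can be detected.
    """
    last_verse = last_verse or {}
    seen = {}
    for book, chapter, verse in SID_RE.findall(osis_text):
        seen.setdefault((book, int(chapter)), set()).add(int(verse))
    gaps = []
    for (book, chapter), verses in seen.items():
        final = last_verse.get((book, chapter), max(verses))
        for number in range(1, final + 1):
            if number not in verses:
                gaps.append(f"{book}.{chapter}.{number}")
    return gaps


# Synthetic chapter with verses 1 to 17 only.
chapter = "".join(f'<verse sID="2Cor.4.{v}" />' for v in range(1, 18))
print(missing_verses(chapter, {("2Cor", 4): 18}))
```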

DavidHaslam commented 5 years ago

Another one:

00001   tn

Location:

<verse sID="1Tim.6.16" osisID="1Tim.6.16" n="16" />who alone has immortality, dwelling in light unapproachable, whom no man has seen, nor can see, tn whom be honor and power eternal. Amen.

Should be to.

Status: Fixed in the local SFM file.

DavidHaslam commented 5 years ago

Another:

00001   Le

Location:

<verse sID="Luke.14.11" osisID="Luke.14.11" n="11" />For every one that exalts himself shall Le humbled; and he that humbles himself shall be exalted.

Should be be.

Status: Fixed in the local SFM file.

cmahte commented 5 years ago

The task of an archivist isn't to fix perceived errors, but to record the content as it was.

As for this line of replacing dashes, removing spaces, and respelling unhyphenated words as hyphenated words for the sake of consistency: we should be following the printed edition, and not going away from that. Here I'm pointing to "removing spaces or adding dashes" for consistency across the work.

Now there's an issue where an obvious typographical error creates a meaning change. In cases like a missing "not" in the statement "do not commit adultery" it is an archivist's task to not present a flawed text. Some archivists mark the repair in some way. I'm neutral on that. I personally support that the meaningful typo should be restored to its intended reading without marking it, but I wouldn't seek to remove someone else's marks, and if I was making a single change to a work that had them, I'd follow the convention in place.

None of the issues here are meaning-based that I can see, only presentational. In the form that appears here (USFM), dashes (Hyphen-Minus: U+002D) are preferred, not publishing-time presentational markup. This is driven by the convention of 'what tools will an editor use, and have ready for them.' That is, if someone opens a file and adds to it, it is likely to have dashes (U+002D) added and not any of the others (U+2010, U+2012, U+2013, U+2014, U+2015, U+2027, U+2043). Because of this, the 'source' should be keyboard-friendly.

The best practice is to use "--" to indicate an en dash, and "---" to indicate an em dash, to the compositor.

At the point of publishing (and for the OSIS file, and for my OpenDocument and InDesign import files) I do recommend fixing the dashes into the presentation forms.

However, for the USFM source, editors use the key that is present nearly universally, and compositors mostly deal with forming the dashes into their various iterations at the time of publication. I'm sure you'll see variation from this when investigating USFM. However, in cases where USFM represents 'source', meaning the culmination of an editor's work, you'll see this followed more often than not. All dashes are the low-ASCII dash.
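
The publish-time step described here, turning "--" and "---" in the source into presentation dashes, can be sketched as follows. This is an illustration of the convention only, not a description of any existing converter; `present_dashes` is a hypothetical name, and runs of four or more hyphens are deliberately left untouched.

```python
import re

EN_DASH = "\u2013"
EM_DASH = "\u2014"

# Exactly two or three hyphens, not part of a longer run.
DASH_RUN_RE = re.compile(r"(?<!-)-{2,3}(?!-)")


def present_dashes(source):
    """Convert '--' to an en dash and '---' to an em dash; keep single hyphens."""
    return DASH_RUN_RE.sub(
        lambda m: EM_DASH if len(m.group()) == 3 else EN_DASH, source
    )


print(present_dashes("verses 3--5, and then---at last---the fig-tree"))
```

Single hyphens, as in fig-tree, pass through unchanged, so the source stays keyboard-friendly while the published form gets the Unicode dashes.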

cmahte commented 5 years ago

I do suspect many of the items you've listed are variations from the printed text and do need to be fixed. But the fix should be either to match the printed edition, or because of a meaningful error (what appears doesn't mean what the editor obviously meant it to mean).

But proceed with this suggestion however you see fit. I'm not about to overrule or undo any changes made here. :-) Your work adds to this. I'm just laying down the guidance I work with.

DavidHaslam commented 5 years ago

Each correction is first checked in the PDF. If any of my initial conjectures are mistaken, so be it. It’s merely comment.

If a quotation dash is observed in the PDF, that’s what should go in the SFM.

Third party keyboard limitations are quite irrelevant.

The aim is to represent the original text accurately, not to include any kludge that was due in earlier years to the limitations of ANSI.

The digital text can make suitable use of the relevant Unicode characters.

DavidHaslam commented 5 years ago

I’m making good progress going through all these items, despite some other pleasant interruptions during my day.

DavidHaslam commented 5 years ago

I’m recording them all in git as fixes to OCR error[s].

That’s what they all are - so far at least.

cmahte commented 5 years ago

.... I've been working sporadically for nearly 2 years on the 1911 Bible. One of my struggles is the colon vs. semicolon. In the scans of the edition, there appear to be one or two pieces of lead that come into play that are so ambiguous as to nearly be a 3rd mark in between a colon and semicolon. In a very few places where the 1911 OCR edition varies from the ebible KJV, I end up with 6 copies of KJV and derivatives open, searching for whether the semicolon or colon has more precedent, and, if it does, whether the resulting reading changes meaning at all between the two. Commas vs. periods aren't nearly so hard, but the colon vs. semicolon, especially with the lead type used... It makes for a slow process at times when that/those variant demicolons were on the top of the typesetter's pile.

DavidHaslam commented 5 years ago

Hmmmm !

DavidHaslam commented 5 years ago

These are now the lowercase words detected as being "misspelt" by the DSpellCheck plugin.

abidest
accordign
acknowledgement
affusion
allegorized
amomum
antichrist
antichrists
babblings
bondmaid
builded
carest
catamites
chrysoprase
chœnices
chœnix
counsellor
crysolite
cummin
denarii
denarius
despisers
didrachma
digged
disputings
draught
drinkings
eldership
engraven
envyings
fastings
fatlings
gavest
goest
groanings
hewn
hinderance
hyacinthine
inclose
inclosed
incorruptness
ingrafted
intrust
intrusted
jesus
killest
kneeled
knewest
knowest
kumi
lamah
lamma
lictors
lovefeasts
lovest
luke
mightest
mockings
nard
nought
offscouring
offsprings
oldwomanish
opposers
pre
predestinated
prophesyings
remainest
revellings
sabachthani
sabbaths
saltless
sardius
sardonyx
sawn
scourgings
seekest
seest
shaven
sheepgate
shorn
shouldst
soberminded
speakest
spearmen
stealers
stonest
sulphurous
sycamine
talkest
tetrads
tetrarch
transgresssor
uncircumcision
uncomely
uncondemned
undefiled
unthankful
unvailed
vail
vailed
wellpleasing
whosever
willest
wranglings
yod

That's 106 hits out of a total of 414 words, the rest being words with an uppercase initial letter.

A few of these are worth looking into more closely.

NB. DSpellCheck counts a hyphen as a non-word character, so in effect it splits hyphenated words.
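
The effect of that splitting can be imitated with a letters-only tokenizer. This is a sketch, assuming DSpellCheck behaves like a plain split on non-letters; its real tokenizer is not reproduced here and `spellcheck_tokens` is a hypothetical name. If the assumption holds, it would account for entries such as pre, luke, jesus and stealers, which appear in the hyphen list as parts of pre-eminence, luke-warm, Bar-jesus and men-stealers.

```python
import re

# Assumption: anything that is not a letter, including "-", ends a word.
WORD_RE = re.compile(r"[^\W\d_]+")


def spellcheck_tokens(text):
    """Split text the way a hyphen-unaware spell checker would see it."""
    return WORD_RE.findall(text)


for word in ("pre-eminence", "luke-warm", "Bar-jesus", "men-stealers"):
    print(word, "->", spellcheck_tokens(word))
```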

DavidHaslam commented 5 years ago

This pair is worth comparing:

00001   well-pleasing
00001   wellpleasing

Status: Changed the latter to well-pleasing.

The hyphen just happened to be at the word-wrap location.

DavidHaslam commented 5 years ago

The 306 proper names thus detected are:

Abaddon Abia Abiathar Abijah Abiud Achaia Achaicus Achim Adramyttium Agabus Ahaz Akeldama Alphæus Aminidab Amon Amphipolis Amplias Andronicus Annas Antipatris Apollonia Apollyon Appelles Apphia Appii Aram Archelaus Archippus Areopagite Aretas Arimathea Aristobulus Arphaxad Asiarchs Assos Asyncritus Attalia Azor Azotus Balaam Balak Barachiah Barak Barsabas Bartimæus Beelzebul Belial Beor Berea Bethphage Bethsaida Bithynia Blastus Boanerges Boaz Cana Capernaum Cappadocia Cenchrea Cephas Chaldeans Chanaan Charran Chios Chorazin Chuza Cilicia Clauda Cleopas Cnidus Colosse Colossians Corban Crescens Cretes Crispus Cyrene Cyrenian Cyrenians Cyrenius Cæsar Cæsarea Dalmanutha Damascenes Decapolis Demas Derbe Didymus Dionysius Dioscuri Diotrephes Distrcit Eber Elamites Eliakim Eliezer Eliud Elmodam Eloi Elymas Emmaus Emmor Epaphras Epaphroditus Epenetus Ephphatha Esli Esrom Eubulus Euodia Euroclydon Eutychus Festus Fortunatus Gabbatha Gadarenes Gaius Gallio Gennesaret Harrodsburg Heli Hermogenes Herodians Herodias Herodion Hezron Hierapolis Hymenæus Iconium Idumea Illyricum Issachar Iturea Jairus Jambres Jannes Jeconiah Jehoram Jehosaphat Joannah Jonan Joppa Jorim Joses Jotham Kedron Kenan Kis Korah Kosam Lamech Laodicea Laodiceans Lasea Lebbæus Levite Levites Levitical Lycaonia Lycaonian Lycia Lydda Lysanias Lysias Lystra Maath Magdala Mahalaleel Mainan Malchi Malchus Manaen Mattatha Mattathiah Matthan Matthat Melchisedec Meleah Methusalah Midian Miletus Mitylene Mnason Mysia Naaman Naggæ Nahor Nahshon Nain Neapolis Nereus Neri Nicanor Nicolaitanes Nicopolis Ninevites Nymphas Olivet Olympas Onesimus Onesiphorus Pamphylia Paphos Parmenas Parthians Patara Patmos Patrobas Peleg Perga Pergamos Phanuel Phares Pharoah Phenicia Philemon Philetus Philippi Philologus Phlegon Phrygia Phygellus Phœbe Pisidia Pontius Pontus Portius Prochorus Ptolemais Publius Pudens Puteoli Quartus Rabab Rabboni Rachab Rahab Ramah Rebecca Rehoboam Remphan Reu Rhegium Rhesa Sadducees Sadok Salah 
Salmone Samos Samothracia Sanhedrim Sardis Sarepta Saron Sceva Scythian Secundus Seleucia Sergius Serug Shealtiel Shimoi Sidon Sidonians Siloam Sina Sopater Sosipater Sosthenes Stachys Stephanas Sychar Sychem Sylvanus Syntyche Syrophenician Talitha Tartarus Terah Tertius Tertullus Thaddæus Thamar Theophilus Thessalonians Thessalonica Theudas Thyatira Tiberias Timæus Trachonitis Troas Trogyllium Trophimus Tryphena Tryphosa Tychicus Tyrannus Tyre Tyrians Uzziah Zabulon Zacchæus Zelotes Zenas Zerubbabel

DavidHaslam commented 5 years ago

accordign is probably a typo for according but I must check the PDF in case it's a printer error.

Status: Just a typo. Fixed in my improvements branch.

DavidHaslam commented 5 years ago

Completed the current round of OCR fixes in my improvements branch.

I hope to pull and merge shortly.

Here's an updated list of words detected by DSpellCheck, this time with a sort applied.

Æneas Ænon Abaddon Abia Abiathar Abijah Abiud Achaia Achaicus Achim Adramyttium Agabus Ahaz Akeldama Alphæus Aminidab Amon Amphipolis Amplias Andronicus Annas Antipatris Apollonia Apollyon Appelles Apphia Appii Aram Archelaus Archippus Areopagite Aretas Arimathea Aristobulus Arphaxad Asiarchs Assos Asyncritus Attalia Azor Azotus Balaam Balak Barachiah Barak Barsabas Bartimæus Beelzebul Belial Beor Berea Bethphage Bethsaida Bithynia Blastus Boanerges Boaz Cæsar Cæsarea Cana Capernaum Cappadocia Cenchrea Cephas Chaldeans Chanaan Charran Chios Chorazin Chuza Cilicia Clauda Cleopas Cnidus Colosse Colossians Corban Crescens Cretes Crispus Cyrene Cyrenian Cyrenians Cyrenius Dalmanutha Damascenes Decapolis Demas Derbe Didymus Dionysius Dioscuri Diotrephes Distrcit Eber Elamites Eliakim Eliezer Eliud Elmodam Eloi Elymas Emmaus Emmor Epaphras Epaphroditus Epenetus Ephphatha Esli Esrom Eubulus Euodia Euroclydon Eutychus Festus Fortunatus Gabbatha Gadarenes Gaius Gallio Gennesaret Harrodsburg Heli Hermogenes Herodians Herodias Herodion Hezron Hierapolis Hymenæus Iconium Idumea Illyricum Issachar Iturea Jairus Jambres Jannes Jeconiah Jehoram Jehosaphat Joannah Jonan Joppa Jorim Joses Jotham Kedron Kenan Kis Korah Kosam Lamech Laodicea Laodiceans Lasea Lebbæus Levite Levites Levitical Lycaonia Lycaonian Lycia Lydda Lysanias Lysias Lystra Maath Magdala Mahalaleel Mainan Malchi Malchus Manaen Mattatha Mattathiah Matthan Matthat Melchisedec Meleah Methusalah Midian Miletus Mitylene Mnason Mysia Naaman Naggæ Nahor Nahshon Nain Neapolis Nereus Neri Nicanor Nicolaitanes Nicopolis Ninevites Nymphas Olivet Olympas Onesimus Onesiphorus Pamphylia Paphos Parmenas Parthians Patara Patmos Patrobas Peleg Perga Pergamos Phœbe Phanuel Phares Pharoah Phenicia Philemon Philetus Philippi Philologus Phlegon Phrygia Phygellus Pisidia Pontius Pontus Portius Prochorus Ptolemais Publius Pudens Puteoli Quartus Rabab Rabboni Rachab Rahab Ramah Rebecca Rehoboam Remphan Reu Rhegium Rhesa Sadducees Sadok 
Salah Salmone Samos Samothracia Sanhedrim Sardis Sarepta Saron Sceva Scythian Secundus Seleucia Sergius Serug Shealtiel Shimoi Sidon Sidonians Siloam Sina Sopater Sosipater Sosthenes Stachys Stephanas Sychar Sychem Sylvanus Syntyche Syrophenician Talitha Tartarus Terah Tertius Tertullus Thaddæus Thamar Theophilus Thessalonians Thessalonica Theudas Thyatira Tiberias Timæus Trachonitis Troas Trogyllium Trophimus Tryphena Tryphosa Tychicus Tyrannus Tyre Tyrians Uzziah Zabulon Zacchæus Zelotes Zenas Zerubbabel

abidest acknowledgement affusion allegorized amomum antichrist antichrists babblings bondmaid builded carest catamites chœnices chœnix chrysoprase counsellor crysolite cummin denarii denarius despisers didrachma digged disputings draught drinkings eldership engraven envyings fastings fatlings gavest goest groanings hewn hinderance hyacinthine inclose inclosed incorruptness ingrafted intrust intrusted jesus killest kneeled knewest knowest kumi lamah lamma lictors lovefeasts lovest luke mightest mockings nard nought offscouring offsprings oldwomanish opposers pre predestinated prophesyings remainest revellings sabachthani sabbaths saltless sardius sardonyx sawn scourgings seekest seest shaven sheepgate shorn shouldst soberminded speakest spearmen stealers stonest sulphurous sycamine talkest tetrads tetrarch transgresssor uncircumcision uncomely uncondemned undefiled unthankful unvailed vail vailed whosever willest wranglings yod

DavidHaslam commented 5 years ago

The screenshot illustrates another analysis I thought of doing:

https://www.dropbox.com/s/w8iast0anrxeklm/Screenshot%202019-02-13%2019.35.25.png?dl=0

DavidHaslam commented 5 years ago

Changes merged.