DavidHaslam closed this issue 5 years ago.
From the above, here are the words that either start or end with a hyphen:
00004 -
00001 -and
00001 -churches
00001 -days
00001 -doing
00001 -go
00001 -Lord
00001 -their
00001 -up
00001 -whom
00001 -you
00001 ate-
00001 fellow-
00001 free-
00001 gave-
00001 it-
00001 like-
00001 Lord-
00001 of-
00001 speaks-
00001 stood-
00002 the-
00001 three-
00001 which-
Status: These OCR errors have been fixed in the local SFM files (unless I've inadvertently missed any).
Aside: None of the corrections required a quotation dash.
As it happens, I did miss these (fresh results after reconverting to OSIS):
00003 -
00001 -whom
00001 Lord-
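For anyone wanting to reproduce this check: a list like this can be pulled out of a counted word list with a few lines of Python. This is a minimal sketch, assuming one entry per line in the `NNNNN word` format used above (zero-padded count, a space, the word form). The sample data is illustrative and the function name is hypothetical.

```python
# Sample in the "NNNNN word" format: zero-padded count, a space, the word.
COUNTED = """\
00004 -
00001 -and
00001 fellow-
00005 fellow-servant
00002 the-
00010 the
"""


def edge_hyphen_words(counted_text):
    """Return (count, word) pairs whose word starts or ends with a hyphen."""
    hits = []
    for line in counted_text.splitlines():
        count, _, word = line.partition(" ")
        # A lone "-" counts too: it both starts and ends with a hyphen.
        if word.startswith("-") or word.endswith("-"):
            hits.append((int(count), word))
    return hits


for count, word in edge_hyphen_words(COUNTED):
    print(f"{count:05d} {word}")
```

Words with only internal hyphens, such as `fellow-servant`, are deliberately not reported here.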
The inconsistent use of a hyphen needs checking for some words.
Example:
00001 fellowservant
00005 fellow-servant
00004 fellow-servants
Clearly, the word fellowservant must be a mistake.
Status: Fixed hyphen in Col.1.7.
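Pairs like this can be found mechanically: a hyphenated form is worth checking whenever its closed-up spelling also occurs somewhere in the text. A minimal sketch, assuming a token list in which hyphenated words are kept whole (unlike a spell checker, which splits them); everything is lowercased and the function name is hypothetical. It only produces candidates, so which spelling the printed edition actually uses still has to be checked in the PDF.

```python
from collections import Counter


def inconsistent_compounds(words):
    """Map each closed-up spelling that also occurs hyphenated to its variants.

    `words` is an iterable of tokens with hyphenated words kept whole.
    Returns {closed_form: (closed_count, {hyphenated_form: count})}.
    """
    counts = Counter(w.lower() for w in words)
    hyphenated = {}
    for word, n in counts.items():
        # Internal hyphens only: "fellow-servant" yes, "-and" or "ate-" no.
        if "-" in word.strip("-"):
            hyphenated.setdefault(word.replace("-", ""), {})[word] = n
    return {
        closed: (counts[closed], forms)
        for closed, forms in hyphenated.items()
        if counts[closed] > 0  # Counter returns 0 for absent keys
    }


tokens = (["fellowservant"] + ["fellow-servant"] * 5
          + ["fellow-servants"] * 4 + ["well-pleasing", "wellpleasing"])
print(inconsistent_compounds(tokens))
```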
And I missed a correction involving the ligature æ:
00001 Hymenæus
00001 Hymenaeus
<verse sID="2Tim.2.17" osisID="2Tim.2.17" n="17" />and their word will eat as a gangrene: of whom are Hymenaeus and Philetus,<verse eID="2Tim.2.17" />
Status: Fixed in local SFM file.
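Ligature/digraph pairs like this one can be found the same way, by folding æ and œ to ae and oe and grouping word forms that collapse to the same key. A sketch under that assumption: only the four ligature characters below are folded, and the helper name is hypothetical.

```python
from collections import Counter

# Fold each ligature to its two-letter spelling.
FOLD_LIGATURES = str.maketrans({"æ": "ae", "Æ": "Ae", "œ": "oe", "Œ": "Oe"})


def ligature_variants(words):
    """Group word forms that differ only by ligature vs. digraph spelling."""
    groups = {}
    for word, n in Counter(words).items():
        groups.setdefault(word.translate(FOLD_LIGATURES), {})[word] = n
    # Keep only keys reached by more than one distinct spelling.
    return {key: forms for key, forms in groups.items() if len(forms) > 1}


print(ligature_variants(["Hymenæus", "Hymenaeus", "Cæsar", "Cæsar", "Phœbe"]))
```

Forms that are consistently spelled one way (Cæsar, Phœbe here) are not reported.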
Some word prefixes seem to have been left isolated:
00003 ac
00002 ex
Found in:
<verse sID="Gal.4.29" osisID="Gal.4.29" n="29" />But as then, he that was born according to the flesh persecuted him that was born ac cording to the Spirit, so even now.<verse eID="Gal.4.29" />
<verse sID="Col.2.8" osisID="Col.2.8" n="8" />See that no one make you the victims of imposture by means of philosophy and vain deceit, according to the tradition of men, according to the rudiments of the world, and not ac cording to Christ:<verse eID="Col.2.8" />
<verse sID="Heb.4.13" osisID="Heb.4.13" n="13" />And there is no creature which is not manifest in his sight: but all things are naked, and exposed to the eyes of him to whom we must give an ac count.</p><p><verse eID="Heb.4.13" />
<verse sID="Luke.23.5" osisID="Luke.23.5" n="5" />But they became the more urgent, and said: He ex cites the people, teaching throughout the whole of Judea, beginning from Galilee to this place.<verse eID="Luke.23.5" />
<verse sID="Acts.20.35" osisID="Acts.20.35" n="35" />In all things I have taught you by ex ample, that by thus laboring, you ought to support the weak, and to remember the words of the Lord Jesus; for he himself said, It is more blessed to give, than to receive.</p><p><verse eID="Acts.20.35" />
Status: Fixed these 5 instances in the local SFM files.
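Split words like these can be caught by joining each known fragment to the following token and looking the result up in a word list. A minimal sketch, assuming a `vocabulary` set of known lowercase words (for example, taken from a spell checker's dictionary) and a caller-supplied set of fragments; both are hypothetical inputs, not part of the source files.

```python
def split_word_candidates(text, fragments, vocabulary):
    """Flag 'fragment next-token' pairs that form a known word when joined."""
    tokens = text.split()
    hits = []
    for left, right in zip(tokens, tokens[1:]):
        if left not in fragments:
            continue
        # Ignore trailing punctuation on the second half, e.g. "count."
        joined = left + right.rstrip(".,;:!?")
        if joined.lower() in vocabulary:
            hits.append((f"{left} {right}", joined))
    return hits


vocabulary = {"according", "account", "excites", "example"}
verse = "to whom we must give an ac count."
print(split_word_candidates(verse, {"ac", "ex"}, vocabulary))
```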
There are these OCR errors:
00005 ho
These are located here, as mistakes for he or no:
<verse sID="Matt.17.15" osisID="Matt.17.15" n="15" />and said: Lord, have mercy on my son; for ho is a lunatic, and suffers grievously; for he often falls into the fire, and often into the water.
<verse sID="Mark.12.1" osisID="Mark.12.1" n="1" />And ho began to speak to them in parables: A man planted a vineyard, and set a hedge around it, and digged a wine-press, and built a tower, and let it out to vine-dressers, and went into another country.
<verse sID="Mark.13.27" osisID="Mark.13.27" n="27" />And then will ho send his angels, and gather his elect from the four winds, from the most distant part of earth to the most distant part of heaven.
<verse sID="Luke.20.11" osisID="Luke.20.11" n="11" />And ho then sent another servant. But they scourged him also, and treated him shamefully, and sent him away empty-handed.
<verse sID="John.4.11" osisID="John.4.11" n="11" />The woman said to him: Sir, you have ho vessel with which you can draw, and the well is deep; whence have you that living water?
Status: Fixed these 5 instances in the local SFM files.
Again:
00001 ID
OCR mistake for in
<verse sID="John.17.23" osisID="John.17.23" n="23" />I in them, and thou ID me, that they may be made perfect in one, that the world may know that thou hast sent me, and hast loved them, as thou hast loved me.
Status: Fixed in the local SFM file.
I rather suspect that one of the apparently missing verses is due to an OCR error for a verse tag:
<verse sID="2Cor.4.17" osisID="2Cor.4.17" n="17" />For our present light affliction works out for us an eternal fullness of glory, excelling all excellence, ls while we look not at the things that are seen, but at the things that are not seen: for the things seen are temporal; but the things not seen are eternal.
<verse eID="2Cor.4.17" />
The verse tag for 18 was misread as ls.
Status: Fixed my downstream local copy of the SFM file.
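Missing verses of this kind can be found by comparing the verse IDs actually present against an expected verse count per chapter. Note that 2Cor.4.18 is the last verse of its chapter, so merely looking for gaps between the verse numbers that are present would not catch it. A sketch, assuming `expected_counts` comes from some versification table (a hypothetical input here) and that verse start milestones look like the `<verse sID="..." />` tags quoted above.

```python
import re
from collections import defaultdict

# Matches verse start milestones such as <verse sID="2Cor.4.17" ... />
VERSE_START = re.compile(r'<verse sID="([^"]+)"')


def missing_verses(osis_text, expected_counts):
    """List osisIDs expected by `expected_counts` but absent from the text.

    `expected_counts` maps (book, chapter) -> number of verses.
    """
    seen = defaultdict(set)
    for osis_id in VERSE_START.findall(osis_text):
        book, chapter, verse = osis_id.rsplit(".", 2)
        seen[(book, int(chapter))].add(int(verse))
    missing = []
    for (book, chapter), count in expected_counts.items():
        for verse in range(1, count + 1):
            if verse not in seen[(book, chapter)]:
                missing.append(f"{book}.{chapter}.{verse}")
    return missing


# Chapter 4 with the tag for verse 18 lost to OCR (only verses 1-17 tagged).
sample = "".join(
    f'<verse sID="2Cor.4.{v}" osisID="2Cor.4.{v}" n="{v}" />...'
    for v in range(1, 18)
)
print(missing_verses(sample, {("2Cor", 4): 18}))
```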
Another one:
00001 tn
Location:
<verse sID="1Tim.6.16" osisID="1Tim.6.16" n="16" />who alone has immortality, dwelling in light unapproachable, whom no man has seen, nor can see, tn whom be honor and power eternal. Amen.
Should be to.
Status: Fixed in the local SFM file.
Another:
00001 Le
Location:
<verse sID="Luke.14.11" osisID="Luke.14.11" n="11" />For every one that exalts himself shall Le humbled; and he that humbles himself shall be exalted.
Should be be.
Status: Fixed in the local SFM file.
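Since a token like ho can stand for more than one word, these are better flagged for a manual look at the PDF than replaced automatically. A minimal sketch, assuming a hand-built table of the confusions reported in this thread (the table and function names are hypothetical). Matching is on whole tokens and is case-sensitive, so `who` does not trip the `ho` entry and an ordinary `id` would not match `ID`.

```python
import re

# Hand-built from the confusions reported above; several tokens have more
# than one plausible reading, so nothing is corrected automatically.
OCR_SUSPECTS = {
    "ho": ("he", "no"),
    "ID": ("in",),
    "tn": ("to",),
    "Le": ("be",),
    "ls": ("18",),  # a misread verse number
}

WORD = re.compile(r"\w+")


def flag_ocr_suspects(verse_id, text):
    """Yield (verse_id, token, candidate readings) for each suspect token."""
    for match in WORD.finditer(text):
        token = match.group()
        if token in OCR_SUSPECTS:
            yield verse_id, token, OCR_SUSPECTS[token]


for hit in flag_ocr_suspects("John.4.11", "Sir, you have ho vessel with which"):
    print(hit)
```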
The task of an archivist isn't to fix perceived errors, but to record the content as it was.
This practice of replacing dashes, removing spaces, and respelling unhyphenated words as hyphenated words for consistency... We should be following the printed edition, not moving away from it. Here I'm pointing to "removing spaces or adding dashes" for consistency across the work.
Now, there's an issue where an obvious typographical error creates a change in meaning. In cases like a missing "not" in the statement "do not commit adultery", it is an archivist's task not to present a flawed text. Some archivists mark the repair in some way; I'm neutral on that. I personally support restoring a meaningful typo to its intended reading without marking it, but I wouldn't seek to remove someone else's marks, and if I were making a single change to a work that had them, I'd follow the convention in place.
None of the issues here are meaning-based that I can see, only presentational. In the form that appears here (USFM), dashes (Hyphen-Minus, U+002D) are preferred, not publishing-time presentational markup. This is driven by the convention of 'what tools will an editor use, and have ready for them.' That is, if someone opens a file and adds to it, they are likely to add hyphen-minus dashes (U+002D) and not any of the others (U+2010, U+2012, U+2013, U+2014, U+2015, U+2027, U+2043). Because of this, the 'source' should be keyboard friendly.
There is a best practice of using "--" to indicate an en dash and "---" to indicate an em dash to the compositor.
At the point of publishing (and for the OSIS file, and for my OpenDocument and Indesign import files) I do recommend fixing the dashes into the presentation forms.
However, for the USFM source, editors use the key that is present nearly universally, and compositors mostly deal with forming the dashes into their various iterations at the time of publication. I'm sure you'll see variation from this when investigating USFM. However, in cases where USFM represents 'source', meaning the culmination of an editor's work, you'll see this followed more often than not: all dashes are the low ASCII dash.
I do suspect many of what you've listed are variations from the printed text and do need to be fixed. But the fix should be either to the printed edition, or due to a meaningful error (what appears doesn't mean what the editor obviously meant it to mean.)
But proceed with this suggestion however you see fit. I'm not about to overrule or undo any changes made here. :-) Your work adds to this. I'm just laying down the guidance I work with.
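The publish-time step described in this comment, where keyboard dashes in the source become presentation forms, could look something like this. A minimal sketch, assuming exactly the `--` (en dash) and `---` (em dash) convention mentioned; a single hyphen-minus is left alone, and the function name is hypothetical.

```python
def typeset_dashes(source):
    """Turn keyboard dash conventions into Unicode presentation forms."""
    # Replace the longer sequence first so "---" is not read as "--" + "-".
    text = source.replace("---", "\u2014")  # EM DASH
    text = text.replace("--", "\u2013")     # EN DASH
    return text  # single U+002D hyphen-minus characters are untouched


print(typeset_dashes("Matt.5.3--12; wait---no; fellow-servant"))
```

Whether the source should hold these conventions at all, rather than the Unicode characters seen in the PDF, is exactly the point under discussion; the sketch only shows the mechanical conversion.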
Each correction is first checked in the PDF. If any of my initial conjectures are mistaken, so be it. It’s merely comment.
If a quotation dash is observed in the PDF, that’s what should go in the SFM.
Third party keyboard limitations are quite irrelevant.
The aim is to represent the original text accurately, not to include any kludge that was due in earlier years to the limitations of ANSI.
The digital text can make suitable use of the relevant Unicode characters.
I’m making good progress going through all these items, despite some other pleasant interruptions during my day.
I’m recording them all in git as fixes to OCR error[s].
That’s what they all are - so far at least.
.... I've been working sporadically for nearly 2 years on the 1911 Bible. One of my struggles is colon vs. semicolon. In the scans of the edition, there appear to be one or two pieces of lead type that come into play that are so ambiguous as to be nearly a third mark between a colon and a semicolon. In the very few places where the 1911 OCR edition varies from the eBible KJV, I end up with six copies of the KJV and derivatives open, searching for whether the semicolon or the colon has more precedent, and, if it does, whether the resulting reading changes meaning at all between the two. Commas vs. periods aren't nearly so hard, but colon vs. semicolon, especially with the lead type used... It makes for a slow process at times, when that/those variant demicolons were on the top of the typesetter's pile.
Hmmmm !
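Spots like these can at least be located mechanically before the manual comparison. A sketch using Python's standard `difflib`, assuming the two readings of a verse are available as plain strings (the function name is hypothetical). It only reports places where one text has `:` and the other `;`, and says nothing about which one is right.

```python
from difflib import SequenceMatcher


def colon_semicolon_variants(ours, theirs):
    """List spots where two readings differ only by ':' versus ';'."""
    diffs = []
    matcher = SequenceMatcher(None, ours, theirs, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace" and {ours[i1:i2], theirs[j1:j2]} == {":", ";"}:
            # A little surrounding text to help find the spot in the scans.
            context = ours[max(0, i1 - 20):i2 + 20]
            diffs.append((i1, ours[i1:i2], theirs[j1:j2], context))
    return diffs


print(colon_semicolon_variants("and said: Lord, have mercy",
                               "and said; Lord, have mercy"))
```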
These are now the lowercase words detected as being "misspelt" by the DSpellCheck plugin.
abidest
accordign
acknowledgement
affusion
allegorized
amomum
antichrist
antichrists
babblings
bondmaid
builded
carest
catamites
chrysoprase
chœnices
chœnix
counsellor
crysolite
cummin
denarii
denarius
despisers
didrachma
digged
disputings
draught
drinkings
eldership
engraven
envyings
fastings
fatlings
gavest
goest
groanings
hewn
hinderance
hyacinthine
inclose
inclosed
incorruptness
ingrafted
intrust
intrusted
jesus
killest
kneeled
knewest
knowest
kumi
lamah
lamma
lictors
lovefeasts
lovest
luke
mightest
mockings
nard
nought
offscouring
offsprings
oldwomanish
opposers
pre
predestinated
prophesyings
remainest
revellings
sabachthani
sabbaths
saltless
sardius
sardonyx
sawn
scourgings
seekest
seest
shaven
sheepgate
shorn
shouldst
soberminded
speakest
spearmen
stealers
stonest
sulphurous
sycamine
talkest
tetrads
tetrarch
transgresssor
uncircumcision
uncomely
uncondemned
undefiled
unthankful
unvailed
vail
vailed
wellpleasing
whosever
willest
wranglings
yod
That's 106 hits out of a total of 414 words, the rest being words with an uppercase initial letter.
A few of these are worth looking into more closely.
NB. DSpellCheck counts a hyphen as a non-word character, so in effect it splits hyphenated words.
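The lowercase/uppercase split above, together with the plugin's habit of breaking words at hyphens, can be approximated like this. A minimal sketch, assuming `dictionary` is a set of known lowercase words (a hypothetical stand-in for the plugin's own dictionary); it is not how DSpellCheck itself is implemented.

```python
import re

# Runs of letters only: a hyphen (like digits and punctuation) ends a token,
# so "well-pleasing" yields "well" and "pleasing".
TOKEN = re.compile(r"[^\W\d_]+")


def unknown_words_by_case(text, dictionary):
    """Return (lowercase, capitalised) unknown words, each sorted."""
    unknown = {t for t in TOKEN.findall(text) if t.lower() not in dictionary}
    lower = sorted(t for t in unknown if t[0].islower())
    upper = sorted(t for t in unknown if t[0].isupper())
    return lower, upper


dictionary = {"of", "whom", "are", "and", "well", "pleasing"}
text = "of whom are Hymenæus and Philetus; well-pleasing, wellpleasing"
print(unknown_words_by_case(text, dictionary))
```

This also shows why well-pleasing itself never appears in the hit list while wellpleasing does.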
This pair is worth comparing:
00001 well-pleasing
00001 wellpleasing
Status: Changed the latter to well-pleasing.
The hyphen just happened to be at the word-wrap location.
The 306 proper names thus detected are:
Abaddon Abia Abiathar Abijah Abiud Achaia Achaicus Achim Adramyttium Agabus Ahaz Akeldama Alphæus Aminidab Amon Amphipolis Amplias Andronicus Annas Antipatris Apollonia Apollyon Appelles Apphia Appii Aram Archelaus Archippus Areopagite Aretas Arimathea Aristobulus Arphaxad Asiarchs Assos Asyncritus Attalia Azor Azotus Balaam Balak Barachiah Barak Barsabas Bartimæus Beelzebul Belial Beor Berea Bethphage Bethsaida Bithynia Blastus Boanerges Boaz Cana Capernaum Cappadocia Cenchrea Cephas Chaldeans Chanaan Charran Chios Chorazin Chuza Cilicia Clauda Cleopas Cnidus Colosse Colossians Corban Crescens Cretes Crispus Cyrene Cyrenian Cyrenians Cyrenius Cæsar Cæsarea Dalmanutha Damascenes Decapolis Demas Derbe Didymus Dionysius Dioscuri Diotrephes Distrcit Eber Elamites Eliakim Eliezer Eliud Elmodam Eloi Elymas Emmaus Emmor Epaphras Epaphroditus Epenetus Ephphatha Esli Esrom Eubulus Euodia Euroclydon Eutychus Festus Fortunatus Gabbatha Gadarenes Gaius Gallio Gennesaret Harrodsburg Heli Hermogenes Herodians Herodias Herodion Hezron Hierapolis Hymenæus Iconium Idumea Illyricum Issachar Iturea Jairus Jambres Jannes Jeconiah Jehoram Jehosaphat Joannah Jonan Joppa Jorim Joses Jotham Kedron Kenan Kis Korah Kosam Lamech Laodicea Laodiceans Lasea Lebbæus Levite Levites Levitical Lycaonia Lycaonian Lycia Lydda Lysanias Lysias Lystra Maath Magdala Mahalaleel Mainan Malchi Malchus Manaen Mattatha Mattathiah Matthan Matthat Melchisedec Meleah Methusalah Midian Miletus Mitylene Mnason Mysia Naaman Naggæ Nahor Nahshon Nain Neapolis Nereus Neri Nicanor Nicolaitanes Nicopolis Ninevites Nymphas Olivet Olympas Onesimus Onesiphorus Pamphylia Paphos Parmenas Parthians Patara Patmos Patrobas Peleg Perga Pergamos Phanuel Phares Pharoah Phenicia Philemon Philetus Philippi Philologus Phlegon Phrygia Phygellus Phœbe Pisidia Pontius Pontus Portius Prochorus Ptolemais Publius Pudens Puteoli Quartus Rabab Rabboni Rachab Rahab Ramah Rebecca Rehoboam Remphan Reu Rhegium Rhesa Sadducees Sadok Salah 
Salmone Samos Samothracia Sanhedrim Sardis Sarepta Saron Sceva Scythian Secundus Seleucia Sergius Serug Shealtiel Shimoi Sidon Sidonians Siloam Sina Sopater Sosipater Sosthenes Stachys Stephanas Sychar Sychem Sylvanus Syntyche Syrophenician Talitha Tartarus Terah Tertius Tertullus Thaddæus Thamar Theophilus Thessalonians Thessalonica Theudas Thyatira Tiberias Timæus Trachonitis Troas Trogyllium Trophimus Tryphena Tryphosa Tychicus Tyrannus Tyre Tyrians Uzziah Zabulon Zacchæus Zelotes Zenas Zerubbabel
accordign is probably a typo for according, but I must check the PDF in case it's a printer error.
Status: Just a typo. Fixed in my improvements branch.
Completed the current round of OCR fixes in my improvements branch.
I hope to pull and merge shortly.
Here's an updated list of words detected by DSpellCheck, this time with a sort applied.
Æneas Ænon Abaddon Abia Abiathar Abijah Abiud Achaia Achaicus Achim Adramyttium Agabus Ahaz Akeldama Alphæus Aminidab Amon Amphipolis Amplias Andronicus Annas Antipatris Apollonia Apollyon Appelles Apphia Appii Aram Archelaus Archippus Areopagite Aretas Arimathea Aristobulus Arphaxad Asiarchs Assos Asyncritus Attalia Azor Azotus Balaam Balak Barachiah Barak Barsabas Bartimæus Beelzebul Belial Beor Berea Bethphage Bethsaida Bithynia Blastus Boanerges Boaz Cæsar Cæsarea Cana Capernaum Cappadocia Cenchrea Cephas Chaldeans Chanaan Charran Chios Chorazin Chuza Cilicia Clauda Cleopas Cnidus Colosse Colossians Corban Crescens Cretes Crispus Cyrene Cyrenian Cyrenians Cyrenius Dalmanutha Damascenes Decapolis Demas Derbe Didymus Dionysius Dioscuri Diotrephes Distrcit Eber Elamites Eliakim Eliezer Eliud Elmodam Eloi Elymas Emmaus Emmor Epaphras Epaphroditus Epenetus Ephphatha Esli Esrom Eubulus Euodia Euroclydon Eutychus Festus Fortunatus Gabbatha Gadarenes Gaius Gallio Gennesaret Harrodsburg Heli Hermogenes Herodians Herodias Herodion Hezron Hierapolis Hymenæus Iconium Idumea Illyricum Issachar Iturea Jairus Jambres Jannes Jeconiah Jehoram Jehosaphat Joannah Jonan Joppa Jorim Joses Jotham Kedron Kenan Kis Korah Kosam Lamech Laodicea Laodiceans Lasea Lebbæus Levite Levites Levitical Lycaonia Lycaonian Lycia Lydda Lysanias Lysias Lystra Maath Magdala Mahalaleel Mainan Malchi Malchus Manaen Mattatha Mattathiah Matthan Matthat Melchisedec Meleah Methusalah Midian Miletus Mitylene Mnason Mysia Naaman Naggæ Nahor Nahshon Nain Neapolis Nereus Neri Nicanor Nicolaitanes Nicopolis Ninevites Nymphas Olivet Olympas Onesimus Onesiphorus Pamphylia Paphos Parmenas Parthians Patara Patmos Patrobas Peleg Perga Pergamos Phœbe Phanuel Phares Pharoah Phenicia Philemon Philetus Philippi Philologus Phlegon Phrygia Phygellus Pisidia Pontius Pontus Portius Prochorus Ptolemais Publius Pudens Puteoli Quartus Rabab Rabboni Rachab Rahab Ramah Rebecca Rehoboam Remphan Reu Rhegium Rhesa Sadducees Sadok 
Salah Salmone Samos Samothracia Sanhedrim Sardis Sarepta Saron Sceva Scythian Secundus Seleucia Sergius Serug Shealtiel Shimoi Sidon Sidonians Siloam Sina Sopater Sosipater Sosthenes Stachys Stephanas Sychar Sychem Sylvanus Syntyche Syrophenician Talitha Tartarus Terah Tertius Tertullus Thaddæus Thamar Theophilus Thessalonians Thessalonica Theudas Thyatira Tiberias Timæus Trachonitis Troas Trogyllium Trophimus Tryphena Tryphosa Tychicus Tyrannus Tyre Tyrians Uzziah Zabulon Zacchæus Zelotes Zenas Zerubbabel
abidest acknowledgement affusion allegorized amomum antichrist antichrists babblings bondmaid builded carest catamites chœnices chœnix chrysoprase counsellor crysolite cummin denarii denarius despisers didrachma digged disputings draught drinkings eldership engraven envyings fastings fatlings gavest goest groanings hewn hinderance hyacinthine inclose inclosed incorruptness ingrafted intrust intrusted jesus killest kneeled knewest knowest kumi lamah lamma lictors lovefeasts lovest luke mightest mockings nard nought offscouring offsprings oldwomanish opposers pre predestinated prophesyings remainest revellings sabachthani sabbaths saltless sardius sardonyx sawn scourgings seekest seest shaven sheepgate shorn shouldst soberminded speakest spearmen stealers stonest sulphurous sycamine talkest tetrads tetrarch transgresssor uncircumcision uncomely uncondemned undefiled unthankful unvailed vail vailed whosever willest wranglings yod
The screenshot illustrates another analysis I thought of doing:
https://www.dropbox.com/s/w8iast0anrxeklm/Screenshot%202019-02-13%2019.35.25.png?dl=0
Changes merged.
I have generated a counted word list for the Anderson NT, and several more errors have come to light.
Here's a list of all the places where a hyphen occurs:
The words that start or end with a hyphen need checking & correcting, as do a few words that are not spelled correctly.
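A counted word list in the `NNNNN word` format used throughout this issue can be produced along these lines. A minimal sketch, not necessarily how the lists in this issue were generated: the tokenizer is an assumption that keeps internal hyphens (`fellow-servant`), a leading or trailing hyphen (`-and`, `ate-`) and a lone spaced `-`, so that stray hyphens like those listed here show up as tokens. The function name is hypothetical.

```python
import re
from collections import Counter

# Letters with optional internal hyphens, plus an optional leading or
# trailing hyphen ("-and", "ate-"); otherwise a lone "-" on its own.
TOKEN = re.compile(r"-?[^\W\d_]+(?:-[^\W\d_]+)*-?|-")


def counted_word_list(text):
    """Return 'NNNNN word' lines, sorted case-insensitively by word."""
    counts = Counter(TOKEN.findall(text))
    ordered = sorted(counts.items(), key=lambda kv: (kv[0].lower(), kv[0]))
    return [f"{n:05d} {word}" for word, n in ordered]


sample = "the fellow-servant and the fellow- servant, -and the -"
print("\n".join(counted_word_list(sample)))
```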