SubtitleEdit / subtitleedit

the subtitle editor :)
http://www.nikse.dk/SubtitleEdit/Help
GNU General Public License v3.0

A list of lines that should be removed/corrected from eng_OCRFixReplaceList.xml. #3319

Closed: Ding-adong closed this 5 years ago

Ding-adong commented 5 years ago
<LinePart from="-| " to="- I " /> Dup line
<Word from="G0" to="go" /> "from" should be lower case ("g0")
<Word from="I\/I" to="M" /> Dup line
<Word from="It'syour" to="It's your" /> Dup line
<Beginning from="IVIa" to="Ma" /> Dup line
<Word from="IVIe" to="Me" /> Dup line
<Beginning from="IVIu" to="Mu" /> Dup line
<Word from="IVIy" to="My" /> Dup line
<Word from="lt'll" to="It'll" /> Dup line
<Line from="SQMEWH ERE ELSE" to="SOMEWHERE ELSE" /> Dup line
<Word from="Nllmenoreans" to="Numenoreans" /> Remove; too rare to bother with.
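
Duplicate lines like the ones flagged above can be found mechanically. The following is a minimal sketch, not part of SubtitleEdit: the `SAMPLE` fragment is made up for illustration, the wrapper element names are an assumption about the file layout, and `audit` is a hypothetical helper. It reports repeated `(element, from)` pairs and entries whose `to` is identical to their `from`.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical fragment for illustration only. The wrapper element names are
# assumed; the real eng_OCRFixReplaceList.xml is not reproduced in this issue.
SAMPLE = """
<OCRFixReplaceList>
  <WholeWords>
    <Word from="IVIe" to="Me" />
    <Word from="It'syour" to="It's your" />
    <Word from="IVIe" to="Me" />
    <Word from="tvventy" to="tvventy" />
  </WholeWords>
</OCRFixReplaceList>
"""

def audit(xml_text):
    """Return (duplicates, no_ops) for every element carrying a 'from' attribute."""
    root = ET.fromstring(xml_text)
    entries = [(el.tag, el.get("from"), el.get("to"))
               for el in root.iter() if el.get("from") is not None]
    # An entry is a duplicate when the same element type repeats the same 'from'.
    counts = Counter((tag, src) for tag, src, _ in entries)
    duplicates = sorted(key for key, n in counts.items() if n > 1)
    # An entry is a no-op when it replaces a string with itself.
    no_ops = sorted({(tag, src) for tag, src, dst in entries if src == dst})
    return duplicates, no_ops

duplicates, no_ops = audit(SAMPLE)
print("duplicates:", duplicates)
print("no-ops:", no_ops)
```

Keying on the element name as well as `from` keeps a `Word` and a `WordPart` with the same `from` from being reported as duplicates of each other, since they apply in different contexts.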

Remove all of the entries below and use this regex instead: <RegEx find="(?:ii\/I|ii\/l|IVI|IVi|IVl|IV\||I\\\/I|I\\\/l|L\\\/I|l\\\/I|l\\\/l|li\/l|ll\/I|Ll\/l|ll\/l|lVI|LVl|lVl|lvl|lx\/I|Nl)(\w+)" replaceWith="M$1" />

<Word from="I\/Ian" to="Man" />
<Word from="I\/Iathies" to="Mathies" />
<Word from="I\/Ie" to="Me" />
<Word from="I\/Iommy" to="Mommy" />
<Word from="I\/Ir" to="Mr" />
<Word from="I\/Ir." to="Mr." />
<Word from="I\/ly" to="My" />
<Word from="ii/Iary" to="Mary" />
<Word from="ii/Ir" to="Mr" />
<Word from="ii/Ir." to="Mr." />
<Word from="ii/love" to="Move" />
<Word from="IVIAN" to="MAN" />
<Word from="IVIan" to="Man" />
<Word from="IVIarch" to="March" />
<Word from="IVIarci's" to="Marci's" />
<Word from="IVIarko" to="Marko" />
<Word from="IViiff/in's" to="Mifflin's" />
<Word from="IVIine's" to="Mine's" />
<Word from="IVImm" to="Mmm" />
<Word from="IVIoney" to="Money" />
<Word from="IVIr." to="Mr." />
<Word from="IVIrs" to="Mrs" />
<Word from="IVIuch" to="Much" />
<Word from="IVIust" to="Must" />
<Word from="IVlacArthur" to="MacArthur" />
<Word from="IVlacArthur's" to="MacArthur's" />
<Word from="IVlcBride" to="McBride" />
<Word from="IVlore" to="More" />
<Word from="IVlotherfucker_" to="Motherfucker." />
<Word from="IVlr" to="Mr" />
<Word from="IVlr." to="Mr." />
<Word from="IVlr_" to="Mr." />
<Word from="IVlust" to="Must" />
<Word from="IVly" to="My" />
<Word from="IV|oney" to="Money" />
<Word from="IV|oney's" to="Money's" />
<Word from="L\/Ianuela" to="Manuela" />
<Word from="L\/Ianuelal" to="Manuela!" />
<Word from="l\/Iauzard" to="Mauzard" />
<Word from="l\/Iom" to="Mom" />
<Word from="l\/Iommy" to="Mommy" />
<Word from="l\/Ir" to="Mr" />
<Word from="l\/Ir." to="Mr." />
<Word from="l\/Is" to="Ms" />
<Word from="L\/Iélanie" to="Mélanie" />
<Word from="l\/Iélanie" to="Mélanie" />
<Word from="l\/ly" to="My" />
<Word from="li/lr." to="Mr." />
<Word from="ll/Iommy's" to="Mommy's" />
<Word from="Ll/lajor" to="Major" />
<Word from="ll/lajor" to="Major" />
<Word from="ll/layans" to="Mayans" />
<Word from="lVIan" to="Man" />
<Word from="lVIcHenry" to="McHenry" />
<Word from="lVIr." to="Mr." />
<Word from="lVlacArthur" to="MacArthur" />
<Word from="lVlore" to="More" />
<Word from="lVlr" to="Mr" />
<Word from="lVlr." to="Mr." />
<Word from="lvluslc" to="MUSIC" />
<Word from="lVlust" to="Must" />
<Word from="Nlagnificence" to="Magnificence" />
<Word from="Nlakes" to="Makes" />
<Word from="Nlalina" to="Malina" />
<Word from="Nlan" to="Man" />
<Word from="Nlarch" to="March" />
<Word from="Nlarine" to="Marine" />
<Word from="Nlarion" to="Marion" />
<Word from="Nlarry" to="Marry" />
<Word from="Nlars" to="Mars" />
<Word from="Nlarty" to="Marty" />
<Word from="Nle" to="Me" />
<Word from="Nleet" to="Meet" />
<Word from="Nlen" to="Men" />
<Word from="Nlom" to="Mom" />
<Word from="Nlore" to="More" />
<Word from="Nlornin" to="Mornin" />
<Word from="Nlother" to="Mother" />
<Word from="Nlr" to="Mr" />
<Word from="Nlr." to="Mr." />
<Word from="Nlrs" to="Mrs" />
<Word from="Nluch" to="Much" />
<WordPart from="I\/I" to="M" />
<WordPart from="I\/l" to="M" />
<WordPart from="IVl" to="M" />
<WordPart from="l\/I" to="M" />
<WordPart from="l\/l" to="M" />
<WordPart from="lVI" to="M" />
<WordPart from="lVl" to="M" />
<WordPart from="IVIa" to="Ma" />
<WordPart from="IVIe" to="Me" />
<WordPart from="IVIi" to="Mi" />
<WordPart from="IVIo" to="Mo" />
<WordPart from="IVIu" to="Mu" />
<WordPart from="IVIy" to="My" />
<LinePart from="Wal-I\/Iart" to="Wal-Mart" />

END
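
Before dropping the entries, it may be worth checking which of them the regex actually reproduces. The sketch below copies the proposed pattern verbatim and runs it through Python's `re` module as a stand-in for the .NET engine (an assumption; the two agree on the constructs used here, with `$1` written as `\1`). `apply_regex` and `SAMPLES` are names introduced for this check only.

```python
import re

# The pattern from the proposed <RegEx find="..."> entry, copied verbatim.
PATTERN = re.compile(
    r"(?:ii\/I|ii\/l|IVI|IVi|IVl|IV\||I\\\/I|I\\\/l|L\\\/I|l\\\/I|l\\\/l"
    r"|li\/l|ll\/I|Ll\/l|ll\/l|lVI|LVl|lVl|lvl|lx\/I|Nl)(\w+)"
)

def apply_regex(text):
    # .NET's replaceWith="M$1" corresponds to r"M\1" in Python.
    return PATTERN.sub(r"M\1", text)

# (from, to) pairs taken from the entries listed above.
SAMPLES = [
    ("IVIan", "Man"),
    (r"I\/Ir.", "Mr."),
    ("ii/Iary", "Mary"),
    ("IV|oney's", "Money's"),
    (r"l\/Iélanie", "Mélanie"),
    ("Nlother", "Mother"),
    (r"Wal-I\/Iart", "Wal-Mart"),
    ("IVlr_", "Mr."),
    ("lvluslc", "MUSIC"),
    (r"L\/Ianuelal", "Manuela!"),
    ("IViiff/in's", "Mifflin's"),
]

# Collect the entries whose listed result differs from what the regex produces.
not_covered = [(src, want, apply_regex(src))
               for src, want in SAMPLES if apply_regex(src) != want]
for src, want, got in not_covered:
    print(f"{src!r}: list entry gives {want!r}, regex gives {got!r}")
```

In this sample the regex handles the plain "garbled M" cases, but not entries that also repair something else in the word: a trailing `_` or `l` standing in for punctuation, a stray `/`, or a change of case. Those entries would need to stay in the list or get their own rule. The pattern also has no word-boundary anchor, so it can match in the middle of a word.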

<Word from="tvventy" to="tvventy" />
<Word from="Tvventy" to="Tvventy" /> vv = w; as written, both entries are no-ops ("to" equals "from")
<Word from="Voilé" to="Voilà" /> Why? Remove and replace with <Word from="voilá" to="voilà" />
<Word from="¤Id" to="old" /> Remove this and the ¤ entries that follow, since <WordPart from="¤" to="o" /> does the job
<Word from="¤Ids" to="olds" />
<Word from="¤n" to="on" />
<Word from="¤ne" to="one" />
<Word from="¤nly" to="only" />
<Word from="¤pen" to="open" />
<Word from="¤r" to="or" />
<Word from="¤rder" to="order" />
<Word from="¤ther" to="other" />
<Word from="¤ur" to="our" />
<Word from="¤ut" to="out" />
<Word from="¤ver" to="over" />
<Word from="¤wn" to="own" />
<Word from="Y¤u'II" to="You'll" />
<Word from="Y¤u'll" to="You'll" />
<Word from="Y¤u're" to="You're" />
<Word from="y¤u're" to="you're" />
<Word from="y¤u've" to="you've" />
<Word from="D¤esn't" to="Doesn't" />
<Word from="d¤n'i" to="don't" />
<Word from="d¤n't" to="don't" />
<Word from="w¤n't" to="won't" />
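
A quick way to see how far the single `WordPart` rule gets is to model it as a plain substring replacement and compare against the listed results. This is only a sketch: `word_part` is a hypothetical stand-in for the rule applied in isolation, and it ignores any other rules in the list that might run afterwards.

```python
# (from, to) pairs taken from the ¤ entries listed above.
ENTRIES = [
    ("¤Id", "old"), ("¤n", "on"), ("¤nly", "only"), ("¤ther", "other"),
    ("Y¤u'II", "You'll"), ("Y¤u'll", "You'll"), ("y¤u've", "you've"),
    ("D¤esn't", "Doesn't"), ("d¤n'i", "don't"), ("w¤n't", "won't"),
]

def word_part(text):
    # Models <WordPart from="¤" to="o" /> as a plain substring replacement.
    return text.replace("¤", "o")

# Entries whose listed result is not reached by the ¤ -> o replacement alone.
needs_more = [(src, dst, word_part(src))
              for src, dst in ENTRIES if word_part(src) != dst]
for src, dst, got in needs_more:
    print(f"{src!r}: list entry gives {dst!r}, WordPart alone gives {got!r}")
```

Entries that contain a second OCR error (a capital `I` for `l`, or `'i` for `'t`) are not fully fixed by the `¤` rule on its own. Whether other rules in the file pick up the remainder is not verified here.
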
niksedk commented 5 years ago

thx :) Updated most...

Ding-adong commented 5 years ago
<Word from="g0" to="go" /> Should be lower case. You already have the upper-case variant in the Beginning section, and a capital "Go" in the middle of a sentence makes no sense.