When testing with German text, I noticed that despite 69,000 special rules for compound words,
many German words get the hyphen in the wrong place, even though rules for them exist.
Examples, all with wrong hyphens:
Fortschritt -> Forts-chritt
Abendstern -> Abends-tern
Morgenthau -> Mor-gent-hau
Gastherme -> Gasther-me
Nennwertherabsetzung -> Nenn-wer-ther-ab-set-zung
PyHyphen has special handling for completely capitalized words (modes 2 and 3), but no handling for words where only the first letter is capitalized.
My workaround for 'syllables' ('pairs' has the same problem):
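from hyphen import Hyphenator

hyphen = Hyphenator("de_DE", directory=DATA_DIR)

def syllables_patch(word):
    mode = 0
    if word.istitle():
        # hyphenate the lowercase form so the existing rules apply
        word = word.lower()
        mode = 4
    result = hyphen.syllables(word)
    if len(result) > 0 and mode == 4:
        # restore the capital letter on the first syllable
        result[0] = result[0].title()
    return result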
With the patch:
Fortschritt -> Fort-schritt
Abendstern -> Abend-stern
Morgenthau -> Mor-gen-thau
Gastherme -> Gas-ther-me
Nennwertherabsetzung -> Nenn-wert-her-ab-set-zung
Now all hyphens are correct.
The problem affects all rules in all other languages, not just the German compound rules, but the error is clearly visible here.
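Since 'pairs' has the same problem, the workaround could be generalized. The following is only a sketch: title_case_safe, title_case_safe_pairs and the fake_* stand-in functions are hypothetical names, not part of PyHyphen, and it assumes that 'pairs' returns a list of [left, right] items. The stand-ins only imitate the behaviour reported above so that the example runs without a dictionary.

```python
from typing import Callable, List


def _capitalize_first(text: str) -> str:
    # Upper-case only the first character, leave the rest untouched.
    return text[:1].upper() + text[1:]


def title_case_safe(syllabify: Callable[[str], List[str]]) -> Callable[[str], List[str]]:
    """Wrap a syllables-like function so title-case words are looked up in lowercase."""
    def wrapper(word: str) -> List[str]:
        if not word.istitle():
            return syllabify(word)
        parts = list(syllabify(word.lower()))
        if parts:
            parts[0] = _capitalize_first(parts[0])
        return parts
    return wrapper


def title_case_safe_pairs(pairs: Callable[[str], List[List[str]]]) -> Callable[[str], List[List[str]]]:
    """Same idea for a pairs-like function (assumed to return [left, right] items)."""
    def wrapper(word: str) -> List[List[str]]:
        if not word.istitle():
            return pairs(word)
        # The capital letter always belongs to the left part of each pair.
        return [[_capitalize_first(left), right] for left, right in pairs(word.lower())]
    return wrapper


# Hypothetical stand-ins that imitate the reported behaviour for one word.
def fake_syllables(word: str) -> List[str]:
    table = {"Fortschritt": ["Forts", "chritt"], "fortschritt": ["fort", "schritt"]}
    return table.get(word, [word])


def fake_pairs(word: str) -> List[List[str]]:
    table = {"Fortschritt": [["Forts", "chritt"]], "fortschritt": [["fort", "schritt"]]}
    return table.get(word, [])


safe_syllables = title_case_safe(fake_syllables)
safe_pairs = title_case_safe_pairs(fake_pairs)
print("-".join(safe_syllables("Fortschritt")))
```

With a real Hyphenator instance, hyphen.syllables and hyphen.pairs would be passed in place of the stand-ins.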
Jürgen