dr-leo / PyHyphen

Hyphenation rules are not found for words beginning with a capital letter #23

Open jschoen42 opened 1 week ago

jschoen42 commented 1 week ago

When testing with German text, I noticed that despite 69,000 special rules for compound words, many German words are hyphenated in the wrong place, although rules for them actually exist.

For example, all of these words get wrong hyphens:

PyHyphen has special handling for fully capitalized words (modes 2 and 3), but no handling for words where only the first letter is capitalized.
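To make the gap concrete, the capitalization cases can be told apart with Python's built-in string predicates. This is a minimal sketch; `capitalization_class` and its labels are hypothetical and do not reproduce PyHyphen's internal mode numbers.

```python
def capitalization_class(word: str) -> str:
    """Classify a word by capitalization (hypothetical labels, not PyHyphen's modes)."""
    if word.islower():
        return "lower"   # e.g. "haus": the patterns match directly
    if word.isupper():
        return "upper"   # e.g. "HAUS": all caps, which PyHyphen reportedly special-cases
    if word.istitle():
        return "title"   # e.g. "Haus": only the first letter capitalized, the case in this issue
    return "mixed"       # e.g. "iPhone": anything else


for w in ["haus", "HAUS", "Haus", "iPhone"]:
    print(w, capitalization_class(w))
```

Since German capitalizes every noun, the "title" case covers a large share of ordinary German text.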

My workaround for `syllables` (`pairs` has the same problem):

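from hyphen import Hyphenator

# DATA_DIR: directory containing the hyphenation dictionaries (defined elsewhere)
hyphen = Hyphenator("de_DE", directory=DATA_DIR)

def syllables_patch(word):
    # Words with only the first letter capitalized (e.g. German nouns) are not
    # matched by the patterns, so hyphenate the lowercase form instead
    is_title = word.istitle()
    if is_title:
        word = word.lower()

    result = hyphen.syllables(word)

    # Restore the capital letter on the first syllable
    if result and is_title:
        result[0] = result[0].title()

    return result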
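The same workaround carries over to `pairs`, which returns a list of `[left, right]` splits; there the capital has to be restored on the left half of every pair. A minimal sketch, assuming that return shape: `pairs_patch`, `capitalize_first` and `fake_pairs` are hypothetical names, and the hyphenation function is passed in as a parameter so a stub can stand in for `hyphen.pairs` without PyHyphen's dictionaries. The split points in the stub are illustrative only, not real dictionary output.

```python
from typing import Callable, List

Pairs = List[List[str]]


def capitalize_first(s: str) -> str:
    # Upper-case only the first character and leave the rest untouched
    return s[:1].upper() + s[1:]


def pairs_patch(word: str, pairs: Callable[[str], Pairs]) -> Pairs:
    """Hyphenate Title-case words via their lowercase form (hypothetical helper).

    `pairs` is assumed to behave like Hyphenator.pairs, e.g. hyphen.pairs.
    """
    if not word.istitle():
        return pairs(word)
    result = pairs(word.lower())
    # Every left half starts with the word's first letter, so re-capitalize it
    return [[capitalize_first(left), right] for left, right in result]


# Stub standing in for hyphen.pairs: it only knows the lowercase spelling,
# mimicking patterns that fail to match a capitalized word.
def fake_pairs(word: str) -> Pairs:
    table = {"donaudampfschiff": [["donau", "dampfschiff"], ["donaudampf", "schiff"]]}
    return table.get(word, [])


print(fake_pairs("Donaudampfschiff"))                # no splits for the capitalized word
print(pairs_patch("Donaudampfschiff", fake_pairs))   # splits found via the lowercase form
```

With a real hyphenator the call would be `pairs_patch(word, hyphen.pairs)`.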

With the patch, all hyphens are now correct.

The problem affects all rules in all other languages as well, not just the German compound rules, but the error is clearly visible here.

Jürgen

jschoen42 commented 6 days ago

I have tested the patch with the German word lists in the repo https://github.com/cpos/AlleDeutschenWoerter
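A test like this can be scripted by running the unpatched and patched functions over a word list and collecting the words whose hyphenation changes. A minimal sketch, assuming a plain-text list with one word per line: `changed_words` is a hypothetical helper, and with PyHyphen the two callables would be `hyphen.syllables` and `syllables_patch`. The stubs below (`stub_original`, `stub_patched`, `KNOWN`) are hypothetical and exist only so the example runs without dictionaries.

```python
from typing import Callable, Iterable, List, Tuple

Syllables = List[str]


def changed_words(
    words: Iterable[str],
    original: Callable[[str], Syllables],
    patched: Callable[[str], Syllables],
) -> List[Tuple[str, Syllables, Syllables]]:
    """Return (word, original result, patched result) for every word that differs."""
    diffs = []
    for line in words:
        word = line.strip()
        if not word:
            continue  # skip blank lines in the word list
        before, after = original(word), patched(word)
        if before != after:
            diffs.append((word, before, after))
    return diffs


# Stub hyphenator that only knows lowercase spellings (illustrative data).
KNOWN = {"haustür": ["haus", "tür"]}


def stub_original(word: str) -> Syllables:
    return list(KNOWN.get(word, []))  # copy so callers cannot mutate KNOWN


def stub_patched(word: str) -> Syllables:
    # Same idea as syllables_patch: lowercase Title-case words, then restore the capital
    if not word.istitle():
        return stub_original(word)
    result = stub_original(word.lower())
    if result:
        result[0] = result[0].title()
    return result


for word, before, after in changed_words(["Haustür\n", "haustür\n", "\n"], stub_original, stub_patched):
    print(f"{word}: {before} -> {after}")
```

Because `changed_words` accepts any iterable of lines, an open text file (opened with `encoding="utf-8"`) can be passed directly.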

Result with the patch: