Kozea / Pyphen

Hy-phen-ation made easy
https://courtbouillon.org/pyphen

(German) hyphenation derailed by punctuation characters #37

Closed allefeld closed 1 year ago

allefeld commented 2 years ago

I found this strange behavior:

> import pyphen

> dic = pyphen.Pyphen(lang='de')

> dic.inserted('begreifbar')
'be-greif-bar'

> dic.inserted('begreifbar.')
'be-greif-ba-r.'

> dic.inserted('begreifbar«.')
'be-greif-ba-r«.'

The first hyphenation is correct. The second and third have trailing punctuation characters (« is a common closing-quote in German printing), which leads to an additional incorrect hyphenation point being inserted.

I tried to use the local hunspell dictionary instead (/usr/share/hyphen/hyph_de_DE.dic), with the same result.

In this case, I could fix it by removing punctuation characters myself, but I'd still consider it to be a bug, possibly related to #24 and #26.

liZe commented 2 years ago

Hello!

"In this case, I could fix it by removing punctuation characters myself"

Yes, that’s a "problem" already answered in this comment. Short answer: as some details are specific to each language (and probably to each application), it’s easier to remove the punctuation in your application.
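For readers who want to apply that workaround, here is a minimal sketch of stripping punctuation on the application side before calling Pyphen. The helpers `split_punctuation` and `hyphenate_token` are hypothetical names, not part of Pyphen, and the sketch assumes that "punctuation" means any character in a Unicode `P*` category at the start or end of a token, which may not suit every language or application.

```python
import unicodedata
from typing import Callable, Tuple


def _is_punct(ch: str) -> bool:
    # Unicode general categories starting with "P" cover '.', ',', '«', '»', etc.
    return unicodedata.category(ch).startswith("P")


def split_punctuation(token: str) -> Tuple[str, str, str]:
    """Split a token into (leading punctuation, core word, trailing punctuation).

    Hypothetical helper, not part of Pyphen.
    """
    start = 0
    while start < len(token) and _is_punct(token[start]):
        start += 1
    end = len(token)
    while end > start and _is_punct(token[end - 1]):
        end -= 1
    return token[:start], token[start:end], token[end:]


def hyphenate_token(token: str, inserted: Callable[[str], str]) -> str:
    """Hyphenate only the core word, then re-attach the punctuation unchanged.

    `inserted` is any function mapping a word to its hyphenated form,
    for example the `inserted` method of a `pyphen.Pyphen` instance.
    """
    lead, core, trail = split_punctuation(token)
    return lead + (inserted(core) if core else "") + trail


if __name__ == "__main__":
    try:
        import pyphen
    except ImportError:  # Pyphen is optional for this sketch
        pyphen = None

    if pyphen is not None:
        dic = pyphen.Pyphen(lang='de')
        for token in ('begreifbar', 'begreifbar.', 'begreifbar«.'):
            print(hyphenate_token(token, dic.inserted))
```

Because only the core word is passed to `dic.inserted`, each of the three tokens from the report should be hyphenated the same way as the bare word. Punctuation inside a word (an apostrophe or an existing hyphen, for instance) is deliberately left alone here; whether and how to handle it is one of the language-specific decisions mentioned above.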

liZe commented 1 year ago

Closing, as we don’t plan to handle punctuation in Pyphen.