Kozea / Pyphen

Hy-phen-ation made easy
https://courtbouillon.org/pyphen

SINGING hyphenates incorrectly #51

Closed paulvonhippel closed 1 year ago

paulvonhippel commented 1 year ago

Pyphen(lang='en').inserted("SINGING") returns "SINGING" (no hyphen inserted) instead of "SING-ING".

liZe commented 1 year ago

Hi!

The same problem happens here too: https://www.ushuaia.pl/hyphen/?ln=en. This suggests the bug is probably in the dictionary, not in Pyphen.

You could try to report your bug here instead: https://github.com/LibreOffice/dictionaries/issues

paulvonhippel commented 1 year ago

WHIPLASH is another one that doesn't split, and those are just two that I discovered tonight. If the problem is in the dictionary, then there must be hundreds if not thousands of bugs in the dictionary. Reporting them all would be a poor use of time.

Is there another package that hyphenates words (or breaks them into syllables) using rules instead of a word dictionary?

liZe commented 1 year ago

> Is there another package that hyphenates words (or breaks them into syllables) using rules instead of a word dictionary?

Hunspell dictionaries (used by Pyphen and many other tools) are not real dictionaries with one entry per word; they’re already based on "rules". The problem is that the theoretical rules are often really complex: American English and British English don’t even use the same rules. The incredible number of exceptions makes writing these "dictionaries" an endless task…
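To make the "rules" point concrete: Hunspell-style hyphenation files (hyph_*.dic) hold patterns for Frank Liang's algorithm, the one TeX uses, rather than a list of words. Below is a minimal sketch in Python, assuming a tiny made-up pattern list; it is not Pyphen's implementation, and the patterns are not the real English ones. Each pattern places digits between letters, the highest digit at each gap wins, and an odd result allows a break. The function names are hypothetical. The sketch shows how a missing pattern, or a competing pattern with an even digit, leaves SINGING unsplit, and how an explicit exception list (as TeX allows) overrides the patterns.

```python
import re


def parse_pattern(pattern):
    """Split a Liang pattern such as "g1i" into its letters ("gi") and the
    digit found before each letter plus after the last one ([0, 1, 0])."""
    letters = re.sub(r"\d", "", pattern)
    values = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            values[pos] = int(ch)
        else:
            pos += 1
    return letters, values


def hyphen_positions(word, patterns, left=2, right=2):
    """Return the indices in `word` where a hyphen may be inserted."""
    dotted = f".{word.lower()}."  # dots mark the word boundaries
    scores = [0] * (len(dotted) + 1)  # scores[i] = gap just before dotted[i]
    table = dict(parse_pattern(p) for p in patterns)
    for i in range(len(dotted)):
        for j in range(i + 1, len(dotted) + 1):
            values = table.get(dotted[i:j])
            if values is not None:
                # Keep the highest digit any matching pattern assigns to a gap.
                for k, value in enumerate(values):
                    scores[i + k] = max(scores[i + k], value)
    # The gap before word[p] is the gap before dotted[p + 1]; odd means "break".
    return [p for p in range(left, len(word) - right + 1) if scores[p + 1] % 2 == 1]


def exception_positions(hyphenated):
    """Turn an exception such as "whip-lash" into break indices ([4])."""
    positions, count = [], 0
    for ch in hyphenated:
        if ch == "-":
            positions.append(count)
        else:
            count += 1
    return positions


def insert_hyphens(word, patterns, exceptions=(), hyphen="-"):
    """Hyphenate `word`; entries in `exceptions` take priority over patterns."""
    overrides = {e.replace("-", "").lower(): e for e in exceptions}
    key = word.lower()
    if key in overrides:
        positions = exception_positions(overrides[key])
    else:
        positions = hyphen_positions(word, patterns)
    pieces, last = [], 0
    for p in positions:
        pieces.append(word[last:p])
        last = p
    pieces.append(word[last:])
    return hyphen.join(pieces)


print(insert_hyphens("SINGING", ["g1i"]))  # SING-ING
print(insert_hyphens("SINGING", []))  # SINGING: no pattern covers the gap
print(insert_hyphens("SINGING", ["g1i", "ng2i"]))  # SINGING: the even 2 wins
print(insert_hyphens("WHIPLASH", [], exceptions=["whip-lash"]))  # WHIP-LASH
```

Real pattern files contain thousands of such patterns, so one wrong or missing pattern can affect many words without any of those words ever being listed individually.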

> There must be hundreds if not thousands of bugs in the dictionary. Reporting them all would be a poor use of time.

Unfortunately, that’s the only solution I have to offer you.

liZe commented 1 year ago

(Closing as there’s nothing we can do here, but don’t hesitate to continue the discussion if needed!)

paulvonhippel commented 1 year ago

It seems to be practically impossible to submit a bug to LibreOffice. They don't accept bug reports on GitHub; they have their own system in Bugzilla, and it's not easy to use. I gave up: https://github.com/LibreOffice/dictionaries/issues/44#issuecomment-1571805240

It seems a shame. The hyphenate package and other tools that rely on LibreOffice's dictionaries can't hyphenate common words like WHIPLASH, SINGING, and FIZZES, and there seems to be nothing that can be done about it. There's an opportunity here for someone to develop a better solution with fewer dependencies.
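One local workaround, offered here as a hedged sketch rather than anything Pyphen or LibreOffice provides, is to keep a small override list of known-bad words and consult it before the dictionary. The wrapper `with_overrides` and the stand-in `fake_dictionary` below are hypothetical; any function that takes a word and returns it hyphenated can serve as the fallback.

```python
from typing import Callable, Iterable


def with_overrides(
    hyphenate: Callable[[str], str],
    overrides: Iterable[str],
    hyphen: str = "-",
) -> Callable[[str], str]:
    """Wrap a hyphenation function so that locally listed words win.

    `overrides` holds hyphenated spellings such as "whip-lash"; their breaks
    are re-applied to the caller's casing. `hyphen` is used only for
    overridden words; other words keep whatever the fallback returns.
    """
    table = {entry.replace("-", "").lower(): entry for entry in overrides}

    def wrapped(word: str) -> str:
        entry = table.get(word.lower())
        if entry is None:
            return hyphenate(word)  # not overridden: defer to the dictionary
        pieces, start = [], 0
        for part in entry.split("-"):
            pieces.append(word[start:start + len(part)])
            start += len(part)
        return hyphen.join(pieces)

    return wrapped


def fake_dictionary(word: str) -> str:
    """Hypothetical stand-in for a dictionary that never inserts a hyphen."""
    return word


hyphenate = with_overrides(fake_dictionary, ["sing-ing", "whip-lash"])
print(hyphenate("WHIPLASH"))  # WHIP-LASH
print(hyphenate("SINGING"))  # SING-ING
print(hyphenate("PYTHON"))  # PYTHON (falls through to the fallback)
```

With Pyphen installed, the bound method pyphen.Pyphen(lang='en').inserted could be passed as the fallback instead of fake_dictionary. The override list still has to be maintained by hand, so this patches individual words rather than fixing the underlying patterns.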

(Sent by email in reply to Guillaume Ayoub's comment of Wed, May 31, 2023, 9:37 AM.)