paulvonhippel closed this issue 1 year ago.
Hi!
The same problem happens here too: https://www.ushuaia.pl/hyphen/?ln=en. It means that it’s probably a bug in the dictionary, not in Pyphen.
You could try to report your bug here instead: https://github.com/LibreOffice/dictionaries/issues
WHIPLASH is another word that doesn't split, and those are just two that I discovered tonight. If the problem is in the dictionary, there must be hundreds if not thousands of bugs in it. Reporting them all would be a poor use of time.
Is there another package that hyphenates words (or breaks them into syllables) using rules instead of a word dictionary?
> Is there another package that hyphenates words (or breaks them into syllables) using rules instead of a word dictionary?
Hunspell hyphenation dictionaries (used by Pyphen and many other tools) are not real dictionaries with one entry per word; they're already based on "rules". The problem is that the theoretical rules are really complex: American English and British English don't even use the same rules. The incredible number of exceptions makes writing these "dictionaries" an endless task…
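To make the "rules, not word lists" point concrete: the hyphenation files used by Hunspell-based tools follow Liang's TeX pattern format, where short letter patterns carry digits that score the gaps between letters (odd scores allow a hyphen, even scores forbid one, and the highest score wins), plus a separate list of whole-word exceptions. Below is a minimal sketch of that scoring under stated assumptions: the patterns "p1l" and "1ing" and the exception "fizz-es" are hypothetical toy data, not entries from any real hyph_en_*.dic file, and the helpers positions() and inserted() only loosely mirror Pyphen's method names; they are standalone functions, not Pyphen's API.

```python
import re

# Hypothetical toy patterns in Liang/TeX notation (NOT from a real dictionary).
# A digit scores the gap at that spot: odd = hyphen allowed, even = forbidden.
TOY_PATTERNS = ["p1l", "1ing"]

# Hypothetical exception list: whole words with explicit break points,
# analogous to TeX's \hyphenation{...} list.
TOY_EXCEPTIONS = ["fizz-es"]


def parse_pattern(pattern):
    """Split 'hy3ph' into ('hyph', [0, 0, 3, 0, 0])."""
    letters = re.sub(r"\d", "", pattern)
    values = [0] * (len(letters) + 1)
    index = 0
    for char in pattern:
        if char.isdigit():
            values[index] = int(char)
        else:
            index += 1
    return letters, values


def parse_exception(entry):
    """Split 'fizz-es' into ('fizzes', [4])."""
    points, count = [], 0
    for char in entry:
        if char == "-":
            points.append(count)
        else:
            count += 1
    return entry.replace("-", ""), points


def positions(word, patterns, exceptions, left=2, right=2):
    """Return the letter offsets where a hyphen may be inserted."""
    lower = word.lower()
    if lower in exceptions:  # exceptions override the patterns entirely
        return list(exceptions[lower])
    padded = "." + lower + "."  # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)
    # Try every substring; where it matches a pattern, keep the max score per gap.
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            values = patterns.get(padded[start:end])
            if values is None:
                continue
            for offset, value in enumerate(values):
                scores[start + offset] = max(scores[start + offset], value)
    # scores[j + 1] is the gap before word[j]; keep odd scores that leave
    # at least `left` letters before and `right` letters after the hyphen.
    return [j for j in range(left, len(word) - right + 1) if scores[j + 1] % 2]


def inserted(word, points, hyphen="-"):
    """Insert `hyphen` at the given offsets, keeping the original case."""
    bounds = [0] + list(points) + [len(word)]
    return hyphen.join(word[a:b] for a, b in zip(bounds, bounds[1:]))


PATTERNS = dict(parse_pattern(p) for p in TOY_PATTERNS)
EXCEPTIONS = dict(parse_exception(e) for e in TOY_EXCEPTIONS)

for w in ["WHIPLASH", "SINGING", "FIZZES"]:
    print(inserted(w, positions(w, PATTERNS, EXCEPTIONS)))  # WHIP-LASH, SING-ING, FIZZ-ES
```

With only these two toy patterns, "thing" comes out as "th-ing", which illustrates the maintainer's point: real pattern files need thousands of patterns, including even-scored ones that suppress bad breaks, plus exception lists, and every gap in them shows up as a word that doesn't split (or splits wrongly). Real implementations also index patterns more efficiently than this brute-force substring scan.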
> There must be hundreds if not thousands of bugs in the dictionary. Reporting them all would be a poor use of time.
Unfortunately, that’s the only solution I have to offer you.
(Closing as there’s nothing we can do here, but don’t hesitate to continue the discussion if needed!)
It seems to be practically impossible to submit a bug to LibreOffice. They don't accept bug reports on GitHub; they have their own Bugzilla system, and it's not easy to use. I gave up: https://github.com/LibreOffice/dictionaries/issues/44#issuecomment-1571805240
It seems a shame. hyphenate and other tools that rely on the LibreOffice dictionaries can't hyphenate common words like WHIPLASH, SINGING, and FIZZES, and there seems to be nothing that can be done about it. There's an opportunity here for someone to develop a better solution with fewer dependencies.
(Replying by email to Guillaume Ayoub's comment of Wed, May 31, 2023 at 9:37 AM.)
Pyphen(lang='en').inserted("SINGING") returns "SINGING" (unsplit) instead of "SING-ING".