NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License

Spellchecking - Hyphenation - Thesaurus #14430

Open yokto opened 8 years ago

yokto commented 8 years ago

Spelling - Hyphenation - Thesaurus

Make spellchecking work properly in all languages. I think the focus should be on Hunspell: Firefox, LibreOffice and KDE (Sonnet) all use it.

Relevant Components

spellchecking

Hunspell: LibreOffice has a nice collection at https://github.com/LibreOffice/dictionaries, which also contains thesaurus and hyphenation data.

...

Paths

Aspell: probably works with the variable $ASPELL_CONF

Hunspell: is more problematic, because each application that uses it decides where to search for the files.

Hunspell files currently go in /run/current-system/sw/share/myspell/dict and /run/current-system/sw/share/hunspell/ (system profile), and in ~/.nix-profile/share/myspell/dict and ~/.nix-profile/share/hunspell (user profile).

Enchant and Firefox (maybe through Enchant) are smart enough to find them there because they search $XDG_DATA_DIRS.

Sonnet (KDE's spellchecker) and LibreOffice are not.

...

Packages

Spellchecking

hunspell-dict-en-us (exists in NixOS), aspell-dict-en (exists in NixOS)

Thesaurus

mythes-en (name in Debian)

Hyphenation

hyphen-en (name in Debian)

Meta packages

Should there be meta packages for languages? Or for all languages?

Tasks

Jookia commented 7 years ago

It might be worth adding '/share/hunspell' and '/share/myspell' to the default 'environment.pathsToLink'.
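A minimal NixOS module sketch of that suggestion, assuming the dictionary packages install their files under share/hunspell and share/myspell as the paths above indicate. Whether these entries are already in the default list depends on the NixOS release, so treat the exact list as an assumption:

```nix
# NixOS module sketch: link the Hunspell/MySpell directories of installed
# packages into the system profile (/run/current-system/sw/share/...).
{ pkgs, ... }:
{
  environment.pathsToLink = [
    "/share/hunspell"
    "/share/myspell"
  ];

  # An example dictionary, so the linked directories are not empty.
  environment.systemPackages = [
    pkgs.hunspell
    pkgs.hunspellDicts.en_US
  ];
}
```

Linking only these subdirectories, rather than all of /share, keeps the system profile small while giving XDG-aware programs a stable place to look.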

I did some digging today, and from what I can tell (I haven't tested it) Firefox doesn't look in XDG directories; it just ships a default en_US dictionary. I plan on patching Firefox and LibreOffice soon to look in XDG_DATA_DIRS, but I have some notes that might help now.

It should be simple enough to make LibreOffice loop through XDG_DATA_DIRS and add each directory. Its dictionary loading code is here: https://cgit.freedesktop.org/libreoffice/core/tree/lingucomponent/source/lingutil/lingutil.cxx#n59

There's also https://cgit.freedesktop.org/libreoffice/core/tree/configure.ac#n4869, which shows how to set the default system-wide directories for dictionaries. This means a workaround for now may be to add these flags to the configure step:

--with-external-dict-dir=/run/current-system/sw/share/hunspell --with-external-hyph-dir=/run/current-system/sw/share/hyphen --with-external-thes-dir=/run/current-system/sw/share/mythes

(I didn't try this.)

As for Firefox, its directory-finding code is here: https://github.com/mozilla/positron/blob/master/extensions/spellcheck/hunspell/glue/mozHunspell.cpp#L307

It looks for a "dictionaries" directory inside the application directory, which is where Firefox ships en_US. It should be easy to add more search code that loops through XDG_DATA_DIRS.

Jookia commented 7 years ago

I did a survey last night to see how this could be handled across multiple implementations and found the following facts:

GNU Aspell dictionaries aren't compatible with Hunspell, but since dictionary lookup is done inside Aspell itself, it shouldn't be hard to patch it to look for dictionaries in a certain place, or to force it to always read ASPELL_CONF.

Specifically, DICPATH is a ':'-separated list of directories, which are searched for dictionaries before any other paths.

For example, you'd have /usr/share/hunspell/en_US, /usr/share/hyphen/hyph_en_US and /usr/share/mythes/thes_en_US_v2. There's no reason they can't all exist in one directory, and LibreOffice dictionaries do just that.
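For programs that already honour DICPATH (for example the hunspell command-line tool), the variable could be exported for every session. A NixOS module sketch, assuming the two system-profile directories listed earlier in this issue are actually populated (e.g. via environment.pathsToLink); programs that ignore DICPATH are unaffected:

```nix
# NixOS module sketch: export DICPATH for all login sessions.
{ ... }:
{
  # ':'-separated list of directories, searched before built-in locations
  # by DICPATH-aware programs. Directories that don't exist are skipped.
  environment.sessionVariables.DICPATH =
    "/run/current-system/sw/share/hunspell:/run/current-system/sw/share/myspell/dict";
}
```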

This is for the Ispell executable, so it can simply be patched. DICTDIR is a single directory, so it's not the same as DICPATH.

From reading the source code, though, it seems to treat it as a directory, not a PATH.

I think the best approach here would be to get upstream projects to handle DICPATH, given how trivial it is. I opened an issue in the Hunspell project to see if this is the right idea. So far I have patched LibreOffice to do this, and it works well.

Jookia commented 7 years ago

See https://github.com/hunspell/hunspell/issues/413 and https://gerrit.libreoffice.org/#/c/29543 for current upstream issues

Jookia commented 7 years ago

DICPATH support was pushed upstream as https://gerrit.libreoffice.org/gitweb?p=core.git;a=commitdiff;h=8e8afc358b7537d493b478b429e1711c6ab46bdc. It should be backported to the current packages, since the code hasn't changed.

Jookia commented 7 years ago

The Firefox DICPATH patch is at https://bugzilla.mozilla.org/show_bug.cgi?id=1310835. I'll start poking people tomorrow if nobody has noticed it by then. It's not an immediate concern, since if needed you can add the key 'spellchecker.dictionary_path' in about:config, set it to /run/current-system/sw/share/hunspell/ (I think that's it), and have Firefox search there for your dictionaries.
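On NixOS releases whose programs.firefox module provides a `preferences` option, the same key could be set declaratively instead of by hand. A sketch; the dictionary path is the one guessed in the comment above and is unverified:

```nix
# NixOS module sketch: point Firefox's Hunspell lookup at the system profile.
# Assumes a NixOS release where programs.firefox has a `preferences` option.
{ ... }:
{
  programs.firefox = {
    enable = true;
    preferences = {
      # Intended to mirror adding this key manually in about:config.
      "spellchecker.dictionary_path" = "/run/current-system/sw/share/hunspell";
    };
  };
}
```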

Jookia commented 7 years ago

The patch has passed review and is now in mozilla-inbound.

FRidh commented 7 years ago
mmilata commented 4 years ago

#80353 attempts to make LibreOffice work with Hunspell dictionaries installed through Nix by adding the dictionary directories to DICPATH in the shell wrapper. Feel free to test :)
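The wrapper idea can be illustrated with an independent, hypothetical sketch (not the code from #80353): re-wrap every LibreOffice launcher so that DICPATH is prefixed with a dictionary from the Nix store. It assumes `hunspellDicts.en_US` installs its files under share/hunspell and that the LibreOffice launchers tolerate one more wrapper layer:

```nix
# Build with: nix-build this-file.nix
{ pkgs ? import <nixpkgs> { } }:

# Hypothetical wrapper derivation: symlink LibreOffice's outputs, then
# replace each launcher with a wrapper that prepends to DICPATH.
pkgs.symlinkJoin {
  name = "libreoffice-with-dicpath";
  paths = [ pkgs.libreoffice ];
  nativeBuildInputs = [ pkgs.makeWrapper ];
  postBuild = ''
    for prog in $out/bin/*; do
      wrapProgram "$prog" \
        --prefix DICPATH : "${pkgs.hunspellDicts.en_US}/share/hunspell"
    done
  '';
}
```

Using `--prefix` rather than `--set` keeps any DICPATH the user already exported, so per-user dictionaries still work alongside the store path.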

stale[bot] commented 3 years ago

Hello, I'm a bot and I thank you in the name of the community for opening this issue.

To help our human contributors focus on the most-relevant reports, I check up on old issues to see if they're still relevant. This issue has had no activity for 180 days, and so I marked it as stale, but you can rest assured it will never be closed by a non-human.

The community would appreciate your effort in checking if the issue is still valid. If it isn't, please close it.

If the issue persists, and you'd like to remove the stale label, you simply need to leave a comment. Your comment can be as simple as "still important to me". If you'd like it to get more attention, you can ask for help by searching for maintainers and people that previously touched related code and @ mention them in a comment. You can use Git blame or GitHub's web interface on the relevant files to find them.

Lastly, you can always ask for help at our Discourse Forum or at #nixos' IRC channel.

GlassGhost commented 1 year ago
  environment.systemPackages = with pkgs; [
    # basic computer software
    libreoffice
    hyphen
    hunspell
    hunspellDicts.en_US
    hunspellDicts.en-us  # same dictionary as en_US (alias attribute in nixpkgs)
  ];

LibreOffice still gives "Missing Hyphenation Data. Please install the hyphenation package for locale “en-US”." even after correctly installing the packages above. Is this a path or naming issue? How can I help?

theCapypara commented 2 weeks ago

I added hyphenation support for LibreOffice, plus the ability to add hyphenation dictionaries for languages other than English, in https://github.com/NixOS/nixpkgs/pull/325290