Open luzpaz opened 5 years ago
We've done some of this due to the multi-dictionary stuff, although not significantly for the main one.
Sounds like this also touches on #1361
The main dictionary.txt is already too big for https://github.com/codespell-project/codespell/blame/master/codespell_lib/data/dictionary.txt to work properly ;-) (although splitting it up wouldn't actually help with the 'blame', because the history would then only go as far back as the file-splitting.)
I'm in favor of sorting/splitting the dictionaries too.
An additional problem is enUS vs. enGB (#1468). At the moment, one has to run codespell twice to get a properly fixed file: first to fix misspelled words (which may get corrected to enGB spellings), and then again to convert those fixes to enUS.
Maybe it would be good to split codespell into repositories for code and dictionaries.
> Maybe it would be good to split codespell into repositories for code and dictionaries.
Yeah, I guess the commit-history for the code would be much easier to read if it wasn't also peppered with dictionary updates :slightly_smiling_face:
Noticed that when trying to view gelma's dictionary (#1244), GH won't let me view the file via the UI and asks me to use git to view it locally. Also, at around 2.5MB, a dictionary.txt file starts to slow down Atom and GitKraken (a GUI git client) when viewing or diffing.
We aren't at that stage yet but we should prepare our thinking for it.
My proposal is to use the repology-rules model where each letter is its own separate file.
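To make the per-letter idea concrete, here is a rough sketch of how a one-off split could look. It assumes entries are one per line in the `misspelling->correction` form used by dictionary.txt; the function names, the `a.txt` ... `z.txt` output names, and the `other` bucket for entries that don't start with a letter are all made up for illustration, not anything codespell provides.

```python
from collections import defaultdict
from pathlib import Path


def split_by_first_letter(lines):
    """Group dictionary entries by the first character of the misspelling.

    Hypothetical helper: entries not starting with a-z go to an "other" bucket.
    """
    buckets = defaultdict(list)
    for line in lines:
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        first = entry[0].lower()
        key = first if "a" <= first <= "z" else "other"
        buckets[key].append(entry)
    # Sort entries inside each bucket so the resulting files diff cleanly.
    return {key: sorted(entries) for key, entries in sorted(buckets.items())}


def write_split(source, out_dir):
    """Write one file per bucket (e.g. a.txt, b.txt, ...) into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    lines = Path(source).read_text(encoding="utf-8").splitlines()
    for key, entries in split_by_first_letter(lines).items():
        (out / f"{key}.txt").write_text("\n".join(entries) + "\n", encoding="utf-8")
```

Something like `write_split("dictionary.txt", "dictionary.d")` would then produce the per-letter files (the directory name is just a placeholder). The tool would still need a way to load all of them back as one dictionary.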
This obviously makes contributing a PITA, so thinking further (just spit-balling here) if we can program some sort of: