codespell-project / codespell

check code for common misspellings
GNU General Public License v2.0

"Imbed" is a valid alternate form of "embed" #3477

Closed oddhack closed 2 weeks ago

oddhack commented 2 weeks ago

So remove suggested rewrites from the dictionary.

N.b. it is not clear from the README if the philosophy of this project is to drive people to "preferred" spellings or to "correct" spellings. If it is the latter, "imbed" and variants are valid (albeit less common) alternates to "embed" - see for example https://www.merriam-webster.com/dictionary/imbed

(I had previously created this as #2415 and later deleted my repo fork, without realizing it would also close the PR, so retrying).

DimitriPapadopoulos commented 2 weeks ago

While the OED contains an imbed entry, it is missing from SCOWL speller dictionaries, with this mention:

[v] The word is considered a spelling variant. To promote consistent spelling, only one spelling of a word is generally included in the smaller dictionary. The larger dictionary lets in common variants (level 1).

I tend to see SCOWL (And Friends) as the reference open source speller dictionary, perhaps an opinionated one. I would therefore suggest you request that it be added at https://github.com/en-wl/wordlist/issues; I have requested additions myself. However:

In the meantime, I am not sure how to handle this entry:

The Google Ngram Viewer shows that embed is 60 times more common than imbed in contemporary English — perhaps not enough to disallow imbed. I really cannot define a proper threshold here, but I would suggest these entries are moved to rare.
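The ratio argument above can be made concrete with a small sketch. This is only an illustration: the function name `classify_variant`, the threshold of 100, and the counts are all assumptions picked to reproduce the 60:1 ratio mentioned in the comment. They are not real Ngram data, and no such threshold is defined by the project.

```python
def classify_variant(common_count, variant_count, rare_threshold=100.0):
    """Decide where a variant spelling might live, based on a usage ratio.

    Hypothetical rule: if the common form outnumbers the variant by at
    least `rare_threshold`, treat the variant as a likely misspelling and
    keep it in the default dictionary; otherwise treat it as a legitimate
    but uncommon word and move it to the "rare" dictionary.
    """
    if variant_count <= 0:
        raise ValueError("variant_count must be positive")
    ratio = common_count / variant_count
    bucket = "default" if ratio >= rare_threshold else "rare"
    return ratio, bucket


# Illustrative counts only: 6000 vs 100 gives the 60:1 ratio from the thread.
ratio, bucket = classify_variant(common_count=6000, variant_count=100)
print(f"embed/imbed ratio: {ratio:.0f}, suggested dictionary: {bucket}")
```

With an assumed threshold of 100, a 60:1 ratio falls below the cutoff, which matches the suggestion to move the entries to rare. A different threshold would give a different answer, which is exactly the difficulty described above.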

oddhack commented 2 weeks ago

Fair, and I already have it in an exception list for our docs. I will close this as it was a tentative suggestion to begin with, not something I have a strong opinion about.
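For readers wanting the same workaround, the exception-list idea can be sketched as below. This is a minimal illustration and not codespell's actual implementation: it assumes dictionary entries written as `misspelling->correction`, and the helper names `parse_entries` and `find_flagged` are hypothetical.

```python
import re


def parse_entries(lines):
    """Parse 'misspelling->correction' lines into a dict (assumed format)."""
    entries = {}
    for line in lines:
        line = line.strip()
        if not line or "->" not in line:
            continue  # skip blank or malformed lines
        wrong, right = line.split("->", 1)
        entries[wrong.strip().lower()] = right.strip()
    return entries


def find_flagged(text, entries, ignore=()):
    """Return (word, suggestion) pairs, skipping words in the exception list."""
    ignored = {w.lower() for w in ignore}
    flagged = []
    for word in re.findall(r"[A-Za-z']+", text):
        key = word.lower()
        if key in entries and key not in ignored:
            flagged.append((word, entries[key]))
    return flagged


entries = parse_entries(["imbed->embed", "imbedded->embedded"])
text = "The imbedded widget will imbed itself."

print(find_flagged(text, entries))                               # both words flagged
print(find_flagged(text, entries, ignore=["imbed", "imbedded"]))  # nothing flagged
```

With codespell itself, the usual way to get this effect is the `--ignore-words-list` option (for example `--ignore-words-list imbed,imbedded`) or an `--ignore-words` file.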

oddhack commented 2 weeks ago

N.b. SCOWL appears to be basically dead with no updates in 4 years. Fortunately languages evolve slowly :-)

kevina commented 1 week ago

Hi,

I am the maintainer of SCOWL and noticed you mentioned https://github.com/en-wl/wordlist/issues/394 in this issue.

SCOWL is not dead. A major update, with a new format, is pending, but I do not have an ETA yet. Once this is out I will work on going through the backlog of suggestions for new words.

The new format will provide SCOWL as a SQLite3 database. It will include variant information that might be helpful to you. You can preview the new format at https://github.com/en-wl/wordlist/tree/v2.

Kevin
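To illustrate how variant information in a SQLite3 database might be consumed by a spell checker, here is a hedged sketch. The real SCOWL v2 schema is not described in this thread, so the table `variants`, its columns, and the sample rows are all invented for the example. Only the `sqlite3` calls come from the Python standard library.

```python
import sqlite3

# In-memory database with a HYPOTHETICAL schema; not the SCOWL v2 layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE variants ("
    " word TEXT PRIMARY KEY,"
    " preferred TEXT NOT NULL,"
    " level INTEGER NOT NULL)"
)
# Invented sample rows for illustration only.
conn.executemany(
    "INSERT INTO variants (word, preferred, level) VALUES (?, ?, ?)",
    [("imbed", "embed", 1), ("imbedded", "embedded", 1)],
)


def preferred_spelling(conn, word):
    """Return (preferred, level) if `word` is a recorded variant, else None."""
    return conn.execute(
        "SELECT preferred, level FROM variants WHERE word = ?",
        (word.lower(),),
    ).fetchone()


print(preferred_spelling(conn, "imbed"))
print(preferred_spelling(conn, "embed"))
```

A lookup like this would let a tool distinguish a known variant from an outright misspelling, for instance by reporting variants only when a stricter mode is enabled.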