WordPress / openverse

Openverse is a search engine for openly-licensed media. This monorepo includes all application code.
https://openverse.org
MIT License

Add pre-commit hooks for spell checking #378

Closed dhruvkb closed 1 year ago

dhruvkb commented 1 year ago

Problem

We don't have Git hooks to check for spelling mistakes in code or in commit messages.

Description

pre-commit has several hooks that could provide this check. Visit https://pre-commit.com/hooks.html and filter by the words 'spell' or 'typo' to see a list of options. Finding a popular, maintained one in the list and using it would improve the spelling situation considerably.

Alternatives

Relying on spell check by our IDEs and text editors is okay but it doesn't always work. Just yesterday, I made a commit called "Update pacakges" even though my IDE showed a wavy underline under the word.

References

Here are some of the most starred options from pre-commit's list:

MalanB commented 1 year ago

Hey folks, can I pick this issue and work on it?

dhruvkb commented 1 year ago

@MalanB sure! Please go ahead.

MalanB commented 1 year ago

So far, I've tried codespell and cspell-cli locally. codespell missed a few typos, in both dictionary and non-dictionary words. cspell-cli did not work even after I followed its setup guide. I'll look into whether any other spell checker fits the need.

dhruvkb commented 1 year ago

I can see how spell checking code is a whole other problem compared to prose. If you could compile a report of why each of the rejected options was rejected, that'll be a good enough resolution. We'll close this issue if none of the spell checkers are a good fit.

sarayourfriend commented 1 year ago

I'd be curious about trying to use LanguageTool for this. It has an excellent dictionary, can easily be configured with exceptions, gives very good feedback, and can be configured to check various prose style and phrase complexity rules.

As far as I can tell there isn't an out-of-the-box solution for this, but I'd be keen to give making this work a shot with a custom pre-commit hook that uses the LanguageTool Docker image. It's certainly a heavier solution than cspell or hunspell, but it's very good for natural languages. The plain English rules, for example, would probably benefit us greatly.

We would certainly need something else altogether for code, though. And whatever custom solution for using LanguageTool we used would need to have an ignore-pragma built in.

sarayourfriend commented 1 year ago

I've looked into trying to use LanguageTool for this and it would be way too complicated. We should just use codespell for code and cspell for Markdown/documentation.
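
For illustration only, a minimal `.pre-commit-config.yaml` sketch of that split might look like the following. This is not the configuration the project adopted: the `rev` values are placeholders to be replaced with whatever releases are current, and restricting cspell to Markdown files via `types` is an assumption about how the "code vs. documentation" split could be expressed.

```yaml
# Sketch only: pin `rev` to a current release of each tool before use.
repos:
  # codespell checks against a list of common misspellings, so it tends to
  # produce few false positives on identifiers in source code.
  - repo: https://github.com/codespell-project/codespell
    rev: v2.2.4 # placeholder version
    hooks:
      - id: codespell

  # cspell is dictionary-based, so it is limited here to prose (Markdown).
  - repo: https://github.com/streetsidesoftware/cspell-cli
    rev: v6.31.0 # placeholder version
    hooks:
      - id: cspell
        types: [markdown] # assumption: only check documentation files
```

With a file like this in the repository root, `pre-commit install` registers the hooks and `pre-commit run --all-files` runs them once over the existing files.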

sarayourfriend commented 1 year ago

@MalanB Are you still interested in working on this issue? I've unassigned you for now, but if you'd like to take it on again, just ping here and let us know :pray:

MalanB commented 1 year ago

Sorry, I was occupied with work so I couldn't make progress on this. I'd prefer to let someone else take this issue further.