GoogleChrome / web.dev

The frontend, backend, and content source code for web.dev
https://web.dev

Add linting/testing for spelling and language #3604

Closed: petele closed this issue 4 years ago

petele commented 4 years ago

Is your feature request related to a problem? Please describe.

I'd love to see a spell check and a check for non-inclusive language added to the linting process.

There are a few tools that could help, for example Alex or textlint. textlint seems more flexible: it includes a rule that wraps Alex, as well as rules for spelling, common misspellings, etc.

We had something like this on WebFu, but it was homegrown and would occasionally run into issues.

A tool like this should only generate warnings, since there will be false positives in some cases, but it's better to catch problems early than to miss them.
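One way to keep a linter advisory is to print every finding but never fail the build. The sketch below is a minimal illustration in TypeScript. The `LintMessage` shape and the `report` helper are hypothetical and are not the output format of textlint or Alex.

```typescript
// Hypothetical shape of a single lint finding; real tools such as
// textlint or Alex report richer objects.
interface LintMessage {
  file: string;
  line: number;
  ruleId: string;
  message: string;
}

// Formats each finding and returns the exit code a CI job would use.
// In "warn" mode the exit code is always 0, so false positives never
// block a PR. In "error" mode any finding fails the job.
function report(
  messages: LintMessage[],
  mode: "warn" | "error",
): { lines: string[]; exitCode: number } {
  const label = mode === "warn" ? "warning" : "error";
  const lines = messages.map(
    (m) => `${label} ${m.file}:${m.line} [${m.ruleId}] ${m.message}`,
  );
  const exitCode = mode === "error" && messages.length > 0 ? 1 : 0;
  return { lines, exitCode };
}

// Placeholder finding, for illustration only.
const sample: LintMessage[] = [
  {
    file: "hypothetical-post.md",
    line: 12,
    ruleId: "example-rule",
    message: "possible false positive",
  },
];

const result = report(sample, "warn");
for (const line of result.lines) {
  console.log(line);
}
```

In a real CI job the returned code would become the process exit code. Keeping it at 0 is what "only generate warnings" amounts to in practice.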

PS: There's even an Alex GitHub Action

petele commented 4 years ago

cc: @kaycebasques @robdodson

robdodson commented 4 years ago

Yeah, I started looking into this the other day using Eleventy's inclusive-language plugin, but ran into an issue.

It might not be a big deal, but we use "master" a lot in GitHub project file paths, and it was flagging those.

Alex looks really neat; let me take a look at it.

robdodson commented 4 years ago

Taking a look at this, which seems to combine all of these tools: https://github.com/place-labs/orthograph-err

robdodson commented 4 years ago

Just keeping some notes:

The Alex GitHub Action doesn't support the .alexrc file (issue). By default Alex is pretty aggressive, so we'd want to configure it. The action does at least support the profanitySureness setting, which helps because Alex has an index of over 1,700 words it considers profane, and some of them we use frequently (color, colors, remain (? not sure why, but it said it was profane in some languages)).
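As we understand Alex's documentation, profanitySureness is a minimum rating (0, 1, or 2, roughly "unlikely", "maybe", "likely") below which profanity matches are ignored, and the default of 0 reports everything. The TypeScript sketch below models only that threshold idea. The `Finding` type, the `filterBySureness` helper, and the placeholder words and ratings are hypothetical; they are not Alex's API or its actual word list.

```typescript
// Sureness levels as we understand them: 0 = unlikely, 1 = maybe, 2 = likely.
type Sureness = 0 | 1 | 2;

// Hypothetical finding: a flagged word plus how sure the word list is
// that the word is profane.
interface Finding {
  word: string;
  sureness: Sureness;
}

// Keeps only findings rated at or above the configured minimum.
// With minSureness = 0 (the default) every finding is kept.
function filterBySureness(findings: Finding[], minSureness: Sureness): Finding[] {
  return findings.filter((f) => f.sureness >= minSureness);
}

// Placeholder words and ratings, for illustration only.
const findings: Finding[] = [
  { word: "word-a", sureness: 0 },
  { word: "word-b", sureness: 1 },
  { word: "word-c", sureness: 2 },
];

console.log(filterBySureness(findings, 0).map((f) => f.word));
console.log(filterBySureness(findings, 1).map((f) => f.word));
```

Raising the threshold trades recall for fewer false positives, which is the trade-off that matters for everyday words rated as only possibly profane.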

A better option might be to use textlint with its alex rule. That would also let us use rules such as common-misspellings and terminology. Unfortunately, the textlint rule for alex doesn't support the profanitySureness option, so it ends up flagging color, colors, etc., which we use all over the place.

I sent in a PR to fix textlint-rule-alex. If that lands, I think we could use this GitHub Action to run textlint. My one concern is whether that action runs textlint against the entire project (which takes forever) or only against the files changed in the PR. It does support a path pattern, so one option might be to combine it with https://github.com/futuratrepadeira/changed-files, which supports a pattern option for filtering down to only the changed or added Markdown files. We could then pass just those files to the paths property of the dms-textlint-action.
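The changed-files approach boils down to a filter: start from the files a PR touches, keep the Markdown files that still exist, and lint only those. A minimal TypeScript sketch follows. The `ChangedFile` type, its `status` values, the `markdownToLint` helper, and the example paths are hypothetical; they don't reflect the actual outputs of the changed-files action or the inputs of dms-textlint-action.

```typescript
// Hypothetical record for one file touched by a PR.
interface ChangedFile {
  path: string;
  status: "added" | "modified" | "removed" | "renamed";
}

// Returns the Markdown files that still exist after the PR, so that
// only they are linted instead of the whole project.
function markdownToLint(files: ChangedFile[]): string[] {
  return files
    .filter((f) => f.status !== "removed")
    .filter((f) => /\.md$/i.test(f.path))
    .map((f) => f.path);
}

// Example paths, for illustration only.
const changed: ChangedFile[] = [
  { path: "content/blog/new-post/index.md", status: "added" },
  { path: "content/blog/old-post/index.md", status: "modified" },
  { path: "content/blog/gone/index.md", status: "removed" },
  { path: "src/scripts/app.js", status: "modified" },
];

// The resulting list is what would be handed on to textlint.
console.log(markdownToLint(changed).join(" "));
```

How the list is actually passed to the action (a space-separated string, a glob, or something else) depends on that action's inputs.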

petele commented 4 years ago

> Might not be a big deal but we tend to use "master" a lot in github project file paths and it was logging those.

There will always be some false positives, but I think the benefits outweigh the drawbacks, especially if we use it for spelling and grammar.

If it flags "master" in a Git repo path, good: it's a reminder that we may need to update that repo. I found a number of false positives while playing with it yesterday; for example, it flagged "white" when we were talking about a white background. It also has some contradictions: it suggests avoiding "simple", but then offers "simple" as a replacement for another term.

jpmedley commented 4 years ago

> (color, colors, remain (? not sure why but it said it was profane in some languages))

Taboo words are not universally about reproduction or bodily functions, as they are in English.