Spell checking and style checking of the English LP content (vale)

spier commented 1 year ago

Adds a GitHub Action that runs a spell check and style check for the Learning Path content, the same way we do for our Patterns.

As a matter of fact, one of the key benefits of this approach is that the rules as well as the vocabularies that the GitHub Action checks against are shared in a central repository isc-styles. By keeping a set of common rules in there, we will be able to create a more consistent style of writing across all of the written resources of the ISC.

Examples of what this GitHub Action checks:

spelling
- differences between British and American English - e.g. organization, analyzing, etc.
- wrong spelling of words e.g. Progess
style - InnerSource rather than inner source

In this PR I started with fixing issues discovered in the project-lead folder. However over time I also fixed trivial spelling mistakes.

Example output of these checks only for the project-lead folder: https://github.com/InnerSourceCommons/InnerSourceLearningPath/actions/runs/5066827354/jobs/9097134561

Key changes

[x] Adding a GHA to run on any changes to .asciidoc files
[x] Fixing straight forward issues and differences in British => American English
[x] adding a custom vocabulary that could be used to add custom words, so that vale passes on the whole LP content
[x] adding documentation for maintainers about how to install and use vale
[x] only check the English language files (i.e. not the translations and no files that are not published) - not super happy with the current solution but it works

FYI This replaces the first experiment for a style/spell checker in #523.

spier commented 1 year ago

To process *.asciidoc, vale also needs this to be installed: https://rubygems.org/gems/asciidoctor

Should document that somewhere.

Also need to check if the GHA comes pre-packaged with this dependency.

spier commented 1 year ago

Hmm. I cannot get the vale GHA to report anything on this PR, even though vale reports a bunch of spelling issues when I run it locally. Investigating ...

rrrutledge commented 1 year ago

Thanks for checking, @spier

rrrutledge commented 1 year ago

Thanks for working on this, @spier ❗️

spier commented 1 year ago

It proves to be a non-trivial task to select exactly the files that we want to lint with vale, and ignore the rest. At least I have done a tone of tries now and still not figured it out yet.

The best version I was able to find so far is:

search all of these folders introduction/ trusted-committer/ contributor/ product-owner/ project-leader/
ignore any files in subfolders (which are the translations)

To do this I am using a negated glob '--glob=!*/*/*' that vale accepts as an option.

I would prefer to use a positive glob that describes the files that we want to lint, rather than stating which files to ignore. Could not figure out how to do that though.

rrrutledge commented 1 year ago

I’m not sure how to specify with regex. It looks like you are doing a lot of great work here, though.

spier commented 1 year ago

@rrrutledge where does the LP keep documentation for maintainers? I am trying to figure where to to best explain how this spell/style checker works, so that the respective TCs/maintainers can find the info when they need it.

rrrutledge commented 1 year ago

where does the LP keep documentation for maintainers

Thank you for documenting! Please put the maintenance instructions here: https://github.com/InnerSourceCommons/InnerSourceLearningPath/tree/main/docs

rrrutledge commented 1 year ago

Hi, @spier, thanks for this work! Let us know if you have any maintenance instruction? We are very excited for this spell checking. Thank you!

rrrutledge commented 1 year ago

@marshmallowrobot will take a look at what's here and and how we can get it over the finish line 🏁 ❗

spier commented 1 year ago

It has been a while since I looked at this myself, so I have to refresh my memory in which state I left things. While doing that I will write documentation, which is one of the outstanding tasks for this PR anyways.

spier commented 1 year ago

I added basic documentation now. See: https://github.com/InnerSourceCommons/InnerSourceLearningPath/pull/560/files#diff-143b0ea5d1b769b6ccd6610add5c24fd683705b9b66184906f874bf1c8b5c32b

@marshmallowrobot if you have time to review this, that would be great. There are still some things to fix and to document better here. Do you have prior experience with vale? Would be great to get your help :)

spier commented 1 year ago

Example of how an inline highlighting of a possible spelling mistake found by vale looks like:

Screenshot 2023-08-02 at 22 41 08

spier commented 1 year ago

@rrrutledge and others, I think this PR is now in a state that I can be fully reviewed.

I did add more documentation for maintainers, and also found a way to limit the checks mostly to files that contain content that Learning Path users/readers see.

Note that as part of this PR I already fixed some of the trivial spelling mistakes (like organisation => organization) to reduce the amount of errors/warnings that the spellchecker produces.

Any questions, let me know.

marshmallowrobot commented 1 year ago

Hey @spier! Couple of quickie questions for ya...

1) At the very bottom of the documentation, there's a line that starts with "TBD"... is this intentional?

2) Let's pretend I am merging a PR with a REALLY embarrassing spelling error in it. Upon merge/push to main, both workflows will kick off in parallel (publish-to-website and vale), yes? So then, I expect the vale workflow to fail. But the publish-to-website workflow to complete successfully. Is this still right? The result will be a failing build task BUT the spelling error is deployed to production. Am I still on the right track here? How do ya'll handle this case in the Patterns repo? Do ya'll let the error go and just scramble to correct it afterwards?

spier commented 1 year ago

Adding an example of the reviewdog [vale] report to the documentation.

Screenshot 2023-08-08 at 23 07 13

spier commented 1 year ago

Hey @spier! Couple of quickie questions for ya...

Hi @marshmallowrobot.

At the very bottom of the documentation, there's a line that starts with "TBD"... is this intentional?

Yeah, the TBD as intentional but I can see how it might have been confusing. It is a place in the documentation where I don't have enough experience with the approach yet, and hence cannot write better documentation right now. i.e. we have to improve the documentation once we have learned more about it.

I tried to rewrite the TBD part a bit. Does it help?

Let's pretend I am merging a PR with a REALLY embarrassing spelling error in it. Upon merge/push to main, both workflows will kick off in parallel (publish-to-website and vale), yes? So then, I expect the vale workflow to fail. But the publish-to-website workflow to complete successfully. Is this still right? The result will be a failing build task BUT the spelling error is deployed to production. Am I still on the right track here? How do ya'll handle this case in the Patterns repo? Do ya'll let the error go and just scramble to correct it afterwards?

Not sure how to best answer this but let me try:

First of all, it is important to note that currently the vale check will not fail, even if it finds spelling issues. Instead it will highlight issues inline (in areas that the change in the given PR has touched), or otherwise highlight possible issues in the report it generates (like this one). - FYI I just added more documentation about these two different "reports".

Why is the check is configured to "not fail":

1) bothering a contributors with issues that may be in an entirely different file seems unreasonable 2) spelling/style is not always a 100% exact art i.e. people may disagree on things, or we may use some sort of niche language that makes sense in the InnerSource/Tech domain but may not show up in dictionary

For these reasons I felt that it is better to generate suggestions rather than strict errors/failures (at least for now), and leave it to contributors and maintainers to "take them or leave them" for now.

To the other part of your question:

The idea is that spelling issues are already spotted on the PR/branch itself, before that branch is merged into main. So at least when contributors and maintainers are taking a look at the suggestions that vale generates, they have a chance to fix them before they merge the PR into main.

My assumption was that once we have reached a state where vale is reporting no spelling issues at all in the entire Learning Path content, and we generally agree with the issues that vale tends to flag, then we may decide to become more strict with how we use the GitHub Action, and possibly even configure vale to be able to fail and block the PR from being merged.

Does that help?

marshmallowrobot commented 1 year ago

First of all, it is important to note that currently the vale check will not fail, even if it finds spelling issues.

Ah... so the Filtered Findings ARE the report ok ok ok. And since they don't fail anything, it'll become a matter of discipline and practice got it.

I tried to rewrite the TBD part a bit. Does it help?

Love it.

The idea is that spelling issues are already spotted on the PR/branch itself, before that branch is merged into main. So at least when contributors and maintainers are taking a look at the suggestions that vale generates, they have a chance to fix them before they merge the PR into main.

I totally missed where one of the triggers was a new PR from main. This all makes more sense to me now.

Thanks so much!

spier commented 1 year ago

@marshmallowrobot thank you for the review.

I made the requested changes, and fixed minor spelling mistakes in the workbook files.

Please review again :)

spier commented 1 year ago

Awesome. Will you merge, or who does?

spier commented 1 year ago

@marshmallowrobot @rrrutledge checking in to see who should merge this PR to get it integrated into the mainline, so that the spell/style checker can run on all future PRs.

I am not a TC on this repo, so I don't know if I should merge or not?

rrrutledge commented 1 year ago

@marshmallowrobot van merge when ready.

marshmallowrobot commented 1 year ago

Ah sorry all! Just merged.

spier commented 1 year ago

I checked a couple of pages like this one, and it looks like the spelling fixes in this PR are already live. Nice!

Will keep an eye out to see how vale performs in future PRs. Hope that you and the other contributors will find it helpful 🚀

InnerSourceCommons / InnerSourceLearningPath

Spell checking and style checking of the English LP content (vale) #560

Key changes