InnerSourceCommons / InnerSourceLearningPath

Holds the source material for the InnerSource Commons Learning Path
https://innersourcecommons.org/learningpath
Creative Commons Attribution Share Alike 4.0 International

Spellchecker for English LP content #523

Closed spier closed 1 year ago

spier commented 1 year ago

I added a basic setup for a GitHub Action that runs a spellchecker for the LP content. This is modelled after a very similar approach that we are just introducing for our Patterns (see https://github.com/InnerSourceCommons/InnerSourcePatterns/pull/519).

A key difference is that the tool used here, pyspelling, seems to work somewhat better with Markdown than with AsciiDoc. However, I tweaked the configuration based on an example that I found, and it looks like it works well enough.

To judge the effect of the spellchecker, please compare the results of the 1st run with those of the 2nd run. Before the 2nd run I had already fixed a couple of obvious spelling errors.

Left to do:

I am happy to answer your questions. Just note that I have only been using pyspelling/aspell for a couple of days myself :)

rrrutledge commented 1 year ago

This is great, @spier ❗️ Thanks for this work. Very excited to take a look.

rrrutledge commented 1 year ago

This is great! Where does the output go after the job is run?

lenucksi commented 1 year ago

> This is great! Where does the output go after the job is run?

It'd be great if this were to raise a PR with the fixes it made.

spier commented 1 year ago

@lenucksi the spell checker does not make any fixes itself. Not sure if it can be configured to do that. I doubt it actually, as it would be hard to know what the intended word is.

What it does is show the spelling issues in the output of the GitHub Action. (@rrrutledge not sure if this answers your question about where the output goes?)

It is then up to the author to review those issues and decide how to fix them. The options are:

  1. fix the words that the spell checker identified as problematic
  2. add the given word to the list of exceptions
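
The two options above can be sketched in a few lines of Python. This is only an illustration of the review step, not something pyspelling provides: the function name `resolve_flagged`, the `fixes` mapping, and the in-memory `wordlist` are all hypothetical, and the flagged words are assumed to have been copied by hand from the GitHub Action output.

```python
import re


def resolve_flagged(text, flagged, fixes, wordlist):
    """Apply the author's decision for each flagged word (hypothetical helper).

    fixes maps a misspelled word to its correction (option 1).
    Any flagged word without a fix is treated as intentional and
    added to the list of exceptions (option 2).
    """
    updated_wordlist = set(wordlist)
    for word in flagged:
        if word in fixes:
            # Option 1: replace whole-word occurrences only.
            pattern = rf"\b{re.escape(word)}\b"
            text = re.sub(pattern, lambda _match: fixes[word], text)
        else:
            # Option 2: keep the word and record it as an exception.
            updated_wordlist.add(word)
    return text, sorted(updated_wordlist)


if __name__ == "__main__":
    new_text, new_wordlist = resolve_flagged(
        "We recieve contributions via InnerSource.",
        flagged=["recieve", "InnerSource"],
        fixes={"recieve": "receive"},
        wordlist=["pyspelling"],
    )
    print(new_text)
    print(new_wordlist)
```

In practice the exception list would live in a wordlist file in the repository, and the decision for each word stays with the author.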

FYI, for our patterns repo I am experimenting with another spell checker right now (Vale). That one would show the spelling issues inline in the files. It can also do other checks on prose to get to a more consistent editorial style (inclusive language etc).

I am not sure yet which spell checker we will go with for our patterns repo.

So if you like, we can move this PR here back to Draft for the moment. Up to you.

lenucksi commented 1 year ago

> @lenucksi the spell checker does not make any fixes itself. Not sure if it can be configured to do that. I doubt it actually, as it would be hard to know what the intended word is.

Most mistakes are very likely small errors like swapped letters, all of which yield a very small edit distance. I'd expect the spellchecker to have a configurable safe threshold for applying its most likely correction. And that's exactly why a PR is the way to go here. It proposes changes without modifying anything directly, and you can then fix or entirely discard them, ideally saving you some time.
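
To make the edit-distance idea concrete, here is a minimal sketch in Python. It does not reflect how pyspelling, aspell, or Vale work internally. The names `osa_distance` and `safe_correction`, the `max_distance` threshold, and the tiny dictionaries are assumptions for illustration. It uses the optimal string alignment variant of edit distance so that two swapped adjacent letters count as a single edit, and it only proposes a correction when exactly one dictionary word is within the threshold.

```python
def osa_distance(a, b):
    """Optimal string alignment distance between a and b.

    Insertions, deletions, substitutions, and swaps of two adjacent
    characters each cost 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            swapped = (
                i > 1 and j > 1
                and a[i - 1] == b[j - 2]
                and a[i - 2] == b[j - 1]
            )
            if swapped:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def safe_correction(word, dictionary, max_distance=1):
    """Return a correction only if exactly one candidate is close enough."""
    candidates = [w for w in dictionary if osa_distance(word, w) <= max_distance]
    return candidates[0] if len(candidates) == 1 else None


if __name__ == "__main__":
    print(safe_correction("recieve", ["receive", "review", "pattern"]))
    # With "relieve" also in the dictionary, two words are one edit away,
    # so no correction is proposed.
    print(safe_correction("recieve", ["receive", "relieve"]))
```

The second call shows the ambiguity concern raised earlier in the thread: when several dictionary words are equally close, the sketch declines to guess and leaves the decision to a reviewer.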

> What it does is that it shows the spelling issues in the output of the GitHub Action. (@rrrutledge not sure if this answers your question about where the output goes?)

Having to manually extract edits and a diff here really sounds like more new work created than time saved, so I'd not use such a feature.

As for style consistency, there's still #222, which would address this. There's tooling to address this in a professional and effective way, though it's not free. Knowing those tools, they are usually more interactive and less CI-friendly, so that's probably not really the first order of business - as was the conclusion of #222 as well.

rrrutledge commented 1 year ago

This is very 🆒 , @spier . I will bring this up in our next working group meeting.

> I am not sure yet which spell checker we will go with for our patterns repo.

I think that there's value in being on the same one, so let's see what you come up with.

> show the spelling issues inline in the files

As comments or something? That would be great.

> spelling issues in the output of the GitHub Action

I see it now. Great.

rrrutledge commented 1 year ago

@spier is there an issue or a pull request in the InnerSourcePatterns repo that we can follow to stay up to date on which spellchecker you choose? We are happy to follow your lead here. Thanks for including us :).

spier commented 1 year ago

@rrrutledge good idea. Here is the issue: https://github.com/InnerSourceCommons/InnerSourcePatterns/pull/519

rrrutledge commented 1 year ago

Thank you, @spier ❗️ I added myself as a watcher.

rrrutledge commented 1 year ago

We should update our CONTRIBUTING.md document to explain how to find and fix spelling errors in a pull request.

marshmallowrobot commented 1 year ago

Hey @spier! It seems like progress on this PR has slowed. But, we have a new group of articles that will be published soon and I think having a spell checker in place would be a great help with them. What, if anything, can I do to help get these changes pushed through?

I can see from the conversation in https://github.com/InnerSourceCommons/InnerSourcePatterns/pull/519 that you'd prefer to change the spell checker to Vale. Could you maybe list the relevant config files (as opposed to ones you may have been experimenting with) from the InnerSourcePatterns repo that we should use over here? I could help get the code changes copied over.

spier commented 1 year ago

@marshmallowrobot thanks for reaching out. You are right, there is no further progress on this from my end.

And yes, my last status was that Vale looks like the better option for us, given that its GitHub integrations seemed nicer. See also the summary of the overall status of the experiments I did with Vale.

Thank you for your offer to help!

Unfortunately I forgot where exactly I was at, so I cannot even ask for specific help right now. I have to look at this in more detail again to determine the minimal set of configuration files that you would need to make Vale useful for you. I will let you know what I find, but it may take a bit.

rrrutledge commented 1 year ago

Thanks for this work and this update, @spier ! We are very thankful and excited for spell checking. Let us know if you have a rough idea of when you might work on it. Any schedule is OK, but if you think that it will be a while before working on it some more then we may merge what's here so that at least we have some spell checking in the meantime until your fancy, new Vale work is finished.

spier commented 1 year ago

That is a good clarification @rrrutledge.

From previous comments I thought that the current solution would not be that useful for the LP content, which was one of the reasons for trying out Vale.

So I will keep the current solution as a fallback option.

As for an estimate, I hope to have time around the Easter days, so within the next 10 days or so should be realistic.

rrrutledge commented 1 year ago

> next 10 days or so

Oh! That’s not so long - we’ll probably wait. We’d rather be on the same tool as InnerSourcePatterns.

rrrutledge commented 1 year ago

@spier we saw that you created https://github.com/InnerSourceCommons/isc-styles for some more Vale work - so it looks like this may be moving along? That's great!

spier commented 1 year ago

@rrrutledge hehe you got your eyes everywhere :)

Yeah, I am trying to package up the styles/checks in a central place, so that it is easier to create some level of consistency across various ISC material.

I am not 100% there yet but I have a basic idea of how it might work. So yeah, this is moving along, slowly but surely :)

spier commented 1 year ago

Closing this in favor of #560