kapilt commented 4 years ago

Description: Research markdown linting tool for sig markdown files and PRs.

follow up from sig security meeting (dec 19th)

[x] SIG Leadership: @lumjjb
[X] Project Lead: @JohnHillegass
[x] check off what was accomplished with https://github.com/cncf/tag-security/pull/497
[x] Link checking
[x] Linting (80 column chars, header capitalization)
[x] spell check with ability to add custom "words" (e.g. "cloud native", "SIG-Security")
[ ] fix files with more than 80 column chars in main branch
[ ] fix spelling errors in main branch
[x] finish rename of SIG->TAG (only for written content) https://github.com/cncf/tag-security/issues/613
[ ] update CONTRIBUTING section about PRs to link to or include requirements (if not already there)
[x] enable with warning (and send a note to #tag-security-triage channel in slack
[ ] after a while make it block PRs (a week? a certain number of PRs?)

ultrasaurus commented 4 years ago

added some common inconsistencies from Contributor's Guide / Writing Style section

pbnj commented 4 years ago

Is there agreement on markdown style, e.g. SETEXT vs ATX, CommonMark vs GitHub-Flavored Markdown?

Different tools have varying levels of support for the different markdown specs and it would help narrow down the tool selection/evaluation process.

Some markdown linting tools I have come across in the past:

A tool I have been happily using is: prettier

Pros:

It's primarily used for formatting JavaScript, but it has built-in support for Markdown, as well as others, like YAML, JSON ...etc.
The rules/options are configurable, e.g. Print Width, Tab Width, Prose Wrap, End of Line ...etc.
There are plenty of editor integrations across popular editors, e.g. atom, vscode, vim, emacs, and sublime.

Cons:

It's a formatter, not a linter. So, running the tool will auto-correct issues instead of give you pointers to them. There is no way to have make it behave like a linter (AFAIK).
It's not primarily a Markdown tool, thus it doesn't have a lot of rules specific to Markdown (e.g. header capitalization, spell-checking).

kapilt commented 4 years ago

@pbnj thanks, i suspect we're targeting github flavored markdown, given the predominant usage of that style within the repo.

There's a dizzying plethora of tools in this space, the two most common markdownlint (ruby & nodejs) don't appear to target the spelling or link checking use cases. it maybe we just need separate tools around each major use case (lint, link-check, spelling)

prettier seems interesting, afaics running as a code formatter is a different use case, but setup as a repo pre-commit hook would help automate some of the styling drudgery if its documented as part of the repo workflow (or perhaps a makefile target).

@ultrasaurus in terms of integrating into this repo does a github action as a pr check suffice?

ultrasaurus commented 4 years ago

@kapilt a github action sounds like a great idea. I haven't set those up myself before, and am exceted to learn about them from your PR!

IAXES commented 4 years ago

Is a GitHub action the preferred implementation? If so, would there be any objections to creating a Docker container action to execute the linting/parsing/scoring code? If so, the image could be deployed to Docker hub, source put on GitHub, and it should be trivial to allow individuals to manually invoke it offline/locally while they are editing/adding documents.

pbnj commented 4 years ago

@IAXES - GitHub actions use docker containers to run tasks. So, it should still be portable enough for the use-case you've described.

lumjjb commented 4 years ago

I think it would be good to have a make check which will do all this offline as well. it would be great if we can get a prototype PR for an offline script first, and one of us with write access can create a hook for it.

TheFoxAtWork commented 4 years ago

at @lumjjb request - info on setting line wraps in markdown thru VIM: :set textwidth=80 then gqG to apply at the line down the doc. adjusting as necessary for links, etc.

TheMoxieFox commented 4 years ago

Also set wm=2 will do it as you type.

IAXES commented 4 years ago

I'll look into putting together a minimal PoC this weekend with a sample/dummy repo on my own GitHub account. After that point, maybe I could use the Zoom meeting space to do a live demo/recording.

It won't necessarily have all the final linting rules/checks in place, but it could at least implement the make check step, and future enhancements/one-off-commits could chip away at the broader task of defining individual linter rules, somewhat similar to checkpatch.pl for the Linux kernel.

As for checking links, is it enough to just run curl against them all and make sure they get a valid result, regardless of redirects (i.e. eventually results in a 200 OK)?

While we're at it, I'll add a minimal EditorConfig template, since that should provide a minimal editor guideline for most major/modern text editors.

stale[bot] commented 4 years ago

This issue has been automatically marked as inactive because it has not had recent activity.

TheFoxAtWork commented 4 years ago

@IAXES wanted to check in on this.
@lumjjb @ultrasaurus did we finalize the completion requirements for this? it looks it was just research which may have been completed?

stale[bot] commented 4 years ago

This issue has been automatically marked as inactive because it has not had recent activity.

JohnHillegass commented 3 years ago

Here is an example of what this could look like being automated. There are a lot of variations in errors which we would need to figure out how to cover. Such as MD013/line-length Line length [Expected: 80; Actual: 236] is 80 actually enough characters? Maybe it should be 240? We can override these but there are a lot of rules that may require a decision on whether to apply, extend, or not https://github.com/DavidAnson/markdownlint/blob/main/doc/Rules.md

A couple of other oddities are:

the link checker will flag markdown files without anchor links. Maybe there is a way to tell it to ignore those?
the spellchecker will always complain about our GitHub action file since words like xargs are not in the dictionary. Maybe we could add words like this to a custom dictionary similar to https://github.com/JohnHillegass/sig-security/blob/test-ci/cspell.json. We would already have to add any abbreviations or contributor names so it allows them.

In this example, I am using

JohnHillegass commented 3 years ago

@TheFoxAtWork @lumjjb @kapilt Good Evening! I was not sure if I should bring this discussion up during the call today. Is that an appropriate forum to discuss open issues/PRs? I have a draft PR for adding this but think it may be easier over a Zoom call. Do we ever do short one-off calls to discuss things like this in more detail?

TheFoxAtWork commented 3 years ago

@JohnHillegass Yes! sorry just getting to this! please add this to the meeting agenda for next week as a topic of discussion. We typically call these working meetings.

JohnHillegass commented 3 years ago

Ok sounds good, will add it to the agenda. Thanks, @TheFoxAtWork!

JohnHillegass commented 3 years ago

FYI @TheFoxAtWork @lumjjb I have moved this to the 12/16 meeting because of a conflict tomorrow.

TheFoxAtWork commented 3 years ago

No worries!

JohnHillegass commented 3 years ago

Moving to 12/23 because of another conflict 🤦 I am on vacation that week so can definitely make it.

stale[bot] commented 3 years ago

This issue has been automatically marked as inactive because it has not had recent activity.

IAXES commented 3 years ago

I have a code snippet for the linting part, if that helps (i.e. uses a tool called mdl, which is installed via npm).

https://www.npmjs.com/package/markdownlint

I can throw together the Docker container if someone else with the ability to modify the github actions to test/run it can very for me. Do we have a location where the Dockerfile for the testing image (and the container itself, i.e. "SIG Security Dockerhub repo") can go?

JohnHillegass commented 3 years ago

@IAXES there was a PR for this from about a month back you could possibly base some of the work off of https://github.com/cncf/sig-security/pull/497

It's just using the GitHub actions node images though, nothing custom from the SIG side.

stale[bot] commented 3 years ago

This issue has been automatically marked as inactive because it has not had recent activity.

JohnHillegass commented 3 years ago

Closed by #497

ultrasaurus commented 3 years ago

@JohnHillegass looks from comments on https://github.com/cncf/tag-security/pull/497 that it was checked in, but disabled.

Isn't there a way to have it warn, but not block?

ultrasaurus commented 3 years ago

Also, since there's continued work to get the repo in a state where these checks will pass. I'd like to convert to a "project" if folks have appetite to keep working on it, hoping @lumjjb will represent the leadership team on this and be responsible for review/merge. @kapilt has been not been active in the comments in over a year, so I'm going to make this assigned to no one for now.

Anyone want to volunteer to take the lead on this?

JohnHillegass commented 3 years ago

@JohnHillegass looks from comments on https://github.com/cncf/tag-security/pull/497 that it was checked in, but disabled.

Isn't there a way to have it warn, but not block?

Hey @ultrasaurus thats correct we could chose to not require the checks to pass on a PR via branch protection rules. The checks themselves are pass fail but when not required to pass it doesn't matter when they fail, if that makes sense.

ultrasaurus commented 3 years ago

Thanks @lumjjb for volunteering to do the admin stuff and @JohnHillegass for taking the lead on this. Left the "help wanted" tag on, since there's lots of fixing up of current repo which could be done by anyone

lumjjb commented 3 years ago

705

704

ultrasaurus commented 3 years ago

Seems like spell-checker is having trouble with CamelCase acronyms and maybe a word inside a link?

see https://github.com/cncf/tag-security/pull/720#issuecomment-871797352

ultrasaurus commented 3 years ago

note: fixed https://github.com/cncf/tag-security/pull/720 with HTML-style comment in the markdown to tell cSpell to ignore those specific cases

thanks @JohnHillegass for pointing out: https://github.com/streetsidesoftware/cspell/tree/main/packages/cspell#in-document-settings

we should add this trick to documentation somewhere

ultrasaurus commented 3 years ago

Trying to fix it so README passes lint, I found spelling lint error that has no message (that I can see):

Creating network "tag-security_default" with the default driver
Creating tag-security_spelling_run ... 
Creating tag-security_spelling_run ... done
/usr/local/bin/cspell -> /usr/local/lib/node_modules/cspell/bin.js
+ cspell@5.6.6
added 132 packages from 60 contributors in 22.054s
From https://github.com/cncf/tag-security
 * [new branch]      main       -> main
 * [new branch]      main       -> origin/main
1/1 ./NEW-MEMBERS.md 564.62ms
CSpell: Files checked: 1, Issues found: 0 in 0 files
CSpell: Files checked: 0, Issues found: 0 in 0 files
1/1 ./README.md 550.37ms
CSpell: Files checked: 1, Issues found: 0 in 0 files
123
make: *** [Makefile:19: spelling] Error 123
Error: Process completed with exit code 2.

see pull request: https://github.com/cncf/tag-security/pull/723

ultrasaurus commented 3 years ago

It looks like the issue may be with renaming a file:

git diff --name-only main $HEAD
NEW-MEMBERS.md
NEWMEMBERS.md
README.md
ci/spelling-config.json

maybe we should do this:

git diff --name-only --diff-filter=AM main $HEAD

ultrasaurus commented 3 years ago

That seems to have fixed it https://github.com/cncf/tag-security/pull/723 -- would be great if there were some way to spit out something more verbose in the case of an error like above

ultrasaurus commented 3 years ago

added - [ ] fix spelling errors in main branch

there aren't that many and seems like it might require some advanced knowledge to address them

Also, I believe this is currently blocking PRs. Should it still be warning till we fixup main?

ultrasaurus commented 3 years ago

There's been a request for commenting the spell check config file. I believe we can't do that in json, but could consider making a text file for the words to add to the dictionary that is converted to json in the script

see https://github.com/cncf/tag-security/pull/723#discussion_r662654525

ultrasaurus commented 3 years ago

another Error 123 on different PR https://github.com/cncf/tag-security/pull/725

I rebased on top of my other fix to confirm it was in-fact a different error (as seemed to be the case from visual inspection of the PR contents). Error still happened, as expected.

Running locally, I added set -euo pipefail so the script would error early, if applicable, which helped. Then with set -x to echo bash commands I can see there's an issue with git fetch:

$ make spelling
Running spelling...

+ npm install -g cspell
/usr/local/bin/cspell -> /usr/local/lib/node_modules/cspell/bin.js
+ cspell@5.6.6
added 132 packages from 60 contributors in 6.838s
+ git fetch origin main:main
The authenticity of host 'github.com (192.30.255.113)' can't be established.
RSA key fingerprint is SHA256:nThbg6kXUpJWGl7E1IGOCspRomTxdCARLviKw6E5SY8.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'github.com,192.30.255.113' (RSA) to the list of known hosts.
Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
make: *** [spelling] Error 128

Since the git repo has public read access, I'm confused as to why this would be a problem. (The authenticity check always fails for me locally, even when spell check succeeds; so I didn't investigate that, just assumed that it was my local docker setup.)

cc @JohnHillegass @php-coder

JohnHillegass commented 3 years ago

So I think where this is messing up is we excluded that whole .github dir from cspell execution but grep is seeing markdown files changed there and passing forward to xargs. Since the first arg is not empty xargs is taking it and passing to cspell. Cspell since its been told to skip that path, returns a non-zero code and xargs throws the 123. To fix we will need to change ignorePaths "**/.github/**" to "**/.github/workflows/**" in the ci/spelling-config.json

Whew spell checking is fun 😆

JohnHillegass commented 3 years ago

not so sure about the 128 output above, did you possibly change your ssh keys recently?

ultrasaurus commented 3 years ago

Oh, I'll bet that the set -euo pipefail causes it to error, not just warn on the previously existing issue with my keys. That's unfortunate.

Does cspell have any kind of verbose error reporting we could turn on?

lumjjb commented 3 years ago

I think this is done ! So going to close it up! Running on CI smoothly! 🎉 🎉 Thanks again @JohnHillegass for our first (and very awesome) CI checks that have been super useful so far!

cncf / tag-security

markdown linting tools #311

706

705

704