TRI-ML / dgp

ML Dataset Governance Policy for Autonomous Vehicle Datasets
https://tri-ml.github.io/dgp/
MIT License
93 stars 63 forks source link

Difficult for users to locally run linters used in CI #114

Closed tk-woven closed 2 years ago

tk-woven commented 2 years ago

Description

DGP features various linters: pylint, YAPF, SuperLinter, and even a commit linter (CI-only?). To run all linters, users have to do so manually or attempt a commit locally to trigger the git hooks.

Improvement

While this introduces yet another tool, let's use something like pre-commit to manage our various linting tools. pre-commit provides a common entrypoint to containerized linting, making it easy to run the same checks that we use in CI locally so that people can fix linting issues before going to CI. We could configure and use (most of?) the same linters we currently use, and we'd remove .githooks/ entirely.

tk-woven commented 2 years ago

I'm working on integrating pre-commit right now but have found that pylint finds many violations on master. Perhaps most developers don't have the git hooks installed. pylint only seems to be run on changed files by SuperLinter (see the pre-merge.yml workflow file). For pre-commit, it's easiest to run all hooks on all files in CI, though. So, for the first pass toward this, I'll integrate pre-commit less pylint via SuperLinter. In other words, we'll consolidate our various lint entrypoints down to 2: pre-commit and SuperLinter.

Also, we need to continue using the standalone commitlint action in CI. While a hook exists for pre-commit, it cannot easily run on commits already made. For this reason, we'll use the commitlint hook for local commit linting and the action for commit linting in CI. At least they will share the same configuration (.commitlintrc.yml).

tk-woven commented 2 years ago

SuperLinter has a ton of linters (see VALIDATE_* environment variables), and DGP can use quite a few of them. By my estimate, DGP would use

VALIDATE_BASH
VALIDATE_GITHUB_ACTIONS
VALIDATE_GITLEAKS
VALIDATE_HTML
VALIDATE_JSON
VALIDATE_MARKDOWN
VALIDATE_NATURAL_LANGUAGE
VALIDATE_PROTOBUF
VALIDATE_PYTHON_PYLINT
VALIDATE_SHELL_SHELLFMT
VALIDATE_YAML

It's not clear to me which tools these options might use behind the scenes with the exception of ones where the tools are in the name. The most common one for us is pylint.

Moving away from pylint will require us to replace the above list with any linters we want to maintain coverage from and also to fix pylint issues so that we can run pylint on the whole codebase in CI via pre-commit.

tk-woven commented 2 years ago

I've found that there are quite a few latent pylint issues on master. I will need to fix these before replacing the tooling entrypoint with pre-commit since we'll run on all files in CI.

tk-woven commented 2 years ago

After #122 , the only remaining hooks to move are markdownlint and yamllint. I will do this in separate work because there are existing linting errors on master. SuperLinter and our configuration of it suggests that we use these linters, but since they have not triggered errors in our CI, it's unclear to me whether we actually use them. Nonetheless, to maintain the current level of coverage, I will add these hooks and touch-up the files in violation.

Also, SuperLinter provides some hooks that we simply can't use (easily) in pre-commit, and so I am skipping them, with justifications.

actionlint: This validates GitHub Action (YAML) files. To use this hook, our users have to manually install the tool (if outside of Docker), which is annoying. So, I skip this. We don't lose much by skipping it because we don't modify our Actions definitions frequently, and when we do, we'll get "free" validation when we push to GH. Of course, the iteration speed is very slow since we have to push to GH to trigger their parser.

Natural language validation: there is no existing pre-commit hook for this that I could find. I don't think this hook is terribly important. The most important part about natural language is that its meaning is generally understandable, and we enforce this in code review.

tk-woven commented 2 years ago

The last work toward this was merged today. DGP linting is done entirely through pre-commit.