TRI-ML / dgp

ML Dataset Governance Policy for Autonomous Vehicle Datasets
https://tri-ml.github.io/dgp/
MIT License

feat: introduce pre-commit and configure most existing linters in it #118

Closed tk-woven closed 1 year ago

tk-woven commented 1 year ago

Explanation

This works toward #114. It introduces the pre-commit tool, configured to run all the hooks that we previously ran locally via standard shell scripts.

This MR also runs the hooks on all files, which touched up some of them.
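
For readers new to pre-commit, here is a minimal sketch of how the hooks can be installed and run over the whole tree. The function name `lint_all_files` is hypothetical and not part of this MR; the `pre-commit install` and `pre-commit run --all-files` commands are the standard pre-commit CLI.

```shell
# Hypothetical helper (not part of this MR): install pre-commit if it is
# missing, register the git hook, and run every configured hook on all files.
lint_all_files() {
    if ! command -v pre-commit >/dev/null 2>&1; then
        # Assumes pip is available in the active Python environment.
        python -m pip install pre-commit || return 1
    fi
    # Registers the hooks from .pre-commit-config.yaml as a git pre-commit hook.
    pre-commit install || return 1
    # Runs all hooks against every file, not only the staged changes.
    pre-commit run --all-files
}
```

Calling `lint_all_files` from the repository root should reproduce the kind of whole-repo reflow described above. Hooks that rewrite files exit non-zero on the first pass, so a second run is usually needed to confirm a clean tree.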

SuperLinter (used in CI) still provides some checks that we do not cover here yet. This also updates the Dockerfile to symlink the pip installation to /usr/bin to support some tools that expect to find pip installed there.
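
The pip symlink can be sketched roughly as follows. This is an illustration only, not the exact Dockerfile line from this MR: the helper name `link_pip` and its optional target-directory argument are assumptions made so the idea can be tried outside the image.

```shell
# Hypothetical helper (not the MR's Dockerfile line): expose the active pip
# under a target directory (default /usr/bin) for tools that expect it there.
link_pip() {
    local target_dir="${1:-/usr/bin}"
    local pip_path
    pip_path="$(command -v pip)" || { echo "pip not found on PATH" >&2; return 1; }
    # Nothing to do if pip already lives at the target location; linking it
    # to itself would create a broken, self-referencing symlink.
    if [ "$pip_path" = "$target_dir/pip" ]; then
        return 0
    fi
    ln -sf "$pip_path" "$target_dir/pip"
}
```

In a Dockerfile the same effect would typically come from a single `RUN ln -s ...` step; writing to `/usr/bin` requires root.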

Reading Guide

Most changes in this MR are source reflows from the linters. The exceptions are:

  1. deleted .githooks
  2. introduced .pre-commit-config.yaml
  3. modified Dockerfile and Makefile
  4. updated some Docker/venv/getting-started docs
  5. added a CI step that runs pre-commit in .github/workflows/pre-merge.yml

Limitations

Setting up the hooks in the Docker environment is a bit awkward because the user first has to install git and then trust the DGP repo. We could do this as part of the Dockerfile, but that would mean installing git in the image, which I chose not to do in this work.
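
A rough sketch of that manual setup inside the container follows. It rests on several assumptions: that the image is Debian/Ubuntu-based (hence `apt-get`), that "trusting the repo" means marking it as a git `safe.directory`, and that the repository path is passed in by the user. The function name `setup_hooks_in_container` is hypothetical.

```shell
# Hypothetical helper (not part of this MR). Assumes a Debian/Ubuntu-based
# image, root privileges, and that pre-commit is already installed.
setup_hooks_in_container() {
    local repo_dir="${1:-$PWD}"
    if ! command -v git >/dev/null 2>&1; then
        apt-get update && apt-get install -y git || return 1
    fi
    # Assumption: "trust the repo" means telling git that the mounted
    # checkout is safe even though another user may own it.
    git config --global --add safe.directory "$repo_dir" || return 1
    # Register the hooks; the subshell keeps the caller's working directory.
    (cd "$repo_dir" && pre-commit install)
}
```

It would be called with the path where the repository is mounted inside the container, for example `setup_hooks_in_container "$PWD"` from the repository root.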

The usual concern persists: people may run a virtual environment with a Python version different from the one we use in CI.



tk-woven commented 1 year ago

Thanks, will merge now so everyone can use it! As follow-up work, I'll replace the last linters from SuperLinter with hooks in pre-commit, and then we'll have single-entrypoint linting!