SlideRuleEarth / sliderule

Server and client framework for on-demand science data processing in the cloud
https://slideruleearth.io

Example of `pre-commit` with `codespell` #338

Closed rhugonnet closed 9 months ago

rhugonnet commented 9 months ago

Re @jpswinski!

Here is an example of using pre-commit with the codespell hook! The PR only adds a workflow file under .github and a config file at the repository root. The associated CI run for this PR shows the typos found (output here: https://github.com/ICESat2-SlideRule/sliderule/actions/runs/6385765295/job/17331180671?pr=338).

The idea behind pre-commit is to run static checks before anything else, automatically, just before each commit. It is mostly used for static code checking and formatting (linting), but also for spell checking or environment setup. For instance, some packages like pandas generate their requirements.txt for pip from their conda environment.yml to ensure consistency, and they run a script for this in pre-commit: https://github.com/pandas-dev/pandas/blob/main/.pre-commit-config.yaml#L267.

Locally, you can set it up to run before every commit (pre-commit install), or run it manually from the package root with pre-commit run --all-files, after installing pre-commit (available on pip or conda).

It is a great way to make a package more robust to errors, since the checks cover all the code without it having to be exercised dynamically by the tests in tests/, which is especially relevant for type checking. In Python, many projects use flake8 for linting (https://github.com/pycqa/flake8) together with isort for import sorting, black for formatting (https://github.com/psf/black) and mypy for type checking (https://github.com/python/mypy). Adopting mypy is a big step, but it saves a lot of effort down the line. NumPy now ships a mypy plugin, so mypy can check array types (such as the dtype) passed as input/output of a function, and it detects that just from the code, which is really powerful! But mypy requires type annotations in the Python code:

def my_func(input1: float, input2: list[str], ...)
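To make the annotation requirement concrete, here is a minimal sketch of a fully annotated function. The name total_length and what it computes are made up for illustration; the point is that the second call runs without any error and silently returns the wrong kind of value, which is the sort of mistake a static type checker like mypy is expected to flag without running the code.

```python
def total_length(scale: float, names: list[str]) -> float:
    """Return the summed length of all names, multiplied by a scale factor.

    Hypothetical example function, not part of any package.
    """
    return scale * sum(len(name) for name in names)


# Correctly typed call.
print(total_length(2.0, ["ab", "cde"]))

# Wrongly typed call: Python happily repeats the string instead of
# multiplying a number, so no exception is raised at runtime. A type
# checker should reject this line because "2" is a str, not a float.
print(total_length("2", ["ab"]))
```

Without annotations, a type checker has much less to work with, which is why adopting mypy usually means annotating function signatures first.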

The fully deterministic hooks (formatters) apply their modifications to the files automatically, while the others simply detect problems and propose changes (spell check, type check, etc.).
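The difference between the two kinds of hooks can be sketched in a few lines of Python. This is a toy illustration under my own assumptions, not how pre-commit or any real hook is implemented: strip_trailing_whitespace and find_long_lines are hypothetical helpers, and the 20-character limit is arbitrary.

```python
def strip_trailing_whitespace(text: str) -> str:
    """Formatter-style hook: deterministically rewrite the text."""
    return "\n".join(line.rstrip() for line in text.split("\n"))


def find_long_lines(text: str, limit: int = 20) -> list[int]:
    """Checker-style hook: report 1-based line numbers, change nothing."""
    return [
        number
        for number, line in enumerate(text.split("\n"), start=1)
        if len(line) > limit
    ]


source = "x = 1   \ny = 'a fairly long string literal'\n"

# The formatter returns a modified copy that can be written back to disk.
fixed = strip_trailing_whitespace(source)
print(repr(fixed))

# The checker only reports where the problem is; fixing it is up to a human.
print(find_long_lines(fixed))
```

Running the formatter a second time changes nothing, which is what makes it safe to apply automatically.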

I looked at some C++ projects, as I'm not too familiar with the hooks used there, and it looks like several of them use clang-format for the C++ side of things. See the example for GDAL here: https://github.com/OSGeo/gdal/blob/master/.pre-commit-config.yaml#L30, with a config file here: https://github.com/OSGeo/gdal/blob/master/.clang-format. Maybe you know this tool?

A complete list of pre-commit hooks is available here: https://pre-commit.com/hooks.html.

It would probably be a lot to start with all at once, especially the aggressive hooks (mypy or black, for instance), but the more "moderate" ones shouldn't be too much work!

rhugonnet commented 9 months ago

And here for codespell, it looks like we could add parms to --ignore-words-list!
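As a rough picture of what an ignore list does, here is a toy spell checker in Python. It is only a sketch under stated assumptions: the TYPOS dictionary and the find_typos helper are hypothetical and do not reflect codespell's actual dictionary or implementation; only the idea of a comma-separated list of words to skip is taken from the --ignore-words-list option.

```python
import re

# Hypothetical typo dictionary; codespell ships its own, much larger one.
TYPOS = {"teh": "the", "recieve": "receive", "parms": "params"}


def find_typos(text: str, ignore_words: str = "") -> list[tuple[str, str]]:
    """Return (word, suggestion) pairs, skipping comma-separated ignored words."""
    ignored = {w.strip().lower() for w in ignore_words.split(",") if w.strip()}
    hits = []
    for word in re.findall(r"[A-Za-z]+", text):
        key = word.lower()
        if key in TYPOS and key not in ignored:
            hits.append((word, TYPOS[key]))
    return hits


line = "recieve teh parms from the request"

# Without an ignore list, the project-specific name is reported as a typo.
print(find_typos(line))

# With the name on the ignore list, only the genuine typos remain.
print(find_typos(line, ignore_words="parms"))
```

The same idea applies to any identifier that is deliberate in the code base but looks like a misspelling to the checker.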