Should we use pre-commit?

ylep commented 2 years ago

Following an off-topic discussion started in https://github.com/brainvisa/casa-distro/issues/301:

To my opinion, pre-commit is a bad experience for developers. If you make coding style mistakes you cannot go on and fix it later,

@sapetnioc If you find coding style checks to be too annoying, we could disable coding style checks from pre-commit. pre-commit also has code correctness checks, which IMHO are very important to not disable. For instance, it would have helped to prevent these syntax errors in JSON files in capsul, or these syntax errors and uses of undefined variables.

I believe tests are a better place than pre-commit

But experience shows that we do not look at the tests: the above syntax errors have made the tests fail on master for more than 2 months, yet nobody noticed and went to fix it.

And I also agree that it would be interesting to run such a test on the complete source code outside of precommit, too

@denisri actually it is already the case: in casa-distro a complete check is done during tox tests (which calls pre-commit run --all-files). In capsul the checks are done by https://pre-commit.ci/ and reported along tests as part of the GitHub commit status. However, that does not prevent the inclusion of failing code in master.

In the end this is a philosophical decision that depends how we want to deal with code quality. I personally believe in a “fail-fast” approach, where issues are reported and fixed as early as possible, and avoid erroneous code creeping into the repository. pre-commit is a great tool to help with this, but its adoption must be a collective decision. If a majority of active developers agree that it is an annoyance, we should as well stop using it.

Maybe it should be a topic for a future Thursday's meeting?

DimitriPapadopoulos commented 2 years ago

But experience shows that we do not look at the tests: the above syntax errors have made the tests fail on master for more than 2 months, yet nobody noticed and went to fix it.

I guess this happens when pushing commits directly, instead of opening merge requests that require some minimal review.

However, that does not prevent the inclusion of failing code in master.

That shouldn't be possible.

Hboni commented 2 years ago

Like @ylep said, I think fixing issues as early as possible can benefit the code, and avoid keeping errors in the repository. Pre-commit can save time as it can point out issues on code that we are editing, to avoid going back on the same code months later to fix those issues. I never really configured pre-commit (I would like to do so for projects like https://github.com/cati-neuroimaging/deidentification), but is it possible to trigger pre-commit only on modified files/lines of code?

ylep commented 2 years ago

However, that does not prevent the inclusion of failing code in master.

That shouldn't be possible.

Well, if we want to prevent this, we have to enable branch protection on master, check “Require status checks to pass before merging”, and also check “Include administrators” (by default repository admins are exempted from the restrictions).

Such a setup makes it impossible to commit directly to master, which is a good thing for code quality, but also a burden. I personally think that we should do it for infrastructure-critical projects, such as casa-distro and brainvisa-cmake, which can be a mess to correct if broken code makes it to the every developer's repositories. For other projects, it is debatable.

ylep commented 2 years ago

I never really configured pre-commit (I would like to do so for projects like https://github.com/cati-neuroimaging/deidentification), but is it possible to trigger pre-commit only on modified files/lines of code?

@Hboni By default, pre-commit runs only on modified files at commit time. If you want to run it on the whole repository, you can use pre-commit run --all-files. As far as I know there is no way to run it only on modified lines (it is not really possible, e.g. flake8 needs the whole file to detect Python errors).

My strategy for enabling pre-commit in multiple brainvisa projects has been:

create a configuration file (.pre-commit-config.yaml) with minimal checks, without committing it yet
fix errors in the whole repository so that pre-commit run --all-files gives no errors
commit .pre-commit-config.yaml
progressively add stricter checks to .pre-commit-config.yaml, every time fixing the errors with pre-commit run --all-files

denisri commented 2 years ago

My personal opinion is that precommit is useful, thus I would keep using it. However there are cases it doesn't behave as expected:

if an "old" unchecked source file is edited, then the whole file will be checked, and all previous problems will need to be fixed by the developer, even when it's not his current modifs which caused them. In some cases the developer will need to fix hundreds of small glitches just to commit a very local change. Some will just give up in such a case.
we have never defined the level of exigence we want for our code. For instance to which extent we should follow PEP8 for python code. I know we don't all agree on them, thus some developers will disagree on precommit complains. This is rather our policy to be defined, but up to now we have not.
some precommit requirements may be felt as a burden to less hardcore developers, such as the need to install both python2 and python3. Here again it's rather a matter of defining our policy.
precommit sometimes fails and makes mistakes, and prevents valid code to be submitted.

So we must define, at least, what to do in such situations. I agree we should discuss this on a thursday meeting, and write down a few guidelines/rules.

Details:

these syntax errors in JSON files in capsul

This is actually not really a precommit or test problem: tests do pass actually here, but these ".json" files are not really json: they are read using a yaml reader, in order to allow comments, thus their syntax is less strict than json. The mistake is that we should not have named them .json at the first place.

But experience shows that we do not look at the tests

yes, and this is actually a real problem. If we keep this way I'd even say we're dead, in the long run. One of the problems here is that some tests are not completely reproducible (Morphologist), for reasons we have failed to determine... This is a "big topic"...

However, that does not prevent the inclusion of failing code in master.

That shouldn't be possible.

Unfortunately it's sometimes necessary: as said before precommit incorrectly sometimes reports errors, and we are forced to bypass it:

I had the case recently in capsul: precommit had auto-updated, and was simply broken for the version of python we are all using (3.6). I had to force-downgrade it but it took several weeks/months to understand why it was broken.
in capsul v3 (pydantic_controller branch) precommit reports errors in valid dataclass/pydantic attributes. I guess it does not parse correctly their syntax. So here too, we have no choice but bypass precommit, at least for now.

If we block commits/pushs in these cases, we can simply not develop our tools.

All this being said, I'm in favor of using it, but without too strict branch protection.

Hboni commented 2 years ago

Just to mention, I found this python module, darker which allows to only format modified lines of code, which can help to not check a complete file. But this tool uses Black for code reformating and isort to sort imports. It may need more configuration to not be too strict, but it can help.

DimitriPapadopoulos commented 2 years ago

As far as I know, Black cannot be configured, someone had to fork it to enforce single quotes.

brainvisa / brainvisa.github.io

Should we use pre-commit? #110