dsgibbons / shap

A game theoretic approach to explain the output of any machine learning model.
https://shap-community.readthedocs.io/en/latest/
MIT License
25 stars 5 forks source link

Adopt black and ruff #21

Closed dsgibbons closed 1 year ago

dsgibbons commented 1 year ago

Following PRs #2594 and #2595, consider using black and ruff to standardize code style. We may disable many ruff rules to get started. For example, I don't fancy adding static type checkin to the entire project yet. mypy would be nice to add one day, and may help with some of SHAP's rough edges.

connortann commented 1 year ago

I think this is a great idea in principle! I love black and ruff, and they could be an enormous help in improving code quality.

One concern though is that reformatting with black might make it very difficult to merge any changes between forks, as the diff would be enormous and merge conflicts would likely arise everywhere. It would be great if the original repo were already formatted with black, but we have to be pragmatic.

I think keeping the diff between forks minimal & understandable is rather important for us, as we want to make it easy to see what has changed in this fork, and we want to be able to merge PRs that were originally raised on the original repo. Arguably that is more important at the moment than having a nice code style.

So... should we start with just Ruff but not Black? We should certainly fix the logical issues, e.g. == vs is and all that.

thatlittleboy commented 1 year ago

I'll chip in here as well. Here are my thoughts:

  1. Agree that we don't need type checking / strict type hints at this stage.
  2. 100% agree that we shouldn't introduce black into CI/CD at this stage. I propose that we do that once we are more or less done with porting the PRs and fixing the more pressing issues. That would make our lives much easier.
  3. On ruff / linting in general, agree that we need it. And that while we are in this early stages, a minimal set of rules running in CI is sufficient (as to what rules these should constitute, I think we need a discussion on that; in #25 ). We should definitely increase the scope / rules along with introducing black later on.
  4. But I can also see an argument for wanting to introduce linting ASAP. I'll give a simple example to illustrate my point. Suppose we've decided that we will introduce the C4 rules in the future, e.g. C400. It would save us some effort in the future if we are conscious now NOT to write code that violates C400. I think a happy medium is while keeping CI ruff running the bare minimum checks, we can set a stricter set of rules on our own local development environment. There's no way to enforce this, of course, but it's what I will be doing with my PRs.
connortann commented 1 year ago

I agree wholeheartedly, @thatlittleboy .

On your point (4) about linting new changes to a higher standard, one possibility is given in the ruff docs:

When enabling a new rule on an existing codebase, you may want to ignore all existing violations of that rule and instead focus on enforcing it going forward.

Ruff enables this workflow via the --add-noqa flag, which will adds a # noqa directive to each line based on its existing violations.

If we went this route, we'd need to distingush somehow between existing # noqa directives that have were deliberately added to suppress false positive warnings, versus any new # noqa directives that are temporary placeholders and may be masking a genuine error.

dsgibbons commented 1 year ago

Thank you for the great feedback @thatlittleboy and @connortann. I agree with your points. I'm definitely in no rush to add comprehensive linting and formatting. I mainly raised this issue so I could cross #2594 and #2595 off from #2.

I like the idea of only applying ruff to new code. I'm not entirely sure about ruff --add-noqa though. In addition to the false positive situation mentioned by @connortann, won't all of those additional # noqa directives clog up the diff w.r.t slundberg/shap?

For now, I'm happy to discuss a minimal set of ruff rules that should be applied to the entire codebase inside #25

connortann commented 1 year ago

It seems like we've got consensus to avoid Black for now, and to run Ruff on the whole codebase. I think we can close this issue now that #25 has been merged.

As a longer-term objective, it would be good to get a greater set of Ruff checks passing.