dsgibbons / shap

A game theoretic approach to explain the output of any machine learning model.
https://shap-community.readthedocs.io/en/latest/
MIT License
25 stars 5 forks source link

Drop support for python 3.7, adopt NEP 29? #73

Closed connortann closed 1 year ago

connortann commented 1 year ago

We should consider adopting NEP 29 β€” Recommend Python and NumPy version support as a community policy standard

As explained in the NEP, there are several benefits of deprecating according to a predictable schedule, both for downstream users of the package and also for maintainers. The recommended drop schedule for python is:

On Dec 26, 2021 drop support for Python 3.7 (initially released on Jun 27, 2018)
On Apr 14, 2023 drop support for Python 3.8 (initially released on Oct 14, 2019)
On Apr 05, 2024 drop support for Python 3.9 (initially released on Oct 05, 2020)
On Apr 04, 2025 drop support for Python 3.10 (initially released on Oct 04, 2021)
On Apr 24, 2026 drop support for Python 3.11 (initially released on Oct 24, 2022)

This would mean dropping support for 3.7 and 3.8. We could drop support straight away, or perhaps go through a deprecation cycle (once we start making regular releases).

EDIT: A checklist of some related things:

thatlittleboy commented 1 year ago

Agreed. we should at least make 1 release to fix all the bugs and warnings, then the next release should drop python 3.7 and 3.8, depending on the timeline

connortann commented 1 year ago

I think there's a case to make that we should drop python 3.7 from the get-go, as that will make it a fair bit easier to fix the current set of bugs and warnings. As an example, some of the failing tests in the master branch only fail on 3.7. Would you be up for that?

There are of course several previous releases of shap that do support 3.7 and remain available, so I don't think there's any real benefit to users of a new release that supports such an old version. On the flip side, one of the compelling advantages of this new fork is that it could support new versions such as 3.11.

thatlittleboy commented 1 year ago

I took a look at the waterfall plot test failure, the plot rmse is 2.xxx, just slightly higher than the acceptable threshold of 2. I'll take a deeper look when i get home, but it's a trivial fix. Did you see any other tests that only fail on py3.7?

I think the main point of the fork existing, first and foremost, is in fixing the most annoying bugs and warnings on slundberg's shap package.

I don't think there's any real benefit to users of a new release that supports such an old version.

Don't really agree with this, tbh. Many users are stuck on older Python versions (usually outside of their own control), I think it's a pity if we leave them out from the work that we did here.

To be clear, I'ld love dropping py3.7 and py3.8, it means we can do better typing in the code base, for one. But there's no rush to do it now.

connortann commented 1 year ago

Thanks for your comments, I appreciate the discussion.

I agree with you that fixing the most annoying bugs and warnings is the priority. So, I think the questions are:

On A), I think dropping older versions is indeed helpful, especially for fixing tests and warnings. Staying compatible with over 4 years of python releases and corresponding versions of all 3rd party libs is tough! The code base currently has some rather ugly try/except clauses to handle changing syntaxes such the tensorflow.keras imports, which is the source of many warnings.

On B), I'm doubtful if users on older python versions have much of a need for new releases. These environments will necessarily have older versions of numpy/tensorflow, and consequently they won't be experience the bugs & deprecation warnings that are thrown in more modern environments. It's true they will miss out on any new features; however, this is true of any package that follows NEP 29.

Many users are stuck on older Python versions (usually outside of their own control), I think it's a pity if we leave them out from the work that we did here.

Whilst I agree in general, is there anything different about shap that means we should deviate from the accepted community standard for scientific python libs? The NEP makes an argument against adopting an ad-hoc support window:

A project could, on every release, evaluate whether to increase the minimum version of Python supported. As a major downside, an ad-hoc approach makes it hard for downstream users to predict what the future minimum versions will be. As there is no objective threshold to when the minimum version should be dropped, it is easy for these version support discussions to devolve into bike shedding and acrimony.

thatlittleboy commented 1 year ago

No ill-intention meant here, but I think perhaps we just have a fundamentally different view on what the (main) purpose of this next release of shap is about.


A) I'm with you, that it's tough to maintain support across multiple python versions, I'm not advocating for us to maintain support for py3.7 and py3.8 forever. :) But practically speaking, is there any main blocker for continuing support for python 3.7 and python 3.8 for just 1 more release cycle? I'm willing to be the one to fix the bugs and happy to maintain some "hacky workaround" for now (it's not like the rest of the code base in its current form is any prettier, tbh πŸ˜…).

B) We didn't only fix bugs or warnings related to package / python versions. We also fixed bugs that existed for years, and slundberg didn't merge the PRs for those (not putting the blame on anyone, I'm just stating facts). A non-exhaustive list: #58, #52, #36, ... And more that I intend to fix, e.g. #14, #66, and PR slundberg#2839, ...

I just don't see why we shouldn't port these fixes for those still on python 3.7 and python 3.8. If there are huge practical problems posed by A), then sure, we can have a more in-depth discussion on the pros and cons. As it stands, it doesn't seem to be a good trade-off for me (to drop support for older python versions and leave those on older python versions with a buggy shap package).

is there anything different about shap that means we should deviate from the accepted community standard for scientific python libs?

Yes, in that I believe the purpose of this fork's existence (initially) is to fix the issues that others have encountered for years and didn't get the necessary support for in the main repo. After our initial release, I'm more than happy to stick to NEP as per any ordinary library.

connortann commented 1 year ago

Thanks for the well thought-out reply. You make good points, and on balance I'm happy to get on board with your recommendation to continue with 3.7 and 3.8 for at least one more release, before moving towards the NEP in future πŸ™‚

We could advertise this on the changelog perhaps, that the next release is the last one that will support python 3.7?

it's not like the rest of the code base in its current form is any prettier

πŸ˜… very fair point!

thatlittleboy commented 1 year ago

We could advertise this on the changelog perhaps, that the next release is the last one that will support python 3.7?

Yup, agreed!

dsgibbons commented 1 year ago

A Python 3.7 release would be nice, especially given that its test suite works now. I agree with the overall conclusion to follow NEP 29 for later releases.

connortann commented 1 year ago

Closing as issue moved to main fork