BiomedSciAI / causallib

A Python package for modular causal inference analysis and model evaluations
Apache License 2.0

Fix support for scikit-learn>=1.2.0 ̵a̵n̵d̵ ̵N̵u̵m̵p̵y̵=̵1̵.̵2̵4̵.̵0̵ #52

Closed ehudkr closed 3 months ago

ehudkr commented 1 year ago

Scikit-learn version 1.2.0 enforces two API changes that currently break tests.

  1. LinearRegression no longer supports the normalize keyword argument, which some of the tests use. The fix should be fairly simple in principle: replace LinearRegression with a Pipeline object that has a StandardScaler preprocessing step.
  2. Scikit-learn now enforces strict column-name restrictions. First, all column names must be of the same type, and second, column names should match between fit and predict. This might require a broader solution. The first part will require a "safe join" that is aware of column-name types and replaces every instance where we join covariates X with treatment assignment a. The second part requires validating that column names are consistent/preserved when new data is input. This might mostly concern the time-pooled survival models, where a time range is artificially created and placed as a predictor.
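
A minimal sketch of both fixes, assuming pandas inputs. The helper name safe_join and the toy data are hypothetical and are not existing causallib code; the helper simply casts column names to strings whenever the joined frame mixes name types.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def safe_join(X, a, default_name="a"):
    """Hypothetical helper (not part of causallib): join covariates X with
    treatment a so that all column names in the result share one type."""
    if a.name is None:
        a = a.rename(default_name)
    joined = X.join(a)
    # scikit-learn >= 1.2 rejects frames whose column names mix types
    # (e.g. int and str), so fall back to string names in that case.
    if len({type(col) for col in joined.columns}) > 1:
        joined.columns = joined.columns.astype(str)
    return joined


# Toy data, made up for illustration only.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(50, 2)))             # integer column names 0, 1
a = pd.Series(rng.integers(0, 2, size=50), name="a")   # string column name
y = X[0] + 2 * a + rng.normal(scale=0.1, size=50)

Xa = safe_join(X, a)  # column names are now all strings

# Stand-in for LinearRegression(normalize=True): scale first, then regress.
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(Xa, y)
predictions = model.predict(Xa)  # same column names as seen during fit
```

Note that StandardScaler is not numerically identical to the old normalize=True behavior, so fitted coefficients can differ in scale even where predictions of an unregularized linear model do not.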

A more minor exception was also raised with Numpy v1.24.0: generating calibration plots calls matplotlib's fill_between, which fails with TypeError: ufunc 'isfinite' not supported for the input types. Need to dig deeper into this and determine whether it is a causallib problem (providing bad fill values) or some external matplotlib-numpy mismatch.

In the meantime, PR https://github.com/BiomedSciAI/causallib/pull/50 limited the allowed dependency versions.

ehudkr commented 1 year ago

The numpy 1.24.0 bug indeed seems to be a matplotlib problem (https://github.com/matplotlib/matplotlib/issues/24106), which was fixed in https://github.com/matplotlib/matplotlib/pull/24115 and released in matplotlib v3.6.1, so updating matplotlib should allow updating numpy too.
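
As a rough illustration, a check like the following could tell whether an installed matplotlib already includes that fix. The function names are hypothetical, and the parsing is deliberately simple (it ignores pre-release suffixes), so treat it as a sketch and not a full version parser.

```python
import re

MATPLOTLIB_FIX = (3, 6, 1)  # release mentioned above as containing the fix


def version_tuple(version):
    """Parse the leading 'X.Y' or 'X.Y.Z' of a version string into three ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def has_fill_between_fix(matplotlib_version):
    """Hypothetical check: True if the version is at least 3.6.1."""
    return version_tuple(matplotlib_version) >= MATPLOTLIB_FIX
```

With matplotlib installed, the version string can be obtained via importlib.metadata.version("matplotlib") and passed to has_fill_between_fix.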

ehudkr commented 3 months ago

It appears that the lack of support for scikit-learn>=1.2 now prevents causallib from being used with Python>=3.12, as scikit-learn itself requires version >=1.2 to be installed on Python 3.12. See https://github.com/BiomedSciAI/causallib/issues/70.

ehudkr commented 3 months ago

Closed by https://github.com/BiomedSciAI/causallib/pull/72