elastic / eland

Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
https://eland.readthedocs.io
Apache License 2.0
21 stars 99 forks

Remove support for sklearn-based ESGradientBoostingModel in 9.0 #731

Open pquentin opened 1 month ago

pquentin commented 1 month ago

With the move to Wolfi and the recent PyTorch/Transformers upgrade, we only have one CVE left in the Eland Docker image. The CVE itself is not that interesting and does not affect us; the point is to have none at all, as part of our zero-CVE effort.

To fix it, we need to upgrade to scikit-learn 1.5 or above. We're currently using 1.3 because 1.4 migrated from deviances to half losses, which breaks us because we import those private, undocumented classes. Upgrading scikit-learn isn't easy here: apparently there's at least a factor of 2 difference (hence the "Half" names), and there could be other issues with the new version. That said, since we believe this feature has low usage, we would like to remove it: