Closed. amotl closed this 7 months ago.
scikit-learn 1.4.2 was released on Apr 9, 2024. Is it related?
-- https://pypi.org/project/scikit-learn/1.4.2/#history
If it is, the reason why the corresponding nightly CI job, which exists to validate functionality, did not fail more prominently before is most probably that dependencies are configured to be cached as long as the local requirements files do not change.
In this case, the nightly CI jobs do not catch updates to transitive dependencies that are not enumerated locally, and thus do not live up to their promise of giving you constant peace of mind in "on stage" situations. In this spirit, what is reflected on the Build Status page might not convey the whole truth, and I am sad about it.
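To illustrate the caching problem, here is a minimal sketch, assuming the cache key is derived only from the contents of the requirements files. It is not the actual CI configuration of this repository; the `cache_key` helper, the `deps-` prefix, and the package name in the requirements file are hypothetical.

```python
import hashlib
from datetime import date
from typing import Dict, Optional


def cache_key(requirements: Dict[str, str], day: Optional[date] = None) -> str:
    """Derive a dependency cache key from the contents of requirements files.

    `requirements` maps file names to file contents. When `day` is given,
    the date becomes part of the key, so the cache expires once per day.
    """
    digest = hashlib.sha256()
    for name in sorted(requirements):
        digest.update(name.encode())
        digest.update(b"\0")
        digest.update(requirements[name].encode())
        digest.update(b"\0")
    key = f"deps-{digest.hexdigest()[:16]}"
    if day is not None:
        key += f"-{day.isoformat()}"
    return key


# Hypothetical requirements file which does not enumerate scikit-learn itself.
files = {"requirements.txt": "example-ml-toolkit\n"}

# Unchanged file contents yield the same key on every run, so the cached
# environment is reused and a new upstream release goes unnoticed.
key_a = cache_key(files)
key_b = cache_key(files)

# Adding the date changes the key each day, which forces a fresh install.
key_day_1 = cache_key(files, date(2024, 4, 9))
key_day_2 = cache_key(files, date(2024, 4, 10))
```

Mixing the date into the key is only one possible way to let nightly runs pick up new releases of transitive dependencies. It trades cache hits for freshness.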
/cc @marijaselakovic, @ckurze, @hammerhead, @simonprickett
I am able to confirm this error on my workstation, using Python 3.11.
source .venv/bin/activate
pip install --upgrade scikit-learn
cd topic/timeseries
pytest -k timeseries-anomaly-detection.ipynb
ValueError: Found array with 0 sample(s) (shape=(0, 1)) while a minimum of 1 is required by SimpleImputer.
However, I am also seeing a second error, which might actually be a follow-up error.
ProgrammingError: (crate.client.exceptions.ProgrammingError) RelationAlreadyExists[Relation 'notebook.machine_data' already exists.]
[SQL: CREATE TABLE machine_data ("timestamp" TIMESTAMP, "value" DOUBLE PRECISION)]
As part of GH-425, the RelationAlreadyExists error has been fixed with fdb91dd703. However, despite downgrading scikit-learn using d244f4345b6, the array shape error is still there, though now only on Python 3.10, and only on CI. On my workstation, the software tests also succeed using Python 3.10.13.
-- https://github.com/crate/cratedb-examples/actions/runs/8744363838/job/23997048918?pr=425#step:6:951
Taking a closer look, ValueError: Found array with 0 sample(s) may also indicate that the problem is related to CrateDB's eventual consistency, so ab42144174b adds a relevant REFRESH TABLE "tablename"; SQL statement, in order to synchronize writes.
Indeed, the culprit apparently was the missing REFRESH TABLE statement: writes were not synchronized, so their results were not visible to subsequent queries. Apparently, it is not related to scikit-learn 1.4.2 at all. GH-425 will improve the situation. d244f4345b62 has been removed again.
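The write-then-refresh pattern can be sketched like this. It is an illustration under assumptions, not the code from the notebook or from GH-425: `insert_and_refresh` and `RecordingCursor` are hypothetical names, the `?` placeholders assume a DB-API driver with the "qmark" parameter style, and the inserted values are made up.

```python
def insert_and_refresh(cursor, table, rows):
    """Insert rows, then refresh the table so subsequent queries see them.

    `cursor` is expected to behave like a DB-API 2.0 cursor.
    """
    cursor.executemany(
        f'INSERT INTO "{table}" ("timestamp", "value") VALUES (?, ?)',
        rows,
    )
    # Without this statement, a query issued right after the insert may
    # return zero rows, which is what surfaced as the empty-array error.
    cursor.execute(f'REFRESH TABLE "{table}"')


class RecordingCursor:
    """Stand-in cursor which only records statements, for illustration."""

    def __init__(self):
        self.statements = []

    def execute(self, sql, parameters=None):
        self.statements.append(sql)

    def executemany(self, sql, seq_of_parameters):
        self.statements.append(sql)


cursor = RecordingCursor()
insert_and_refresh(
    cursor,
    "machine_data",
    [(1712620800000, 0.5), (1712620860000, 0.7)],
)
```

With a real database connection, the same function would receive the cursor of that connection instead of the recording stand-in.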
Problem
The timeseries-anomaly-detection.ipynb notebook errors out, both on Python 3.10 and 3.11 ^1.
Observations
Because it happens on both versions of Python, it is most probably unrelated to the specific change at which it started tripping.
Thoughts
Most probably another dependency flaw?