Closed bolinocroustibat closed 2 months ago
While I couldn't reproduce the error directly using hydra, it showed up in a recent PR I made on csv-detective. I upgraded the dependencies to make the tests pass, tell me if things are better with the latest dev version
@Pierlou Great job, thank you!
I just tested hydra without and with csv-detective 0.7.2.dev0
(using csv-detective = { git = "git@github.com:datagouv/csv-detective.git", branch = "master" }
in pyproject.toml
), and indeed the issue doesn't pop anymore when using your dev version.
Once your fix PR is merged and new version published on PyPi, one should review and merge this PR on Hydra.
One warning though:
Warning: The file chosen for install of requests 2.32.0 (requests-2.32.0-py3-none-any.whl) is yanked. Reason for being yanked: Yanked due to conflicts with CVE-2024-35195 mitigation
The warning should probably be solved if we were using requests 2.32.3
instead of 2.32.0
on csv-detective
, I think.
The PR is already merged (and there's a dev version available directly on pypi).
We can upgrade requests
in a quick PR too if needed.
Next csv-detective
versio might not be 0.7.2
though, as we'll be releasing after this big PR is merged, we'll see and adapt consequently
@Pierlou OK. Should we use the csv-detective dev version in hydra right now, or wait until the big PR is merged and adapt the dependency version number in hydra?
I'm fine with using the latest dev version in hydra for now (idk when the big PR will be merged, we're facing CI issues for now). I've fixed a bug that should make a few more resources analyzable so I'm happy to see how it evolves
In that case, the PR has just been adapted, it passes CI tests: https://github.com/datagouv/hydra/pull/129 I guess it can be merged and we can closes this issue.
I guess we can close that one?
Fine with me ✅
When installing csv-detective 0.7.1, on which project hydra depends, it needs pandas 1.5.3 which depends on Numpy>=1.23.2 so it also automatically installs Numpy 2.0.1.
When running csv-detective in hydra tests, we get the following Numpy error:
ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
It seems this error is related to an incompatible behaviour of Numpy >=2 with the current pandas code in csv-detective. More information here.Three solutions to this (by order of preference): 1) Try to run csv-detective and hydra with
pandas==2.2.2
(latest pandas as of 2024/07/30). If the code works, pin csv-detective dependencies topandas<2.3.0,>=2.2.0
so that pandas is pinned to 2.2.x 2) If we run into the same error withpandas==2.2.2
, fix the current code in csv-detective to be compatible with Numpy >= 2 3) If for some reason that's too difficult, pin csv-detective dependencies topandas<2.2.0,>=2.1.4
while we figure this out, since pandas 2.1.4 has added the limitation for Numpy to be <2.