datagouv / csv-detective

CSV inspection
45 stars 10 forks source link

Numpy subdependency version incompatibility #88

Closed bolinocroustibat closed 2 months ago

bolinocroustibat commented 3 months ago

When installing csv-detective 0.7.1, on which project hydra depends, it needs pandas 1.5.3 which depends on Numpy>=1.23.2 so it also automatically installs Numpy 2.0.1.

When running csv-detective in hydra tests, we get the following Numpy error: ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject It seems this error is related to an incompatible behaviour of Numpy >=2 with the current pandas code in csv-detective. More information here.

Three solutions to this (by order of preference): 1) Try to run csv-detective and hydra with pandas==2.2.2 (latest pandas as of 2024/07/30). If the code works, pin csv-detective dependencies to pandas<2.3.0,>=2.2.0 so that pandas is pinned to 2.2.x 2) If we run into the same error with pandas==2.2.2, fix the current code in csv-detective to be compatible with Numpy >= 2 3) If for some reason that's too difficult, pin csv-detective dependencies to pandas<2.2.0,>=2.1.4 while we figure this out, since pandas 2.1.4 has added the limitation for Numpy to be <2.

Pierlou commented 3 months ago

While I couldn't reproduce the error directly using hydra, it showed up in a recent PR I made on csv-detective. I upgraded the dependencies to make the tests pass, tell me if things are better with the latest dev version

bolinocroustibat commented 3 months ago

@Pierlou Great job, thank you! I just tested hydra without and with csv-detective 0.7.2.dev0 (using csv-detective = { git = "git@github.com:datagouv/csv-detective.git", branch = "master" }in pyproject.toml), and indeed the issue doesn't pop anymore when using your dev version.

Once your fix PR is merged and new version published on PyPi, one should review and merge this PR on Hydra.

One warning though:

Warning: The file chosen for install of requests 2.32.0 (requests-2.32.0-py3-none-any.whl) is yanked. Reason for being yanked: Yanked due to conflicts with CVE-2024-35195 mitigation

The warning should probably be solved if we were using requests 2.32.3 instead of 2.32.0 on csv-detective, I think.

Pierlou commented 3 months ago

The PR is already merged (and there's a dev version available directly on pypi). We can upgrade requests in a quick PR too if needed. Next csv-detective versio might not be 0.7.2 though, as we'll be releasing after this big PR is merged, we'll see and adapt consequently

bolinocroustibat commented 3 months ago

@Pierlou OK. Should we use the csv-detective dev version in hydra right now, or wait until the big PR is merged and adapt the dependency version number in hydra?

Pierlou commented 3 months ago

I'm fine with using the latest dev version in hydra for now (idk when the big PR will be merged, we're facing CI issues for now). I've fixed a bug that should make a few more resources analyzable so I'm happy to see how it evolves

bolinocroustibat commented 3 months ago

In that case, the PR has just been adapted, it passes CI tests: https://github.com/datagouv/hydra/pull/129 I guess it can be merged and we can closes this issue.

bolinocroustibat commented 2 months ago

I guess we can close that one?

Pierlou commented 2 months ago

Fine with me ✅