UCBoulder / oit-ds-tools-prefect

Common tasks and tools for use in Prefect Flows

Pandas conflict with numpy version >= 2.0.0 on Python 3.12 #52

Open ndrewwm opened 1 month ago

ndrewwm commented 1 month ago

This came up while trying to get Shealene's Python installation up and running. For new users running a recent version of Python (3.12.4), a numpy install at or above 2.0.0 will conflict with pandas. Our ucb_prefect_tools library pins the pandas version, but doesn't specify one for numpy. The result is an import error any time the user attempts to load the pandas library.

Looks like this problem has already been flagged in the numpy repo (https://github.com/numpy/numpy/issues/26710). A solution is to downgrade numpy to a version below 2.0.0, or to have the user install an older version of Python. Our onboarding guide doesn't say how a new user's Python installation should be set up. How do we want to handle this kind of issue?
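For anyone who wants to check an environment before hitting the import error, here is a minimal sketch. It reads the installed numpy version from package metadata without importing numpy or pandas. The function names are hypothetical and not part of ucb_prefect_tools, and the rule "numpy 2.x conflicts" is only an assumption based on the pandas 1.5.3 pin shown in this issue.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string):
    """Return the leading integer of a version string such as '2.0.1'."""
    return int(version_string.split(".")[0])


def numpy_conflicts_with_pinned_pandas(numpy_version):
    """Assumption from this issue: numpy >= 2.0.0 breaks the pinned pandas."""
    return major_version(numpy_version) >= 2


def check_environment():
    """Describe the numpy install in the current environment."""
    try:
        installed = version("numpy")
    except PackageNotFoundError:
        return "numpy is not installed"
    if numpy_conflicts_with_pinned_pandas(installed):
        return f"numpy {installed} found; try: pip install 'numpy<2.0.0'"
    return f"numpy {installed} is below 2.0.0"


if __name__ == "__main__":
    print(check_environment())
```

Run inside the activated virtual environment, this prints a one-line summary of the numpy install.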

To reproduce the error:

mkdir ~/Documents/iss1
cd ~/Documents/iss1
python3 -m venv env
source env/bin/activate
pip install git+https://github.com/UCBoulder/oit-ds-tools-prefect@main
pip list

Package                    Version
-------------------------- -----------
...
numpy                      2.0.1
...
pandas                     1.5.3

import pandas as pd

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/anmo5608/Library/CloudStorage/OneDrive-UCB-O365/Documents/iss1/env/lib/python3.12/site-packages/pandas/__init__.py", line 22, in <module>
    from pandas.compat import is_numpy_dev as _is_numpy_dev  # pyright: ignore # noqa:F401
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/anmo5608/Library/CloudStorage/OneDrive-UCB-O365/Documents/iss1/env/lib/python3.12/site-packages/pandas/compat/__init__.py", line 18, in <module>
    from pandas.compat.numpy import (
  File "/Users/anmo5608/Library/CloudStorage/OneDrive-UCB-O365/Documents/iss1/env/lib/python3.12/site-packages/pandas/compat/numpy/__init__.py", line 4, in <module>
    from pandas.util.version import Version
  File "/Users/anmo5608/Library/CloudStorage/OneDrive-UCB-O365/Documents/iss1/env/lib/python3.12/site-packages/pandas/util/__init__.py", line 2, in <module>
    from pandas.util._decorators import (  # noqa:F401
  File "/Users/anmo5608/Library/CloudStorage/OneDrive-UCB-O365/Documents/iss1/env/lib/python3.12/site-packages/pandas/util/_decorators.py", line 14, in <module>
    from pandas._libs.properties import cache_readonly
  File "/Users/anmo5608/Library/CloudStorage/OneDrive-UCB-O365/Documents/iss1/env/lib/python3.12/site-packages/pandas/_libs/__init__.py", line 13, in <module>
    from pandas._libs.interval import Interval
  File "pandas/_libs/interval.pyx", line 1, in init pandas._libs.interval
ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

jashbycu commented 1 month ago

According to the default-image/Dockerfile, our current flows are running Python 3.10, so one option would be to stick with that. However, I think the intent is that numpy should be < 2.0.0, just like pandas. I can proceed with applying that fix unless you think there could be an issue with that approach.

ndrewwm commented 1 month ago

That sounds good to me! Avoiding the necessity of particular Python installations keeps things a bit simpler.

jashbycu commented 1 month ago

Wait, I just checked the requirements file, and it already has numpy<2.0.0. Were you using pip install -r default-image/requirements.txt?

ndrewwm commented 1 month ago

This issue is relevant for local installations of ucb_prefect_tools, not within the Prefect image. We'd need to update the setup.cfg in this repository.
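To illustrate the gap, here is a small sketch that parses a setup.cfg and reports whether a package has an entry under install_requires. The file contents below are hypothetical stand-ins for the before and after states (the real setup.cfg in this repository may differ), and the helper names are made up for this example.

```python
import configparser
import re

# Hypothetical setup.cfg contents; the real file may look different.
BEFORE = """
[options]
install_requires =
    pandas<2.0.0
"""

AFTER = """
[options]
install_requires =
    pandas<2.0.0
    numpy<2.0.0
"""


def install_requires(setup_cfg_text):
    """Return the non-empty install_requires entries from setup.cfg text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(setup_cfg_text)
    raw = parser.get("options", "install_requires", fallback="")
    return [line.strip() for line in raw.splitlines() if line.strip()]


def requirement_for(setup_cfg_text, package):
    """Return the requirement line for a package, or None if it is absent."""
    for req in install_requires(setup_cfg_text):
        # The package name ends at the first specifier or marker character.
        name = re.split(r"[<>=!~\s\[;]", req, maxsplit=1)[0]
        if name.lower() == package.lower():
            return req
    return None


if __name__ == "__main__":
    print(requirement_for(BEFORE, "numpy"))
    print(requirement_for(AFTER, "numpy"))
```

With no numpy entry, pip is free to pick the newest numpy that satisfies pandas' own metadata, which matches how numpy 2.0.1 ended up in the pip list output above.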

jashbycu commented 1 month ago

Oh, I see. I will update that. But even for local installs, the method I recommended is best, since it gets you all the requirements for running Prefect flows in one go.

jashbycu commented 1 month ago

Also, now that I'm redoing this install myself, it looks like you need to be on Python 3.10 or 3.11 to install these requirements.
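A quick guard for that constraint could look like the sketch below. The supported versions are taken only from the comment above (3.10 or 3.11) and are an assumption, not something enforced by the repository. The function name is hypothetical.

```python
import sys

# Assumption from this thread: the requirements install on 3.10 or 3.11 only.
SUPPORTED_VERSIONS = ((3, 10), (3, 11))


def python_is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) pair is in SUPPORTED_VERSIONS."""
    return tuple(version_info[:2]) in SUPPORTED_VERSIONS


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if python_is_supported():
        print(f"Python {current} should work for these requirements")
    else:
        print(f"Python {current} is outside 3.10-3.11; consider a different interpreter")
```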

ndrewwm commented 1 month ago

Ah, gotcha. I think we should update our Confluence docs with a description of these steps. Sounds like we should be recommending:

jashbycu commented 1 month ago

Okay thanks! I will add this as a to-do task for me to look at when I revamp the documentation.