apache / sedona

A cluster computing framework for processing large-scale geospatial data
https://sedona.apache.org/
Apache License 2.0

Incompatible numpy version in Docker container #1572

Closed: mvaaltola closed this issue 4 weeks ago.

mvaaltola commented 4 weeks ago

Expected behavior

Running import pandas as pd inside the apache/sedona:1.6.1 Jupyter notebook environment should succeed.

Actual behavior

The import fails, apparently because the installed numpy and pandas versions are incompatible:

import pandas as pd
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[2], line 1
----> 1 import pandas as pd

File /usr/local/lib/python3.10/dist-packages/pandas/__init__.py:22
     19 del _hard_dependencies, _dependency, _missing_dependencies
     21 # numpy compat
---> 22 from pandas.compat import is_numpy_dev as _is_numpy_dev  # pyright: ignore # noqa:F401
     24 try:
     25     from pandas._libs import hashtable as _hashtable, lib as _lib, tslib as _tslib

File /usr/local/lib/python3.10/dist-packages/pandas/compat/__init__.py:18
     15 from typing import TYPE_CHECKING
     17 from pandas._typing import F
---> 18 from pandas.compat.numpy import (
     19     is_numpy_dev,
     20     np_version_under1p21,
     21 )
     22 from pandas.compat.pyarrow import (
     23     pa_version_under1p01,
     24     pa_version_under2p0,
   (...)
     31     pa_version_under9p0,
     32 )
     34 if TYPE_CHECKING:

File /usr/local/lib/python3.10/dist-packages/pandas/compat/numpy/__init__.py:4
      1 """ support numpy compatibility across versions """
      2 import numpy as np
----> 4 from pandas.util.version import Version
      6 # numpy versioning
      7 _np_version = np.__version__

File /usr/local/lib/python3.10/dist-packages/pandas/util/__init__.py:2
      1 # pyright: reportUnusedImport = false
----> 2 from pandas.util._decorators import (  # noqa:F401
      3     Appender,
      4     Substitution,
      5     cache_readonly,
      6 )
      8 from pandas.core.util.hashing import (  # noqa:F401
      9     hash_array,
     10     hash_pandas_object,
     11 )
     14 def __getattr__(name):

File /usr/local/lib/python3.10/dist-packages/pandas/util/_decorators.py:14
      6 from typing import (
      7     Any,
      8     Callable,
      9     Mapping,
     10     cast,
     11 )
     12 import warnings
---> 14 from pandas._libs.properties import cache_readonly
     15 from pandas._typing import (
     16     F,
     17     T,
     18 )
     19 from pandas.util._exceptions import find_stack_level

File /usr/local/lib/python3.10/dist-packages/pandas/_libs/__init__.py:13
      1 __all__ = [
      2     "NaT",
      3     "NaTType",
   (...)
      9     "Interval",
     10 ]
---> 13 from pandas._libs.interval import Interval
     14 from pandas._libs.tslibs import (
     15     NaT,
     16     NaTType,
   (...)
     21     iNaT,
     22 )

File /usr/local/lib/python3.10/dist-packages/pandas/_libs/interval.pyx:1, in init pandas._libs.interval()

ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
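The last line is the key clue. This message is commonly reported when a compiled extension (here pandas' interval module) was built against NumPy 1.x headers but is loaded under NumPy 2.x, whose dtype struct layout changed. Because import pandas itself fails, the minimal sketch below reads the installed versions through importlib.metadata without importing either package. The helper names are hypothetical, and treating any NumPy >= 2 as a risk is an assumption based on the numpy<2 workaround reported later in this thread.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed distribution version without importing the package."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def numpy_major(numpy_version: str) -> int:
    """Extract the major version from strings such as '1.26.4' or '2.0.0rc1'."""
    return int(numpy_version.split(".")[0])


def numpy_2_installed(numpy_version: Optional[str]) -> bool:
    """Assumption: any NumPy 2.x install is treated as an ABI risk for older pandas builds."""
    return numpy_version is not None and numpy_major(numpy_version) >= 2


# Report versions from package metadata only; nothing here imports pandas.
for pkg in ("numpy", "pandas"):
    print(f"{pkg}: {installed_version(pkg) or 'not installed'}")

if numpy_2_installed(installed_version("numpy")):
    print("NumPy 2.x is installed; extensions built against NumPy 1.x may fail to load.")
```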

Steps to reproduce the problem

  1. Run the Docker container: docker run --rm -p 8888:8888 -p 8080:8080 -p 8081:8081 -p 4040:4040 apache/sedona:1.6.1
  2. Open http://localhost:8888 in a browser.
  3. Create a new notebook, add a cell containing import pandas as pd, and run it.
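Step 3 can also be written as a notebook cell that reports the failure instead of raising it. This is a minimal sketch; try_import is a hypothetical helper, not part of Sedona or pandas. It catches ImportError for missing packages and ValueError for binary-incompatibility errors like the numpy.dtype size mismatch shown in the traceback.

```python
import importlib
from typing import Optional, Tuple


def try_import(module_name: str) -> Tuple[bool, Optional[str]]:
    """Attempt an import and return (succeeded, error description)."""
    try:
        importlib.import_module(module_name)
    except (ImportError, ValueError) as exc:
        # ValueError covers "numpy.dtype size changed" style ABI errors.
        return False, f"{type(exc).__name__}: {exc}"
    return True, None


ok, error = try_import("pandas")
if ok:
    print("pandas imported successfully")
else:
    print(f"pandas import failed -> {error}")
```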

Settings

Sedona version = 1.6.1

Image version = 1.6.1 or latest (sha256:992f9645635abd1df6adb6fbfb3c08d716494b25d3ee445413c4f6fd46eeff28)

Apache Spark version = 3.4.1

API type = Python

Python version = 3.10.12

Environment = Standalone / Docker

github-actions[bot] commented 4 weeks ago

Thank you for your interest in Apache Sedona! We appreciate you opening your first issue. Contributions like yours help make Apache Sedona better.

mvaaltola commented 4 weeks ago

Related to https://github.com/apache/sedona/pull/1478. Running pip install "numpy<2" inside the container resolves the issue.
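The same workaround can be applied from inside the notebook. A minimal sketch, assuming the kernel runs the container's Python interpreter: it targets sys.executable so pip installs into the environment the kernel actually uses, and it defaults to a dry run. numpy_pin_command and pin_numpy are hypothetical helpers; in IPython the %pip install "numpy<2" magic does the equivalent. Because the failed pandas import has already loaded numpy into the kernel, the kernel most likely needs a restart before the downgraded version is picked up.

```python
import shlex
import subprocess
import sys
from typing import List


def numpy_pin_command(spec: str = "numpy<2") -> List[str]:
    """Build a pip command for the interpreter running this kernel."""
    return [sys.executable, "-m", "pip", "install", spec]


def pin_numpy(dry_run: bool = True) -> List[str]:
    """Print (dry run) or execute the pip command for the numpy<2 workaround."""
    cmd = numpy_pin_command()
    if dry_run:
        # shlex.join quotes 'numpy<2' so the printed command is shell-safe.
        print("Would run:", shlex.join(cmd))
    else:
        # Raises CalledProcessError if pip exits with a non-zero status.
        subprocess.check_call(cmd)
        print("Done. Restart the Jupyter kernel so the downgraded numpy is loaded.")
    return cmd


pin_numpy(dry_run=True)
```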