databricks / dbt-databricks

A dbt adapter for Databricks.
https://databricks.com
Apache License 2.0

Add support for Pandas >2.0 #726

Open alexeyegorov opened 2 weeks ago

alexeyegorov commented 2 weeks ago

Describe the bug

NumPy 2.0 was released recently and is not compatible with older versions of pandas. However, dbt-databricks 1.8.3 only supports pandas up to version 2.0, so a fresh install can end up with a NumPy/pandas combination that does not work.

Workaround: pin numpy to 1.26.4 (the latest release before 2.0).
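In requirements.txt terms, the workaround amounts to adding a line such as numpy==1.26.4. As a rough sketch, the check below reports whether the installed NumPy is a 2.x release, which is the case that triggers this issue. The helper names numpy_major and is_affected are made up for illustration and are not part of dbt or dbt-databricks.

```python
from importlib.metadata import PackageNotFoundError, version


def numpy_major(version_string: str) -> int:
    """Return the major component of a version string such as '1.26.4'."""
    return int(version_string.split(".", 1)[0])


def is_affected(numpy_version: str) -> bool:
    """True when the given NumPy version is 2.x or later (hypothetical helper)."""
    return numpy_major(numpy_version) >= 2


if __name__ == "__main__":
    try:
        installed = version("numpy")
    except PackageNotFoundError:
        print("numpy is not installed")
    else:
        status = "affected, consider pinning numpy<2" if is_affected(installed) else "ok"
        print(f"numpy {installed}: {status}")
```

Running it inside the devcontainer before and after pinning should show whether the pin took effect.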

Steps To Reproduce

  1. For my devcontainer setup, I use a requirements.txt with only a few entries:
    dbt-databricks==1.8.3
    sqlfluff
    sqlfluff-templater-dbt
  2. Install the above dependencies.
  3. Run dbt deps
  4. Try to run any dbt command like dbt compile

Expected behavior

Successful dbt run.

Screenshots and log output

The outcome of the commands:

[Screenshot 2024-07-05 at 15 07 16]

dbt has now installed the packages.

But any other command fails (in this case, dbt compile):

[Screenshot 2024-07-05 at 15 07 32]

Quote from the logs:

13:07:21 Running with dbt=1.8.3

A module that was compiled using NumPy 1.x cannot be run in NumPy 2.0.0 as it may crash. To support both 1.x and 2.x versions of NumPy, modules must be compiled with NumPy 2.0. Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to downgrade to 'numpy<2' or try to upgrade the affected module. We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):
  File "/usr/local/bin/dbt", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/dbt/cli/main.py", line 148, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/dbt/cli/requires.py", line 138, in wrapper
    result, success = func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/dbt/cli/requires.py", line 101, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/dbt/cli/requires.py", line 215, in wrapper
    profile = load_profile(flags.PROJECT_DIR, flags.VARS, flags.PROFILE, flags.TARGET, threads)
  File "/usr/local/lib/python3.9/site-packages/dbt/config/runtime.py", line 71, in load_profile
    profile = Profile.render(
  File "/usr/local/lib/python3.9/site-packages/dbt/config/profile.py", line 403, in render
    return cls.from_raw_profiles(
  File "/usr/local/lib/python3.9/site-packages/dbt/config/profile.py", line 369, in from_raw_profiles
    return cls.from_raw_profile_info(
  File "/usr/local/lib/python3.9/site-packages/dbt/config/profile.py", line 325, in from_raw_profile_info
    credentials: Credentials = cls._credentials_from_profile(
  File "/usr/local/lib/python3.9/site-packages/dbt/config/profile.py", line 149, in _credentials_from_profile
    cls = load_plugin(typename)
  File "/usr/local/lib/python3.9/site-packages/dbt/adapters/factory.py", line 239, in load_plugin
    return FACTORY.load_plugin(name)
  File "/usr/local/lib/python3.9/site-packages/dbt/adapters/factory.py", line 68, in load_plugin
    mod: Any = import_module("." + name, "dbt.adapters")
  File "/usr/local/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/usr/local/lib/python3.9/site-packages/dbt/adapters/databricks/__init__.py", line 3, in <module>
    from dbt.adapters.databricks.connections import DatabricksConnectionManager  # noqa
  File "/usr/local/lib/python3.9/site-packages/dbt/adapters/databricks/connections.py", line 26, in <module>
    from databricks.sql.client import Connection as DatabricksSQLConnection
  File "/usr/local/lib/python3.9/site-packages/databricks/sql/client.py", line 3, in <module>
    import pandas
  File "/usr/local/lib/python3.9/site-packages/pandas/__init__.py", line 23, in <module>
    from pandas.compat import (
  File "/usr/local/lib/python3.9/site-packages/pandas/compat/__init__.py", line 27, in <module>
    from pandas.compat.pyarrow import (
  File "/usr/local/lib/python3.9/site-packages/pandas/compat/pyarrow.py", line 8, in <module>
    import pyarrow as pa
  File "/usr/local/lib/python3.9/site-packages/pyarrow/__init__.py", line 65, in <module>
    import pyarrow.lib as _lib
AttributeError: _ARRAY_API not found
13:07:21 Encountered an error:
numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
13:07:21 Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/dbt/cli/requires.py", line 138, in wrapper
    result, success = func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/dbt/cli/requires.py", line 101, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/dbt/cli/requires.py", line 215, in wrapper
    profile = load_profile(flags.PROJECT_DIR, flags.VARS, flags.PROFILE, flags.TARGET, threads)
  File "/usr/local/lib/python3.9/site-packages/dbt/config/runtime.py", line 71, in load_profile
    profile = Profile.render(
  File "/usr/local/lib/python3.9/site-packages/dbt/config/profile.py", line 403, in render
    return cls.from_raw_profiles(
  File "/usr/local/lib/python3.9/site-packages/dbt/config/profile.py", line 369, in from_raw_profiles
    return cls.from_raw_profile_info(
  File "/usr/local/lib/python3.9/site-packages/dbt/config/profile.py", line 325, in from_raw_profile_info
    credentials: Credentials = cls._credentials_from_profile(
  File "/usr/local/lib/python3.9/site-packages/dbt/config/profile.py", line 149, in _credentials_from_profile
    cls = load_plugin(typename)
  File "/usr/local/lib/python3.9/site-packages/dbt/adapters/factory.py", line 239, in load_plugin
    return FACTORY.load_plugin(name)
  File "/usr/local/lib/python3.9/site-packages/dbt/adapters/factory.py", line 68, in load_plugin
    mod: Any = import_module("." + name, "dbt.adapters")
  File "/usr/local/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/usr/local/lib/python3.9/site-packages/dbt/adapters/databricks/__init__.py", line 3, in <module>
    from dbt.adapters.databricks.connections import DatabricksConnectionManager  # noqa
  File "/usr/local/lib/python3.9/site-packages/dbt/adapters/databricks/connections.py", line 26, in <module>
    from databricks.sql.client import Connection as DatabricksSQLConnection
  File "/usr/local/lib/python3.9/site-packages/databricks/sql/client.py", line 3, in <module>
    import pandas
  File "/usr/local/lib/python3.9/site-packages/pandas/__init__.py", line 46, in <module>
    from pandas.core.api import (
  File "/usr/local/lib/python3.9/site-packages/pandas/core/api.py", line 1, in <module>
    from pandas._libs import (
  File "/usr/local/lib/python3.9/site-packages/pandas/_libs/__init__.py", line 18, in <module>
    from pandas._libs.interval import Interval
  File "interval.pyx", line 1, in init pandas._libs.interval
ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
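The traceback above passes through pandas and pyarrow before failing on NumPy, so it can help to list the versions that pip actually resolved. A minimal sketch using only the standard library follows. The package list and the helper name installed_versions are assumptions chosen for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Distributions that appear in the traceback; adjust as needed (assumed list).
PACKAGES = ("dbt-databricks", "numpy", "pandas", "pyarrow")


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    report = {}
    for name in names:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Because it reads package metadata instead of importing the packages, it should still run in an environment where importing pandas fails.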

benc-db commented 2 weeks ago

Our upstream dependencies at dbt Labs have communicated to me that they are going to be pinning to numpy < 2, so even if I remove the pandas pin, we can't expect numpy 2 to work. We need to verify that a newer Pandas works as well, as the reason we started pinning is that newer Pandas started breaking dbt-databricks. Keeping the ticket open to try upgrading Pandas again at some point.

alexeyegorov commented 2 weeks ago

@benc-db It's all fine. I pinned the version for myself and wanted to make sure it is known in the community. ;) Thanks.

benc-db commented 2 weeks ago

For anyone else who sees this issue: newer versions of pandas also drop support for Python 3.8, which we are not prepared to drop yet.