databricks / dbt-databricks

A dbt adapter for Databricks.
https://databricks.com
Apache License 2.0

cannot import name 'HeaderFactory' from 'databricks.sdk.core' #719

Open martin-mm1 opened 4 months ago

martin-mm1 commented 4 months ago

Describe the bug

Since the release of version 0.29.0 of databricks-sdk, the dbt job running on our Databricks cluster with dbt-databricks==1.6.5 started to fail with the error provided below. After downgrading to databricks-sdk 0.28.0, by explicitly specifying it on the Databricks cluster, the error is no longer observed.

Steps To Reproduce

Install dbt-databricks==1.6.5 on a databricks cluster and run your dbt models.

Expected behavior

The expected behavior would be to successfully import HeaderFactory.

Screenshots and log output

cannot import name 'HeaderFactory' from 'databricks.sdk.core' (/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/databricks/sdk/core.py)
07:09:00 Traceback (most recent call last):
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/dbt/cli/requires.py", line 87, in wrapper
    result, success = func(*args, **kwargs)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/dbt/cli/requires.py", line 72, in wrapper
    return func(*args, **kwargs)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/dbt/cli/requires.py", line 140, in wrapper
    profile = load_profile(flags.PROJECT_DIR, flags.VARS, flags.PROFILE, flags.TARGET, threads)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/dbt/config/runtime.py", line 70, in load_profile
    profile = Profile.render(
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/dbt/config/profile.py", line 436, in render
    return cls.from_raw_profiles(
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/dbt/config/profile.py", line 401, in from_raw_profiles
    return cls.from_raw_profile_info(
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/dbt/config/profile.py", line 355, in from_raw_profile_info
    credentials: Credentials = cls._credentials_from_profile(
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/dbt/config/profile.py", line 165, in _credentials_from_profile
    cls = load_plugin(typename)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/dbt/adapters/factory.py", line 212, in load_plugin
    return FACTORY.load_plugin(name)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/dbt/adapters/factory.py", line 58, in load_plugin
    mod: Any = import_module("." + name, "dbt.adapters")
  File "/usr/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 855, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/dbt/adapters/databricks/__init__.py", line 1, in <module>
    from dbt.adapters.databricks.connections import DatabricksConnectionManager  # noqa
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/dbt/adapters/databricks/connections.py", line 60, in <module>
    from dbt.adapters.databricks.auth import token_auth, m2m_auth
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/dbt/adapters/databricks/auth.py", line 3, in <module>
    from databricks.sdk.core import CredentialsProvider, HeaderFactory, Config, credentials_provider
ImportError: cannot import name 'HeaderFactory' from 'databricks.sdk.core' (/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/databricks/sdk/core.py)

System information

dbt-databricks==1.6.5 dbt-core==1.6.7 dbt-spark==1.6.0 python==3.9.5

Additional context

It seems like dbt-databricks 1.6.5 should put an upper bound on the version of databricks-sdk installed alongside it, instead of pulling in the most recent release.
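For illustration, such an upper bound could be expressed in the adapter's packaging metadata. This is a hypothetical fragment (the actual dbt-databricks setup may differ, and the lower bound here is made up):

```python
# Hypothetical install_requires fragment illustrating an upper bound on
# databricks-sdk, capped below the 0.29.0 release that renamed
# HeaderFactory; not the actual dbt-databricks packaging.
install_requires = [
    "databricks-sdk>=0.28.0,<0.29.0",
]

print(install_requires[0])
```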

BramVandendriessche commented 4 months ago

I'm having the same issue for version 1.7.5.

saxyogi commented 4 months ago

It's happening on my end as well. I mitigated it by pinning the previous version.

case-k-git commented 4 months ago

This issue is happening on my side as well, when executing the dbt source freshness command. It is a library dependency issue: downgrading the databricks-sdk version (from 0.29.0 to 0.28.0) solves it. It may be better to handle this dependency inside dbt-databricks.

databricks-sdk==0.28.0
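One way to confirm which databricks-sdk actually ends up installed on the cluster is to query the package metadata at runtime. A small sketch using only the standard library (`databricks-sdk` here is just the distribution name being queried):

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str) -> str:
    """Return the installed version of a distribution, or 'not installed'."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return "not installed"


# On the cluster this would print e.g. "0.28.0" after the downgrade.
print("databricks-sdk:", installed_version("databricks-sdk"))
```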

It seems a new databricks-sdk was released yesterday: https://pypi.org/project/databricks-sdk/

benc-db commented 4 months ago

This bug only exists in old versions, as newer versions pin the SDK to a particular known-good version. What are the reasons you are pinned to old versions? I cannot backport a fix to a particular patch version, so it's more useful for me to find out why you're not upgrading than to file a bug that only exists in outdated patch versions.

case-k-git commented 4 months ago

Thank you for your response! I see, it doesn't occur in the latest version.

This is my personal opinion. Hope some aspect of this will help.

We have pinned the version of dbt-databricks because processing stopped working when the version was upgraded. Do you think it would be better for users to pin the versions of both dbt-databricks and databricks-sdk on their side? If the library dependency between dbt-databricks and databricks-sdk is not handled, it is possible that another dependency issue will happen in the future, so it is probably better for users to handle the dependency themselves.

The reason I personally cannot easily upgrade dbt-databricks is that some upgrades require not only a version bump but also changes to dbt config settings, so it won't work if we simply upgrade the version.

Changes: existing dbt-related files need to be updated. https://docs.getdbt.com/docs/dbt-versions/core-upgrade

Additionally, unlike other Python libraries, a dbt-databricks version upgrade may change the output data results. Tools like https://github.com/dbt-labs/dbt-audit-helper might help in this regard, but it would also be necessary to have mechanisms like automation in place to use them.

So it is not as easy to update the dbt version as it is for other Python libraries. Hope some aspect of this info will help.

Thank you!

benc-db commented 4 months ago

@case-k-git when you say the processing stops working, I need to know specifically how so that I can fix it. We will never be able to go back to old patch version number and change its code, so if you can't upgrade, you will never get fixes. The upgrade to 1.8 should not be a breaking change; if it is, I should know what broke so that I can restore it.

BramVandendriessche commented 4 months ago

What are the reasons you are pinned to old versions?

In my case because I'm working in a heavily regulated environment where stability is key. Upgrading package versions requires paperwork for the change. I'll plan an upgrade in one of our future releases. I've pinned the databricks-sdk version to mitigate the issue - that was the path of least resistance :)

NodeJSmith commented 4 months ago

@benc-db I think something like the below would be all that is needed to fix this, it works on a local installation of mine at least.

The real breaking change was the sdk renaming HeaderFactory to CredentialsProvider and credentials_provider to credentials_strategy. Everything else works fine once you fix the imports.

try:
    # Names available in databricks-sdk < 0.29.0
    from databricks.sdk.core import credentials_provider
    from databricks.sdk.core import CredentialsProvider
    from databricks.sdk.core import HeaderFactory
except ImportError:
    # databricks-sdk >= 0.29.0 renamed HeaderFactory -> CredentialsProvider
    # and credentials_provider -> credentials_strategy, so alias the new
    # names back to the old ones
    from databricks.sdk.core import credentials_strategy as credentials_provider
    from databricks.sdk.core import CredentialsProvider
    from databricks.sdk.core import CredentialsProvider as HeaderFactory
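For anyone wanting to verify this shim pattern in isolation, here is a self-contained sketch that fakes an SDK module with a stand-in (`fakesdk` is hypothetical; the real case is `databricks.sdk.core`), showing how the except branch aliases the renamed symbol back to the old name:

```python
import sys
import types

# Stand-in for an SDK release that only ships the NEW name, as
# databricks-sdk >= 0.29.0 does ("fakesdk" is hypothetical).
fakesdk = types.ModuleType("fakesdk")
fakesdk.CredentialsProvider = type("CredentialsProvider", (), {})
sys.modules["fakesdk"] = fakesdk

# Same compatibility pattern as the snippet above: try the old name
# first, then alias the renamed symbol back to it on ImportError.
try:
    from fakesdk import HeaderFactory  # old name, pre-0.29.0
except ImportError:
    from fakesdk import CredentialsProvider as HeaderFactory

# HeaderFactory now refers to the renamed class either way.
print(HeaderFactory is fakesdk.CredentialsProvider)
```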

Here are the relevant Python packages/versions I have installed, with which this small change works and everything runs smoothly.

databricks-connect==14.3.2
databricks-sdk==0.29.0
databricks-sql-connector==3.1.2
dbt-adapters==1.3.2
dbt-common==1.6.0
dbt-core==1.8.4
dbt-databricks==1.8.3
dbt-extractor==0.5.1
dbt-semantic-interfaces==0.5.1
dbt-spark==1.8.0

case-k-git commented 3 months ago

@benc-db
Thank you for your reply. Yes, I can confirm that an existing dbt operation that works on the old version fails after updating to the latest one, but I have not yet checked what I need to change in my old dbt config for the latest version. I will report back after I have checked.