databricks / databricks-sql-python

Databricks SQL Connector for Python
Apache License 2.0
171 stars 96 forks source link

Bug in M2M authentication #423

Open jwhite3 opened 3 months ago

jwhite3 commented 3 months ago

(v3.3.0)

When attempting to follow the documentation for M2M authentication I get an error that after tons of digging seems to be the result of a bug.

Specifically, there's a bug in theExternalAuthProvider class (which is used when attempting to authenticate via M2M following the instructions. 

class ExternalAuthProvider(AuthProvider):
    def __init__(self, credentials_provider: CredentialsProvider) -> None:
        self._header_factory = credentials_provider()  # Returns a dict when instance of OAuthCredentialsProvider

    def add_headers(self, request_headers: Dict[str, str]):
        headers = self._header_factory()  # Attempts to call the dict resulting in TypeError
        for k, v in headers.items():
            request_headers[k] = v

If I had to guess, I would assume that either the OAuthCredentialsProvider should return a callable when called, or that assignment of the _header_factory attribute should not be a call to the instance. Then the headers dict would be correctly generated in add_headers

kravets-levko commented 3 months ago

Hi @jwhite3! I'm terribly sorry, somehow I missed this issue. You should adjust self._header_factory = credentials_provider line - remove function call. headers = self._header_factory() is correct. I will ask docs team to fix the example

kravets-levko commented 3 months ago

@jwhite3 Sorry, I misunderstood your issue. So, the example from the link you provided looks like this:

from databricks.sdk.core import Config, oauth_service_principal
from databricks import sql
import os

server_hostname = os.getenv("DATABRICKS_SERVER_HOSTNAME")

def credential_provider():
  config = Config(
    host          = f"https://{server_hostname}",
    client_id     = os.getenv("DATABRICKS_CLIENT_ID"),
    client_secret = os.getenv("DATABRICKS_CLIENT_SECRET")
  )

  return oauth_service_principal(config)                    # <-- !!!

with sql.connect(server_hostname      = server_hostname,
                 http_path            = os.getenv("DATABRICKS_HTTP_PATH"),
                 credentials_provider = credential_provider) as connection:

While it seems that there's a bug in ExternalAuthProvider - in fact, everything is okay there. credentials_provider has to be a factory - function that returns a function that returns a dict. The bug actually is in the line I highlighted - the oauth_service_principal(config) should be wrapped with a function. So the example should be fixed like this (note the additional wrapper function get_headers):

from databricks.sdk.core import Config, oauth_service_principal
from databricks import sql
import os

server_hostname = os.getenv("DATABRICKS_SERVER_HOSTNAME")

def credential_provider():
  config = Config(
    host          = f"https://{server_hostname}",
    client_id     = os.getenv("DATABRICKS_CLIENT_ID"),
    client_secret = os.getenv("DATABRICKS_CLIENT_SECRET")
  )

  def get_headers():
    return oauth_service_principal(config)               

  return get_headers

with sql.connect(server_hostname      = server_hostname,
                 http_path            = os.getenv("DATABRICKS_HTTP_PATH"),
                 credentials_provider = credential_provider) as connection:

Indeed, we still have to fix this in docs, but at least not the library itself 🙂

gdimitrovdev commented 1 day ago

@kravets-levko is this problem in the documentation still the case? When following the documentation example, I keep getting "Error during request to server", and with logging enabled I see that it is an invalid_client error. I am not sure if it may be related to that bug.

I am using the following versions:

databricks-sdk==0.38.0
databricks-sql-connector==3.5.0

To me it seems like I should not wrap the return value of credential_provider() in another function, as oauth_service_principal() returns Optional[CredentialsProvider], where:

CredentialsProvider = Callable[[], Dict[str, str]]

Do you have any idea if more recent versions were changed so that the old documentation is correct?