PrefectHQ / prefect-dbt

Collection of Prefect integrations for working with dbt with your Prefect flows.
https://prefecthq.github.io/prefect-dbt/
Apache License 2.0

Manage Compute Engine credentials in addition to oauth2 #92

Closed lucienfregosibodyguard closed 1 year ago

lucienfregosibodyguard commented 1 year ago

Hi Prefect-dbt team,

Following this thread https://github.com/PrefectHQ/prefect-dbt/issues/56, I tried to authenticate from within a Kubernetes pod associated with a valid service account. google.auth.default() returns Compute Engine credentials: https://google-auth.readthedocs.io/en/master/reference/google.auth.compute_engine.credentials.html

Then the code fails with: 'Credentials' object has no attribute 'refresh_token'. From what I understand, the code expects google.oauth2 credentials: https://google-auth.readthedocs.io/en/stable/reference/google.oauth2.credentials.html

It would be super nice to be able to use the compute_engine credentials.

ahuang11 commented 1 year ago

Hi @lucienfregosibodyguard, thanks for reporting this. Do you have a traceback?

Also, would you be interested in contributing a fix? The relevant code might be here: https://github.com/PrefectHQ/prefect-gcp/blob/main/prefect_gcp/credentials.py#L127-L144

lucienfregosibodyguard commented 1 year ago

Yes, here is the traceback:

Encountered exception during execution:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/prefect/engine.py", line 1215, in orchestrate_task_run
    result = await task.fn(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/prefect_dbt/cli/commands.py", line 136, in trigger_dbt_cli_command
    profile = dbt_cli_profile.get_profile()
  File "/usr/local/lib/python3.9/site-packages/prefect_dbt/cli/credentials.py", line 111, in get_profile
    "outputs": {self.target: self.target_configs.get_configs()},
  File "/usr/local/lib/python3.9/site-packages/prefect_dbt/cli/configs/bigquery.py", line 114, in get_configs
    configs_json[key] = getattr(google_credentials, key)
AttributeError: 'Credentials' object has no attribute 'refresh_token'

We could use something like this https://programtalk.com/python-more-examples/google.auth.compute_engine.credentials.Credentials/

Will try to make a PR
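
The failing line in the traceback reads each key off the credentials object with a bare getattr, which raises as soon as one key is absent. A minimal sketch of a more tolerant lookup is shown here. The two classes are hypothetical stand-ins, not the real google-auth classes, and collect_credential_configs is an illustrative name, not a function from prefect-dbt.

```python
# Hypothetical stand-ins for two credential types; the real google-auth
# classes carry more state and are not imported here.
class OAuth2LikeCredentials:
    def __init__(self):
        self.token = "access-token"
        self.refresh_token = "refresh-token"
        self.client_id = "client-id"
        self.client_secret = "client-secret"
        self.token_uri = "https://oauth2.googleapis.com/token"


class ComputeEngineLikeCredentials:
    def __init__(self):
        # Assumption for this sketch: only an access token is available.
        self.token = "access-token"


OAUTH_KEYS = ("token", "refresh_token", "client_id", "client_secret", "token_uri")


def collect_credential_configs(credentials, keys=OAUTH_KEYS):
    """Copy only the keys that exist on the credentials and are not None."""
    configs = {}
    for key in keys:
        # A default of None avoids the AttributeError seen in the traceback.
        value = getattr(credentials, key, None)
        if value is not None:
            configs[key] = value
    return configs


oauth_configs = collect_credential_configs(OAuth2LikeCredentials())
compute_configs = collect_credential_configs(ComputeEngineLikeCredentials())
```

With this approach, credentials that only expose a token produce a smaller config dict instead of an exception.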

lucienfregosibodyguard commented 1 year ago

@ahuang11 I tried to use the code snippet above but it doesn't work

lucienfregosibodyguard commented 1 year ago

@ahuang11 do you have time to help with this? I'm still stuck on my side :/

ahuang11 commented 1 year ago

@lucienfregosibodyguard can you try pulling this branch to test? https://github.com/PrefectHQ/prefect-dbt/pull/98

lucienfregosibodyguard commented 1 year ago

Hi @ahuang11, I looked at the PR, but it seems that token is not an attribute of the class: https://google-auth.readthedocs.io/en/master/reference/google.auth.compute_engine.credentials.html

Maybe I'm missing something, but I can't see how it can work 😕

ahuang11 commented 1 year ago

There might be multiple credential types. I'm looking at google.auth.default: https://google-auth.readthedocs.io/en/master/reference/google.auth.html

def default(scopes=None, request=None, quota_project_id=None, default_scopes=None):
    """Gets the default credentials for the current environment.

    `Application Default Credentials`_ provides an easy way to obtain
    credentials to call Google APIs for server-to-server or local applications.
    This function acquires credentials from the environment in the following
    order:

    1. If the environment variable ``GOOGLE_APPLICATION_CREDENTIALS`` is set
       to the path of a valid service account JSON private key file, then it is
       loaded and returned. The project ID returned is the project ID defined
       in the service account file if available (some older files do not
       contain project ID information).

       If the environment variable is set to the path of a valid external
       account JSON configuration file (workload identity federation), then the
       configuration file is used to determine and retrieve the external
       credentials from the current environment (AWS, Azure, etc).
       These will then be exchanged for Google access tokens via the Google STS
       endpoint.
       The project ID returned in this case is the one corresponding to the
       underlying workload identity pool resource if determinable.
    2. If the `Google Cloud SDK`_ is installed and has application default
       credentials set they are loaded and returned.

       To enable application default credentials with the Cloud SDK run::

            gcloud auth application-default login

       If the Cloud SDK has an active project, the project ID is returned. The
       active project can be set using::

            gcloud config set project

    3. If the application is running in the `App Engine standard environment`_
       (first generation) then the credentials and project ID from the
       `App Identity Service`_ are used.
    4. If the application is running in `Compute Engine`_ or `Cloud Run`_ or
       the `App Engine flexible environment`_ or the `App Engine standard
       environment`_ (second generation) then the credentials and project ID
       are obtained from the `Metadata Service`_.
    5. If no credentials are found,
       :class:`~google.auth.exceptions.DefaultCredentialsError` will be raised.

    .. _Application Default Credentials: https://developers.google.com\
            /identity/protocols/application-default-credentials
    .. _Google Cloud SDK: https://cloud.google.com/sdk
    .. _App Engine standard environment: https://cloud.google.com/appengine
    .. _App Identity Service: https://cloud.google.com/appengine/docs/python\
            /appidentity/
    .. _Compute Engine: https://cloud.google.com/compute
    .. _App Engine flexible environment: https://cloud.google.com\
            /appengine/flexible
    .. _Metadata Service: https://cloud.google.com/compute/docs\
            /storing-retrieving-metadata
    .. _Cloud Run: https://cloud.google.com/run

ahuang11 commented 1 year ago

"google.oauth2.service_account.Credentials" seems to have the token attribute. [image]

lucienfregosibodyguard commented 1 year ago

Yes, you're right, it does have a token attribute!

A few remarks:

Then I added dbname: <project>, but I got a new error: Got duplicate keys: (project) all map to "database"

Hope that helps

lucienfregosibodyguard commented 1 year ago

Oh following this

We need method: oauth-secrets and it works !!!
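
For reference, a sketch of what a BigQuery target using method: oauth-secrets with a temporary access token could look like, written as a Python dict. The project, dataset, profile name, and token values are placeholders, and build_bigquery_target is a hypothetical helper, not part of prefect-dbt. Judging from the "duplicate keys" error above, project and dbname appear to map to the same setting, so the sketch sets only project.

```python
import json


def build_bigquery_target(project, dataset, token, threads=4):
    """Build a dbt BigQuery target dict that authenticates with an access token."""
    return {
        "type": "bigquery",
        "method": "oauth-secrets",
        # Only "project" is set; adding "dbname" as well triggered the
        # duplicate-key error reported in this thread.
        "project": project,
        "dataset": dataset,
        "threads": threads,
        "token": token,
    }


# Placeholder values for illustration only.
target = build_bigquery_target("my-project", "my_dataset", "placeholder-token")
profile = {"my_profile": {"target": "dev", "outputs": {"dev": target}}}
print(json.dumps(profile, indent=2))
```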

lucienfregosibodyguard commented 1 year ago

I opened a PR: https://github.com/PrefectHQ/prefect-dbt/pull/100

ahuang11 commented 1 year ago

This should now be fixed in v0.2.6. Please reopen if that's not the case!

lucienfregosibodyguard commented 1 year ago

Hi @ahuang11, sadly it still doesn't work. I got a new error:

Encountered exception during execution:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/prefect/engine.py", line 1339, in orchestrate_task_run
    result = await task.fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/prefect_dbt/cli/commands.py", line 139, in trigger_dbt_cli_command
    yaml.dump(profile, f, default_flow_style=False)
  File "/usr/local/lib/python3.10/site-packages/yaml/__init__.py", line 253, in dump
    return dump_all([data], stream, Dumper=Dumper, **kwds)
  File "/usr/local/lib/python3.10/site-packages/yaml/__init__.py", line 241, in dump_all
    dumper.represent(data)
  File "/usr/local/lib/python3.10/site-packages/yaml/representer.py", line 27, in represent
    node = self.represent_data(data)
  File "/usr/local/lib/python3.10/site-packages/yaml/representer.py", line 48, in represent_data
    node = self.yaml_representers[data_types[0]](self, data)
  File "/usr/local/lib/python3.10/site-packages/yaml/representer.py", line 207, in represent_dict
    return self.represent_mapping('tag:yaml.org,2002:map', data)
  File "/usr/local/lib/python3.10/site-packages/yaml/representer.py", line 118, in represent_mapping
    node_value = self.represent_data(item_value)
  File "/usr/local/lib/python3.10/site-packages/yaml/representer.py", line 48, in represent_data
    node = self.yaml_representers[data_types[0]](self, data)
  File "/usr/local/lib/python3.10/site-packages/yaml/representer.py", line 207, in represent_dict
    return self.represent_mapping('tag:yaml.org,2002:map', data)
  File "/usr/local/lib/python3.10/site-packages/yaml/representer.py", line 118, in represent_mapping
    node_value = self.represent_data(item_value)
  File "/usr/local/lib/python3.10/site-packages/yaml/representer.py", line 48, in represent_data
    node = self.yaml_representers[data_types[0]](self, data)
  File "/usr/local/lib/python3.10/site-packages/yaml/representer.py", line 207, in represent_dict
    return self.represent_mapping('tag:yaml.org,2002:map', data)
  File "/usr/local/lib/python3.10/site-packages/yaml/representer.py", line 118, in represent_mapping
    node_value = self.represent_data(item_value)
  File "/usr/local/lib/python3.10/site-packages/yaml/representer.py", line 52, in represent_data
    node = self.yaml_multi_representers[data_type](self, data)
  File "/usr/local/lib/python3.10/site-packages/yaml/representer.py", line 317, in represent_object
    reduce = data.__reduce_ex__(2)
TypeError: cannot pickle 'coroutine' object

I guess it's related to the sync definition, but I don't know how to fix it.
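
The error message is what one would expect if an un-awaited coroutine ended up as a value inside the profile dict handed to yaml.dump. That is a guess at the cause; the traceback only confirms the type name. The sketch below uses only the standard library to show how such a value can be located in a nested dict and how awaiting the call yields plain data. get_configs here is a hypothetical async stand-in, not the prefect-dbt method.

```python
import asyncio
import inspect


async def get_configs():
    # Hypothetical async stand-in for a method that builds target configs.
    return {"type": "bigquery", "method": "oauth-secrets"}


def find_coroutines(data, path=""):
    """Return dotted key paths whose values are un-awaited coroutine objects."""
    found = []
    for key, value in data.items():
        current = f"{path}.{key}" if path else str(key)
        if inspect.iscoroutine(value):
            found.append(current)
        elif isinstance(value, dict):
            found.extend(find_coroutines(value, current))
    return found


# Calling an async function without awaiting it returns a coroutine object,
# which a serializer cannot represent.
pending = get_configs()
broken_profile = {"outputs": {"dev": pending}}
broken_paths = find_coroutines(broken_profile)
pending.close()  # suppress the "never awaited" warning in this demo

# Running the coroutine to completion yields an ordinary dict.
fixed_profile = {"outputs": {"dev": asyncio.run(get_configs())}}
fixed_paths = find_coroutines(fixed_profile)
```

A check like find_coroutines can be run just before serialization to pinpoint which key holds the coroutine.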