dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0
11.91k stars 1.49k forks

Dagster k8s - external PostgreSQL credential setup #25538

Open ske94 opened 1 month ago

ske94 commented 1 month ago

What's the use case?

We ran into an issue with Dagster on k8s. We use a fork of the official Helm chart that lets us change the configuration so that all of the Postgres settings are read from environment variables. This works as intended for the Dagster pods and for the jobs, since the jobs copy the PostgreSQL config from the Dagster daemon and set envFrom to include the secret that holds the credentials from our operator. What breaks our setup is the hard-coded use of DAGSTER_PG_PASSWORD: https://github.com/dagster-io/dagster/blob/0fe3395605b83742003e6d18b4d70efd08da4720/python_modules/libraries/dagster-k8s/dagster_k8s/job.py#L40

In our case this env var should ideally be unset, because the key in the secret from our operator differs. Since the fixed key is not present in our secret, the pod cannot be scheduled. The variable itself is not used in our setup.

This seems to force us to always set this env variable, even though we have overridden the behavior that relies on it. If we set the env var like this, everything works fine:

envVars:
    - "DAGSTER_PG_PASSWORD=dummy"
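
The failure described above can be modeled with plain Python dictionaries. This is a minimal sketch, not Dagster's actual code: it assumes the hard-coded entry has the shape of a Kubernetes `secretKeyRef` env var, and the secret name `pg-operator-credentials`, the key names, and both helper functions are hypothetical. It relies on one Kubernetes behavior: a container whose `secretKeyRef` names a key that is missing from the secret does not start unless the reference is marked `optional: true`.

```python
from typing import Any, Dict, Iterable, List


def password_env_entry(secret_name: str, key: str, optional: bool = False) -> Dict[str, Any]:
    """Build an env entry shaped like a Kubernetes EnvVar with a secretKeyRef."""
    ref: Dict[str, Any] = {"name": secret_name, "key": key}
    if optional:
        ref["optional"] = True
    return {"name": "DAGSTER_PG_PASSWORD", "valueFrom": {"secretKeyRef": ref}}


def unresolved_env(env: List[Dict[str, Any]], secret_keys: Iterable[str]) -> List[str]:
    """Return the names of env entries whose required secret key is absent.

    Simplified model of the check Kubernetes performs; it looks at one secret only.
    """
    present = set(secret_keys)
    missing = []
    for entry in env:
        ref = entry.get("valueFrom", {}).get("secretKeyRef")
        if ref and ref["key"] not in present and not ref.get("optional", False):
            missing.append(entry["name"])
    return missing


# Hypothetical keys written by a credentials operator.
operator_secret_keys = ["username", "password"]

# Assumed fixed key name that the operator's secret does not contain.
fixed = [password_env_entry("pg-operator-credentials", "postgresql-password")]
relaxed = [password_env_entry("pg-operator-credentials", "postgresql-password", optional=True)]

blocked = unresolved_env(fixed, operator_secret_keys)
allowed = unresolved_env(relaxed, operator_secret_keys)
print(blocked)
print(allowed)
```

In this model the fixed entry is reported as unresolved, while the same entry marked optional is not. Making the reference optional, or omitting it when no password secret is configured, are two possible ways to avoid needing a dummy value.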

Ideas of implementation

We suggest that the Python module should not make assumptions about the deployment method or how it is configured. Instead, the configuration should be controlled solely by the Helm deployment, ideally entirely through the Helm values. At the moment it is largely fixed in the template, and only the password can be set via an environment variable, when more freedom would be desirable.
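
As a rough illustration of a fully environment-driven configuration, the sketch below builds a `dagster.yaml`-style storage block in which every Postgres field is read from an environment variable. It assumes that each `postgres_db` field accepts an `env:` reference, which should be checked against the Dagster version in use. All env var names other than `DAGSTER_PG_PASSWORD` and the function `postgres_storage_config` are hypothetical.

```python
from typing import Any, Dict, Mapping

# Hypothetical env var names; a deployment would choose its own.
DEFAULT_ENV_NAMES: Dict[str, str] = {
    "username": "DAGSTER_PG_USERNAME",
    "password": "DAGSTER_PG_PASSWORD",
    "hostname": "DAGSTER_PG_HOST",
    "db_name": "DAGSTER_PG_DB",
    "port": "DAGSTER_PG_PORT",
}


def postgres_storage_config(env_names: Mapping[str, str] = DEFAULT_ENV_NAMES) -> Dict[str, Any]:
    """Return a storage block where each Postgres field points at an env var."""
    postgres_db = {field: {"env": var} for field, var in env_names.items()}
    return {"storage": {"postgres": {"postgres_db": postgres_db}}}


config = postgres_storage_config()
print(config)
```

With a layout like this, the Helm values would only need to decide which secret keys map to which env var names, for example through envFrom or explicit secret references, and the Python side would not need to know any key names.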

Additional information

No response

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.

Ardiea commented 1 week ago

We use the Vault Secrets Operator to generate database logins dynamically as needed. This API generates both a username and a password, while the Helm chart, as currently implemented, only allows a password to be loaded into an env var via a secretRef.

As @ske94 suggests, all of the Postgres configuration could be specified for all components via environment variables, which would decouple the application from the deployment and allow more freedom in configuration choices and in interfacing with other components.