tatiana opened this issue 11 months ago
Hello @tatiana,
It's great to see you back! I hope you've been doing well. Thank you for this detailed issue report with your thoughts and suggestions. Let's dive into it.
Based on the context provided, it seems that Cosmos should indeed support creating the `profiles.yml` for the end-user when using `ExecutionMode.KUBERNETES`. The `profiles.yml` file contains sensitive information, such as the `host`, `http_path`, and `token` required to connect to the database, so generating it from Airflow connections would spare users from managing these credentials in two places.
A good way to expose the file and the sensitive information is to create Kubernetes Secrets. Kubernetes Secrets are designed to store and manage sensitive information, such as passwords, OAuth tokens, and SSH keys. Storing confidential information in a Secret is safer and more flexible than putting it verbatim in a Pod definition or in a container image.
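To make the mechanics concrete: a Kubernetes Secret's `data` field maps each key to base64-encoded content. The following is a minimal, stdlib-only Python sketch that builds such a manifest as a plain dict from a `profiles.yml` string. The function name `build_secret_manifest` is hypothetical; it is not part of Cosmos or of the Kubernetes client library.

```python
import base64
import json


def build_secret_manifest(name: str, files: dict[str, str]) -> dict:
    """Build a v1 Secret manifest (as a dict) with one data key per file.

    Hypothetical helper: Secret `data` values must be base64-encoded strings.
    """
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        "data": {
            key: base64.b64encode(content.encode("utf-8")).decode("ascii")
            for key, content in files.items()
        },
    }


profiles_yml = "my_profile:\n  target: dev\n"
manifest = build_secret_manifest("dbt-profiles", {"profiles.yml": profiles_yml})
# kubectl accepts JSON manifests, so this output can be piped to `kubectl apply -f -`.
print(json.dumps(manifest, indent=2))
```

Note that base64 is an encoding, not encryption: anyone who can read the Secret object can recover `profiles.yml`, so access to Secrets still needs to be restricted.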
Here is an example of how you can create a Kubernetes Secret from the `profiles.yml` file:

```shell
kubectl create secret generic dbt-profiles --from-file=profiles.yml=./profiles.yml
```
And then you can mount the secret as a volume in your Pod:
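```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
    - name: mypod
      image: myimage
      volumeMounts:
        - name: dbt-profiles-volume
          # A Secret volume is mounted as a directory with one file per key,
          # so this yields /path/to/dbt-profiles/profiles.yml
          mountPath: "/path/to/dbt-profiles"
          readOnly: true
  volumes:
    - name: dbt-profiles-volume
      secret:
        secretName: dbt-profiles
```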
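This way, the `profiles.yml` file will be available in your Pod, while the sensitive information lives in the Kubernetes Secret rather than in the Pod definition or the container image. Note that a Secret volume is mounted as a directory containing one file per key, so dbt should be pointed at that directory, for example with `--profiles-dir` or the `DBT_PROFILES_DIR` environment variable.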
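A mount like this can also be generated programmatically, for example when an Airflow task builds the Pod spec. Below is a minimal, stdlib-only Python sketch, assuming the Secret is named `dbt-profiles` and holds a single `profiles.yml` key. The helper name `profiles_mount_spec` and the mount directory are hypothetical. It returns the `volumes` and `volumeMounts` fragments plus the dbt arguments that point at the mounted directory.

```python
import posixpath


def profiles_mount_spec(secret_name: str, mount_dir: str) -> dict:
    """Return Pod spec fragments that mount a Secret holding profiles.yml.

    A Secret volume is mounted as a directory with one file per Secret key,
    so dbt must be pointed at the directory, not at the file itself.
    """
    volume_name = f"{secret_name}-volume"
    return {
        "volumes": [{"name": volume_name, "secret": {"secretName": secret_name}}],
        "volumeMounts": [
            {"name": volume_name, "mountPath": mount_dir, "readOnly": True}
        ],
        # Where the Secret key "profiles.yml" appears inside the container.
        "profiles_path": posixpath.join(mount_dir, "profiles.yml"),
        # dbt looks for a file named profiles.yml inside --profiles-dir.
        "dbt_args": ["--profiles-dir", mount_dir],
    }


spec = profiles_mount_spec("dbt-profiles", "/path/to/dbt-profiles")
print(spec["profiles_path"])
```

Because dbt only looks for a file literally named `profiles.yml` in the profiles directory, the Secret key is fixed to that name here rather than made configurable.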
As for how the `profiles.yml` file is currently generated and managed in the Cosmos codebase for `ExecutionMode.LOCAL` and `ExecutionMode.VIRTUALENV`, this is handled by the `ProfileConfig` class in the `cosmos/config.py` file.
For `ExecutionMode.LOCAL` and `ExecutionMode.VIRTUALENV`, the `profiles.yml` file can be provided in two ways:

1. Using a user-supplied `profiles.yml` file: if the `profiles_yml_filepath` attribute is set, the `ProfileConfig` class uses this file directly. This is validated in the `validate_profiles_yml` method, which checks that the file exists at the given path.
2. Using Cosmos to map Airflow connections to dbt profiles: if the `profile_mapping` attribute is set to an instance of a `BaseProfileMapping` subclass, Cosmos generates the `profiles.yml` file. This is done in the `ensure_profile` method, which creates a temporary `profiles.yml` file with the contents returned by the `get_profile_file_contents` method of the `BaseProfileMapping` subclass.
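The temporary-file behaviour described in the second item can be pictured with the stdlib-only Python sketch below. It is not Cosmos's actual `ensure_profile` implementation: `temporary_profiles_dir` is a hypothetical name, and `contents` stands in for whatever `get_profile_file_contents` returns.

```python
import contextlib
import tempfile
from pathlib import Path
from typing import Iterator


@contextlib.contextmanager
def temporary_profiles_dir(contents: str) -> Iterator[Path]:
    """Write profiles.yml into a fresh temp directory and yield that directory.

    The directory, including the file and any credentials in it, is deleted
    when the `with` block exits.
    """
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "profiles.yml").write_text(contents, encoding="utf-8")
        yield Path(tmp)


with temporary_profiles_dir("my_profile:\n  target: dev\n") as profiles_dir:
    # A local dbt invocation could use: dbt run --profiles-dir <profiles_dir>
    print((profiles_dir / "profiles.yml").read_text(encoding="utf-8"))
```

Because the file only exists on the worker's local filesystem for the duration of the `with` block, this pattern suits local and virtualenv execution but cannot be seen by a separate Kubernetes Pod, which is the gap this issue is about.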
In both cases, the `ProfileConfig` class validates that exactly one of `profiles_yml_filepath` or `profile_mapping` is set, as they are mutually exclusive. This is done in the `validate_profile` method.
You can find more details in the `cosmos/config.py` file.
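The mutual-exclusivity rule can be illustrated with a small, hypothetical dataclass. It mirrors the check described above but is not Cosmos's `ProfileConfig` code, and `SketchProfileConfig` is an invented name.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SketchProfileConfig:
    """Hypothetical stand-in for a profile config with two exclusive sources."""

    profile_name: str
    target_name: str
    profiles_yml_filepath: Optional[str] = None
    profile_mapping: Optional[Any] = None

    def __post_init__(self) -> None:
        # Exactly one source of profile information must be provided:
        # both set, or neither set, is rejected.
        has_file = self.profiles_yml_filepath is not None
        has_mapping = self.profile_mapping is not None
        if has_file == has_mapping:
            raise ValueError(
                "Exactly one of profiles_yml_filepath or profile_mapping must be set"
            )


SketchProfileConfig("my_profile", "dev", profiles_yml_filepath="/path/to/profiles.yml")
```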
I hope this helps! If you have any other questions, feel free to ask.
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Context
One of Cosmos' popular features is allowing users to define how to connect to a database as Airflow connections, from which Cosmos generates the dbt profile, so users do not need to manage sensitive information in two places. This is accomplished via profile mapping classes, as described in: https://astronomer.github.io/astronomer-cosmos/profiles/index.html#using-a-profile-mapping
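Conceptually, a profile mapping translates the fields of an Airflow connection into a dbt profile. The stdlib-only Python sketch below does this for a Postgres-style connection, represented as a plain dict rather than Airflow's `Connection` object; the function name `postgres_profile` and the `DBT_PASSWORD` environment-variable name are hypothetical, and this is not how Cosmos's mapping classes are implemented. The password is emitted as a dbt `env_var()` reference instead of a literal, and the output is JSON, which YAML parsers generally accept.

```python
import json


def postgres_profile(
    conn: dict, profile_name: str, target: str = "dev", dbt_schema: str = "public"
) -> str:
    """Render profiles.yml content (as JSON) from Airflow-connection-like fields.

    The password is not written to the file: dbt resolves it at runtime from
    the DBT_PASSWORD environment variable through its env_var() Jinja function.
    """
    output = {
        "type": "postgres",
        "host": conn["host"],
        "port": conn.get("port", 5432),
        "user": conn["login"],
        "password": "{{ env_var('DBT_PASSWORD') }}",
        # For Postgres, Airflow connections keep the database name in `schema`.
        "dbname": conn["schema"],
        "schema": dbt_schema,
        "threads": 1,
    }
    profile = {profile_name: {"target": target, "outputs": {target: output}}}
    return json.dumps(profile, indent=2)


print(postgres_profile({"host": "db.example.com", "login": "analyst", "schema": "analytics"}, "my_profile"))
```

Keeping the secret out of the rendered file means only the environment variable, not the file, has to be protected, which ties into the question of exposing sensitive information discussed below.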
Unfortunately, this feature works for `ExecutionMode.LOCAL` and `ExecutionMode.VIRTUALENV`, but not for `ExecutionMode.DOCKER` and `ExecutionMode.KUBERNETES`. This limitation was discussed when these execution modes were introduced, and the workaround is for end-users to manage it themselves: bake a dbt `profiles.yml` file into the container image and supply the sensitive information in whatever way they prefer (such as via Kubernetes Secrets).

Since `ExecutionMode.KUBERNETES` is more popular than `ExecutionMode.DOCKER`, this ticket aims to discuss whether and how we could improve this. There are two key questions:

i) Should Cosmos support creating the `profiles.yml` for the end-user when using `ExecutionMode.KUBERNETES`?
ii) How would Cosmos expose the file itself and the sensitive information if we decide to do (i)?

Some possibilities
i) Creating/exposing `profiles.yml`
When using Cosmos Local operators, we already create this file when users use a profile mapping. We could do the same for K8s.
The difference would be that we'd need to expose the created file to K8s. A way to do this from Airflow is to use volumes, as illustrated in:
Are we happy for Cosmos to set up this volume in users' K8s Pods? How much control should users have over the volumes used for this purpose?
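One possible answer to the question of user control is for Cosmos to append its own volume only when it does not clash with anything the user already configured. The stdlib-only Python sketch below is purely hypothetical: the helper name `add_profiles_volume` and the default `cosmos-profiles` volume name are invented, and it works on plain dict Pod-spec fragments rather than on Airflow's or the Kubernetes client's objects.

```python
def add_profiles_volume(
    user_volumes: list[dict],
    user_mounts: list[dict],
    secret_name: str,
    mount_dir: str,
    volume_name: str = "cosmos-profiles",
) -> tuple[list[dict], list[dict]]:
    """Return new volume/mount lists with a profiles volume appended.

    The user's lists are not modified, and a user-defined volume with the
    same name or a user-defined mount at the same path is never overridden.
    """
    if any(v.get("name") == volume_name for v in user_volumes):
        raise ValueError(f"volume {volume_name!r} is already defined by the user")
    if any(m.get("mountPath") == mount_dir for m in user_mounts):
        raise ValueError(f"mount path {mount_dir!r} is already in use")
    volumes = [*user_volumes, {"name": volume_name, "secret": {"secretName": secret_name}}]
    mounts = [*user_mounts, {"name": volume_name, "mountPath": mount_dir, "readOnly": True}]
    return volumes, mounts


volumes, mounts = add_profiles_volume(
    [{"name": "data", "emptyDir": {}}],
    [{"name": "data", "mountPath": "/data"}],
    secret_name="dbt-profiles",
    mount_dir="/cosmos/profiles",
)
```

Failing loudly on a clash, rather than silently replacing the user's entry, keeps the user in control while letting the default case work without configuration; exposing `volume_name` and `mount_dir` as parameters is one way to let users steer it.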
ii) Exposing sensitive information
(a) Kubernetes allows users to set environment variables during Pod creation, and this could be done via Airflow:
https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/
However, this approach exposes sensitive information in the `PodDescription`, which has security implications.

(b) A more secure approach is usually to create Kubernetes Secrets and make them available to Pods. This is illustrated in:
This would mean Cosmos creating, and potentially overriding, other Kubernetes Secrets managed by the end-user. Would users be happy with this approach? Should Cosmos delete the Secrets afterwards?
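Options (a) and (b) can also be combined: Kubernetes lets a container environment variable take its value from a Secret key (`valueFrom.secretKeyRef`), so the Pod spec only names the Secret instead of embedding the credential. The stdlib-only Python sketch below is hypothetical: `cosmos_secret_name`, `env_from_secret` and the `cosmos-profile-` prefix are invented. Deriving a deterministic, Cosmos-specific Secret name from the DAG and task IDs is one possible way to avoid overriding Secrets that users manage themselves.

```python
import hashlib


def cosmos_secret_name(dag_id: str, task_id: str) -> str:
    """Deterministic, DNS-safe Secret name unlikely to clash with user Secrets."""
    digest = hashlib.sha256(f"{dag_id}/{task_id}".encode("utf-8")).hexdigest()[:12]
    return f"cosmos-profile-{digest}"


def env_from_secret(secret_name: str, keys: list[str]) -> list[dict]:
    """Container `env` entries whose values are read from the given Secret.

    The Pod spec only references the Secret and key; the plaintext value
    never appears in it.
    """
    return [
        {"name": key, "valueFrom": {"secretKeyRef": {"name": secret_name, "key": key}}}
        for key in keys
    ]


name = cosmos_secret_name("my_dag", "run_model")
env = env_from_secret(name, ["DBT_PASSWORD"])
```

Because the name is deterministic, a cleanup step could delete exactly the Secret a task created once the task finishes, without touching Secrets the user owns.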
We welcome your thoughts!