getindata / kedro-airflow-k8s

Kedro Plugin to support running pipelines on Kubernetes using Airflow.
https://kedro-airflow-k8s.readthedocs.io
Apache License 2.0

Support k8s env_vars + new MLFlow auth handler #102

Closed mjedrasz closed 3 years ago

mjedrasz commented 3 years ago

We use an external Airflow secrets backend to store sensitive data, e.g. credentials. These credentials are accessible from DAGs, so instead of configuring environment variables somewhere else, I propose allowing Airflow Variables to be passed as environment variables.
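The proposal above can be sketched as a small helper that resolves named Airflow Variables into Kubernetes-style container env entries. This is a minimal illustration, not the plugin's actual code: get_variable is a hypothetical injected getter standing in for airflow.models.Variable.get, and the plain dicts stand in for kubernetes.client.V1EnvVar objects.

```python
from typing import Callable, Dict, List


def airflow_vars_to_env(
    var_names: List[str],
    get_variable: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Resolve Airflow Variables and shape them as Kubernetes container
    env entries ({"name": ..., "value": ...}) ready to attach to a pod spec.

    get_variable is injected so the helper can be tested without a running
    Airflow instance; in a DAG it would typically be Variable.get.
    """
    return [{"name": name, "value": get_variable(name)} for name in var_names]
```

Injecting the getter keeps the secret lookup at DAG run time, so credentials stay in the secrets backend rather than being baked into the pod template.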

This PR adds the possibility to pass Airflow Variables (Variable.get) as environment variables to pods. Also, the AuthHandler for the MLflow tracking server API has been refactored to accommodate other authentication types, e.g. Basic Authentication, and a new VarsAuthHandler has been added which reads credentials from Airflow Variables.
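A minimal sketch of what a handler hierarchy like the one described might look like. The class and method names and the mlflow_username/mlflow_password variable keys are assumptions for illustration, not the plugin's actual API; get_variable is again an injected stand-in for airflow.models.Variable.get.

```python
import base64
from typing import Callable, Dict


class AuthHandler:
    """Base handler: subclasses return HTTP headers for MLflow tracking API calls."""

    def auth_headers(self) -> Dict[str, str]:
        return {}


class BasicAuthHandler(AuthHandler):
    """Builds a standard HTTP Basic Authentication header from explicit credentials."""

    def __init__(self, username: str, password: str):
        self._username = username
        self._password = password

    def auth_headers(self) -> Dict[str, str]:
        token = base64.b64encode(
            f"{self._username}:{self._password}".encode()
        ).decode()
        return {"Authorization": f"Basic {token}"}


class VarsAuthHandler(BasicAuthHandler):
    """Reads credentials from Airflow Variables via an injected getter."""

    def __init__(self, get_variable: Callable[[str], str]):
        super().__init__(
            username=get_variable("mlflow_username"),
            password=get_variable("mlflow_password"),
        )
```

Keeping Basic Auth encoding in one base class lets further handlers (token auth, etc.) plug in by overriding auth_headers only.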

Logging of the pod creation request has been changed to the debug level so as not to leak sensitive data.



michalzelechowski-getindata commented 3 years ago

Just a small change required: you need to run the pre-commit hook.

mjedrasz commented 3 years ago

linting done

mjedrasz commented 3 years ago

anything else needed?

michalzelechowski-getindata commented 3 years ago

@mjedrasz One of the tests is failing, could you have a look please?

mjedrasz commented 3 years ago

Yes, linting broke some tests; fixed. By the way, I'm not able to run all tests locally. I get sqlite3.OperationalError: no such table: xcom, for instance, when running test_start_mlflow_experiment_operator.py.

michalzelechowski-getindata commented 3 years ago

You may need to run airflow db reset --yes before running the tests for the first time. Some operator tests write to the db directly, so this has to be in place. Not the best solution for unit tests, but it allows us to check that the operators are fine.

mjedrasz commented 3 years ago

If only airflow db reset --yes worked for me; I'm getting errors there too. Anyway, the tests passed.

michalzelechowski-getindata commented 3 years ago

If only airflow db reset --yes worked for me; I'm getting errors there too. Anyway, the tests passed.

I can't tell what the problem is then, sorry. This is how we set things up on our end, including for GitHub Actions. You can have a look at .github/workflows/python-package.yml; maybe something additional is missing in your environment.