getindata / kedro-airflow-k8s

Kedro Plugin to support running pipelines on Kubernetes using Airflow.
https://kedro-airflow-k8s.readthedocs.io
Apache License 2.0
29 stars 11 forks source link

Move some of the command line options to a config file and add dynamic config loader support #24

Closed em-pe closed 3 years ago

em-pe commented 3 years ago

There are some configuration options required by commands that are tied to the environment. To make the work of the plugin easier, an allow storing some of the configuration in VCS we could move them to config file (airflow-k8s.yml).

To make the configuration more robust and dynamic we could reuse TemplatedConfigLoader hook. Check how we do it in kedro-kubeflow-example for reference.

References from kedro-kubeflow:

Before implementing we may have a short discussion which options do we move to config and which of them we allow to override with console call.

michalzelechowski-getindata commented 3 years ago

We have the following options:

Environment is a group option so it stays. Image is already in the config. Output seems to be good candidate as it's tied to the environment in the first place. Cron-expression is specific but it seems to be tied to the project. dag_name is by default derived from package_name - what do you think @em-pe ? Also we have experiment_name (that's internal but related to the dag_name) that's derived from package_name - should we put it into config as well? wait_for_completion is a runtime option.

em-pe commented 3 years ago

Here are my suggestions: experiment_name -> put in config, default to package name image -> put in config, allow override through optional console param output -> put in config, allow to override through console param cron expression -> put in config, default to @daily dag_name -> put in config, default to package name, allow to override through console param wait for completion -> put in config, allow override through console param

Regarding defaults, have a look at templated config loader - we could add env variable that we populate with package_name, and instead of custom default handling we just initialise project with config file that explicitly configures all the defaults.

experiment_name: {package_name}
michalzelechowski-getindata commented 3 years ago

cron expression -> put in config, default to @daily

@em-pe We should use it only for schedule, or also for upload-pipeline? Assuming run-once should ignore this config parameter.

michalzelechowski-getindata commented 3 years ago

I am a bit reluctant to put wait_for_completion to config, as it would be hard to force case when you don't want plugin to wait (since parameter takes int describing timeout rather than bool flag).

em-pe commented 3 years ago

@michalzelechowski-getindata let's use it for both, we may then get rid of the schedule command I believe. Regarding wait for completion, let's leave it as CLI option now. In kedro-kubeflow the behavior is slightly different as there is an option to submit pipeline for execution and be done without waiting for it to complete.

michalzelechowski-getindata commented 3 years ago

@em-pe I think it makes sense for the schedule to allow for override from CLI, in contrary to upload-pipeline so I suggest we keep it separate for better flexibility and alignment with kedro-kubeflow.

em-pe commented 3 years ago

@michalzelechowski-getindata sounds good to me