LamaAni / KubernetesJobOperator

An airflow operator that executes a task in a kubernetes cluster, given a kubernetes yaml configuration or an image reference.

FEATURE: templated specification file / override specific fields #23

Closed · odesenfans closed this issue 3 years ago

odesenfans commented 3 years ago

Hello,

I am currently trying to use this operator for my use cases. Thanks for the work, I really like the possibility of using jobs rather than pods. I use Helm on a regular basis and I wanted to propose a feature that would make this operator work nicely with Helm/Go-templated specification files.

Feature description

Templated specification file: allows specifying fields as Go-templated values, à la Helm. The following example is exactly how I specify my config/secret volumes in Helm at the moment:

apiVersion: batch/v1
kind: Job
spec:
  template:
    spec:
      ...
      volumes:
        - name: config-volume
          configMap:
            name: {{ .Values.image.releaseName }}
        - name: secrets-volume
          secret:
            secretName: {{ .Values.image.releaseName }}-secrets

The values themselves could be passed to the constructor of the operator. For example:

job_task = KubernetesJobOperator(
    ...
    body_filepath="my_job.yaml",
    template_values={"image": {"releaseName": "my-first-release"}},
)

Advantages

Interoperability between Helm and Airflow: take your Helm job template file and move it to Airflow. No need for additional work/testing.

Alternatives I considered

  1. Implement this myself, using Jinja: very doable, I'm just suggesting something generic and familiar for all Helm users (there are dozens of us!). It's probably what I'll end up doing in the meantime (see the sketch after this list).
  2. Specify the body in Python directly: again, doable, but this feels exotic coming from Helm. It reduces the interoperability between the two environments and forces the use of two formats, while this is (kinda) standardised.
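
As a rough sketch of alternative 1 (assuming the body can also be passed to the operator as a pre-rendered yaml string, and dropping the leading dot that Go templates use, so {{ values.image.releaseName }} instead of {{ .Values.image.releaseName }}; render_body and the task_id are only illustrative):

import yaml
from jinja2 import Template

def render_body(path: str, values: dict) -> str:
    # Read the jinja-templated manifest and render it with the given values.
    with open(path) as f:
        rendered = Template(f.read()).render(values=values)
    # Parse once so malformed yaml fails at DAG-definition time, not at run time.
    list(yaml.safe_load_all(rendered))
    return rendered

job_task = KubernetesJobOperator(
    task_id="my_job",
    body=render_body("my_job.yaml", {"image": {"releaseName": "my-first-release"}}),
)
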
LamaAni commented 3 years ago

Hi, thank you for your comment. I was considering implementing exactly this for a while, since it seems to be required, but have not had the time. Since I have not, I would certainly accept a PR where we can collaborate on the basis of this idea.

I was thinking of something along the lines of the following input arguments:

  1. enable_jinja_in_string_body (bool), default True. Notice the "string" in the name, so it will not be confused with dict input.
  2. jinja_context (dict), extended by task default options (like airflow jinja).

And methods:

  1. parse_body_jinja (only runs if enable_jinja_in_string_body is True).

To be added at line 94, something like:

        body = body or self._read_body(
            resolve_relative_path(body_filepath or DEFAULT_EXECUTION_OBJECT_PATHS[DEFAULT_EXECTION_OBJECT], 2)
        )

        assert body is not None and (isinstance(body, (dict, str))), ValueError(
            "body must either be a yaml string or a dict"
        )

        if isinstance(body, str) and self.enable_jinja_in_string_body:
            body = self.parse_body_jinja(body)

Requirements:

  1. Support jinja2 only.
  2. Support airflow core jinja arguments.
  3. Support jinja_context (overrides core airflow jinja arguments if they exist, i.e. dict.update).
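
For illustration only, a minimal sketch of what parse_body_jinja could look like under these requirements (airflow_jinja_defaults is a hypothetical placeholder for wherever the core airflow jinja arguments would come from, and none of the names are final):

import jinja2

def parse_body_jinja(self, body: str) -> str:
    # Start from the core airflow jinja arguments (requirement 2);
    # airflow_jinja_defaults is only a placeholder for wherever those live.
    context = dict(getattr(self, "airflow_jinja_defaults", None) or {})
    # jinja_context overrides the core arguments on key collisions
    # (requirement 3), i.e. a plain dict.update.
    context.update(self.jinja_context or {})
    # Requirement 1: render with jinja2 only.
    return jinja2.Template(body).render(**context)

With these arguments, the example from the issue description would become something like (placeholders in the yaml would then use plain Jinja syntax, e.g. {{ image.releaseName }}):

job_task = KubernetesJobOperator(
    ...
    body_filepath="my_job.yaml",
    enable_jinja_in_string_body=True,
    jinja_context={"image": {"releaseName": "my-first-release"}},
)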

What do you think?

odesenfans commented 3 years ago

The best solution would be to use Go templates directly, but jinja2 should do the trick. I don't think there's a good implementation of Go templates in Python so far. I'll take a look tomorrow. :)

LamaAni commented 3 years ago

Hum, I would actually like to stick with Jinja, since other operators are integrated with it. Added support, see PR: https://github.com/LamaAni/KubernetesJobOperator/pull/27

LamaAni commented 3 years ago

Can this be closed?

odesenfans commented 3 years ago

Yes! Closing it, then giving it a try right away.