DataWorkz-NL / KubeETL

ETL controller for Kubernetes
Apache License 2.0

Improve Dataset/Connection injection API #50

Open ThijsKoot opened 3 years ago

ThijsKoot commented 3 years ago

The current method of injecting connections and datasets is clunky. Other projects already inject (templated) values into pods, and we can borrow from their approaches.

Banzai Vault:

env:
- name: AWS_SECRET_ACCESS_KEY
  value: "vault:secret/data/accounts/aws#AWS_SECRET_ACCESS_KEY"

akv2k8s:

- name: TEST_SECRET
  value: "secret-inject@azurekeyvault" # references an AzureKeyVaultSecret (akvs)

Proposal with datasets:

apiVersion: etl.dataworkz.nl/v1alpha1
kind: Workflow
metadata:
  name: example
spec:
  templates:
    - name: ingest-data
      source:
        - dataset: source-api
          alias: source # optional alias used in the injection templates below
      sink:
        - dataset: mysql-table
          alias: sink
      container:
        image: apireader:latest
        env:
          - name: MYSQL_CONNECTION_STRING
            value: "sink.connection:{{.user}}:{{.password}}@{{.host}}/{{.database}}"
          - name: "API_URL"
            value: "source.connection:{{.url}}"
          - name: API_KEY
            value: "source.connection:{{.apiKey}}"
        args:
          - --table
          - "sink.metadata:{{.table}}"
          - --endpoint
          - "source.metadata:{{.endpoint}}"
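Since the proposed placeholders use Go template syntax, the controller could resolve a value like `sink.connection:{{.user}}:{{.password}}@{{.host}}/{{.database}}` with `text/template`. Below is a minimal sketch of that idea; `resolveValue` and the stubbed connection map are hypothetical names, not part of the current codebase, which assumes the resolved fields of the referenced dataset's connection are available as a string map.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
	"text/template"
)

// resolveValue expands an injected value of the form
// "<alias>.connection:<template>" against the aliased connection's fields.
// Values without the qualifier pass through unchanged.
func resolveValue(raw string, connections map[string]map[string]string) (string, error) {
	alias, tmpl, found := strings.Cut(raw, ".connection:")
	if !found {
		return raw, nil // not an injection expression
	}
	fields, ok := connections[alias]
	if !ok {
		return "", fmt.Errorf("unknown connection alias %q", alias)
	}
	t, err := template.New(alias).Parse(tmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, fields); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	conns := map[string]map[string]string{
		"sink": {"user": "etl", "password": "s3cret", "host": "mysql", "database": "sales"},
	}
	v, err := resolveValue("sink.connection:{{.user}}:{{.password}}@{{.host}}/{{.database}}", conns)
	if err != nil {
		panic(err)
	}
	fmt.Println(v) // etl:s3cret@mysql/sales
}
```

The same mechanism would cover `source.metadata:` by cutting on a `.metadata:` qualifier and rendering against the dataset's metadata fields instead.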

Proposal with connections:

apiVersion: etl.dataworkz.nl/v1alpha1
kind: Workflow
metadata:
  name: example-connection
spec:
  templates:
    - name: transform-data
      connections:
        - name: dwh
      container:
        image: data-transformer:latest
        env:
          - name: DWH_CONNECTION_STRING
            value: "dwh:{{.user}}:{{.password}}@{{.host}}/{{.database}}"
        args:
          - "--source-table"
          - "sales"
          - "--destination-table"
          - "sales_transformed"
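In the connection form there is no `.connection`/`.metadata` qualifier; the value is simply `<connection-name>:<template>`. A sketch of how the controller might expand a container's env list under that convention, assuming the same stubbed field map as above (the `EnvVar` struct and `expandEnv` helper are illustrative, not existing API):

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
	"text/template"
)

// EnvVar mirrors the name/value pairs of a container spec.
type EnvVar struct {
	Name, Value string
}

// expandEnv renders each value of the shape "<connection-name>:<template>"
// against that connection's fields; anything that does not start with a
// known connection name passes through unchanged.
func expandEnv(env []EnvVar, conns map[string]map[string]string) ([]EnvVar, error) {
	out := make([]EnvVar, 0, len(env))
	for _, e := range env {
		name, tmpl, found := strings.Cut(e.Value, ":")
		fields, ok := conns[name]
		if !found || !ok {
			out = append(out, e) // not an injection expression
			continue
		}
		t, err := template.New(name).Parse(tmpl)
		if err != nil {
			return nil, err
		}
		var buf bytes.Buffer
		if err := t.Execute(&buf, fields); err != nil {
			return nil, err
		}
		out = append(out, EnvVar{Name: e.Name, Value: buf.String()})
	}
	return out, nil
}

func main() {
	conns := map[string]map[string]string{
		"dwh": {"user": "etl", "password": "pw", "host": "dwh-db", "database": "warehouse"},
	}
	env := []EnvVar{{
		Name:  "DWH_CONNECTION_STRING",
		Value: "dwh:{{.user}}:{{.password}}@{{.host}}/{{.database}}",
	}}
	expanded, err := expandEnv(env, conns)
	if err != nil {
		panic(err)
	}
	fmt.Println(expanded[0].Value) // etl:pw@dwh-db/warehouse
}
```

One caveat with this bare-prefix form: any unrelated value that happens to contain a colon and start with a connection name would be rewritten, which is an argument for keeping an explicit qualifier as in the dataset proposal.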