application-research / outercore-eng-kb

Official Knowledge base repo of Estuary
https://estuary.tech

Idea/Proposal: Tekton Data Pipeline Framework to Onboard Data #27

Open alvin-reyes opened 1 year ago

alvin-reyes commented 1 year ago

Overview

Once we have K8s installed on our EHI, we need to start looking into Data Onboarding Tools.

I propose using Tekton Pipelines (https://github.com/tektoncd/pipeline). It is essentially a task framework that leverages the k8s infrastructure to create ephemeral task runners in the form of pods/containers: each task run gets its own pod, and each step in the task runs as a container in that pod.
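To make the idea concrete, here is a minimal sketch of what one onboarding task could look like, written as Python that emits the Tekton manifests as JSON. Everything specific in it is an assumption for illustration and not part of the proposal: the task name `onboard-data`, the `source-url` parameter, the `alpine` image, and the echo script are all hypothetical placeholders. It assumes the cluster serves the `tekton.dev/v1` API; older Tekton installs may need `tekton.dev/v1beta1` instead.

```python
import json


def build_task(name, image, script):
    """Return a Tekton Task manifest with one step (hypothetical names throughout)."""
    return {
        "apiVersion": "tekton.dev/v1",  # assumption: may be v1beta1 on older installs
        "kind": "Task",
        "metadata": {"name": name},
        "spec": {
            # Hypothetical parameter: where the data to onboard lives.
            "params": [{"name": "source-url", "type": "string"}],
            # Each step runs as a container inside the pod created for the run.
            "steps": [{"name": "onboard", "image": image, "script": script}],
        },
    }


def build_task_run(task_name, source_url):
    """Return a TaskRun manifest that asks Tekton to run the Task once."""
    return {
        "apiVersion": "tekton.dev/v1",
        "kind": "TaskRun",
        # generateName lets every run get a unique, throwaway name.
        "metadata": {"generateName": f"{task_name}-run-"},
        "spec": {
            "taskRef": {"name": task_name},
            "params": [{"name": "source-url", "value": source_url}],
        },
    }


if __name__ == "__main__":
    # Placeholder script: a real task would fetch and onboard the data here.
    script = "#!/bin/sh\necho \"onboarding $(params.source-url)\"\n"
    task = build_task("onboard-data", "alpine", script)
    run = build_task_run("onboard-data", "https://example.com/dataset.csv")
    print(json.dumps(task, indent=2))
    print(json.dumps(run, indent=2))
```

The JSON output can be saved to files and submitted with `kubectl create -f` (`create` rather than `apply`, since the TaskRun uses `generateName`). The pod for each run goes away with the run's lifecycle, which is the "ephemeral task runner" behavior described above.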

How it'll work.

[image]

*Queue is optional.

10d9e commented 1 year ago

I love that this is k8s native - it should give us the capability to turn the appropriate dials to deal with scale. cc: @Zorlin

alvin-reyes commented 1 year ago

Adding here: I've been researching the use of PySpark as the data processing task for this.