Define a Source/Sink API

DataWorkz-NL / KubeETL

ETL controller for Kubernetes

Apache License 2.0

KubeETL should make it easy for data engineers and data scientists to create ETL pipelines. Pipelines need connection configuration for the systems they read from and write to, and as ETL projects scale, this source/sink configuration often becomes scattered and hard to maintain.

By providing an API Kind for Sources/Sinks (or Connectors?), we can add the following to the project:

Inject authentication information into pipelines
Document the available sources/sinks in the project

Eventually we can also add more complex functionality, such as regularly scheduled Data Quality checks on sources.

A basic Source/Sink should at least contain the following information:

URL to the source/sink.
Authentication information (to start, a simple username/password pair supplied via Kubernetes Secrets).
Standard Kubernetes Metadata.
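
To make the three fields above concrete, here is a minimal Go sketch of what such an object could look like. Everything in it is an assumption for discussion, not an agreed design: the names `Connection`, `ConnectionSpec`, `BasicAuth`, `SecretKeyRef` and `Metadata` are hypothetical, and the local `Metadata` and `SecretKeyRef` types are simplified stand-ins so the example compiles with only the standard library. A real API type would more likely embed `metav1.TypeMeta`/`metav1.ObjectMeta` and reference Secrets with `corev1.SecretKeySelector`.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// SecretKeyRef points at one key inside a Kubernetes Secret.
// Hypothetical stand-in for corev1.SecretKeySelector.
type SecretKeyRef struct {
	Name string `json:"name"` // name of the Secret
	Key  string `json:"key"`  // key within the Secret's data
}

// BasicAuth holds references to the username and password,
// never the credential values themselves.
type BasicAuth struct {
	Username SecretKeyRef `json:"username"`
	Password SecretKeyRef `json:"password"`
}

// Metadata is a trimmed, hypothetical stand-in for metav1.ObjectMeta.
type Metadata struct {
	Name      string            `json:"name"`
	Namespace string            `json:"namespace,omitempty"`
	Labels    map[string]string `json:"labels,omitempty"`
}

// ConnectionSpec describes where the source/sink lives and how to authenticate.
type ConnectionSpec struct {
	URL       string     `json:"url"`
	BasicAuth *BasicAuth `json:"basicAuth,omitempty"` // optional
}

// Connection is the (hypothetical) Source/Sink API object.
type Connection struct {
	Metadata Metadata       `json:"metadata"`
	Spec     ConnectionSpec `json:"spec"`
}

// Validate checks the minimum information listed in this issue:
// a name, a URL with scheme and host, and complete Secret references.
func (c Connection) Validate() error {
	if c.Metadata.Name == "" {
		return errors.New("metadata.name is required")
	}
	u, err := url.Parse(c.Spec.URL)
	if err != nil {
		return fmt.Errorf("spec.url is invalid: %w", err)
	}
	if u.Scheme == "" || u.Host == "" {
		return errors.New("spec.url must include a scheme and a host")
	}
	if a := c.Spec.BasicAuth; a != nil {
		for _, ref := range []SecretKeyRef{a.Username, a.Password} {
			if ref.Name == "" || ref.Key == "" {
				return errors.New("spec.basicAuth references need both a Secret name and a key")
			}
		}
	}
	return nil
}

func main() {
	conn := Connection{
		Metadata: Metadata{Name: "warehouse", Namespace: "etl"},
		Spec: ConnectionSpec{
			URL: "postgres://db.example.svc:5432/warehouse",
			BasicAuth: &BasicAuth{
				Username: SecretKeyRef{Name: "warehouse-credentials", Key: "username"},
				Password: SecretKeyRef{Name: "warehouse-credentials", Key: "password"},
			},
		},
	}
	if err := conn.Validate(); err != nil {
		fmt.Println("invalid connection:", err)
		return
	}
	fmt.Println("valid connection:", conn.Metadata.Name)
}
```

Storing only Secret references keeps credentials out of the object itself, which fits the idea of using it purely as an information store that pipelines can read from.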

For now, there is no need for a controller, although that could change in the future. The API object simply serves as a place to store connection information.

Define a Source/Sink API #1