DataWorkz-NL / KubeETL

ETL controller for Kubernetes
Apache License 2.0

KubeETL

KubeETL is a Kubernetes-based framework for managing datasets and creating data-driven pipelines that interact with those datasets. It aims to simplify tasks that commonly arise when managing a large number of datasets, such as:

These tasks are often pushed to the backlog in favor of connecting more data sources and providing more reporting to downstream consumers of data. In our experience, however, leaving them unprioritised eventually undermines the reliability of your workflows. Unreliable workflows lead to unreliable data, which erodes the trust your end users have in that data.

KubeETL is available under the Apache 2.0 License.

Installation

KubeETL provides quick-start files in the manifests/ folder. If you want to customize your configuration further, we recommend creating your own Kustomize overlay.

For the default installation, execute the following commands:

kubectl create namespace kubeetl
kubectl apply -n kubeetl -f https://raw.githubusercontent.com/DataWorkz-NL/KubeETL/manifests/quick-start.yaml
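
For the customized route, a Kustomize overlay can be as small as a single kustomization.yaml that wraps the quick-start manifest. The sketch below is only an illustration: the directory name kubeetl-overlay is hypothetical, and it assumes you have saved a copy of manifests/quick-start.yaml into that directory yourself.

```shell
# Create a directory for the overlay (the name is a hypothetical example).
mkdir -p kubeetl-overlay

# Write a minimal kustomization.yaml. It assumes a local copy of
# manifests/quick-start.yaml sits next to it in the same directory.
cat > kubeetl-overlay/kustomization.yaml <<'EOF'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

# Deploy everything into the same namespace as the default installation.
namespace: kubeetl

resources:
  - quick-start.yaml

# Add your own patches, images, or labels here to customize the install.
EOF
```

Once the manifest copy is in place, the overlay can be applied with kubectl apply -k kubeetl-overlay instead of the plain quick-start file.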

Because KubeETL currently relies on Argo Workflows to execute Workflows, you have to install Argo Workflows as well. See the Argo Workflows quick start guide for more details.
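
As a rough sketch of that step, Argo Workflows publishes an install manifest with each release that can be applied with kubectl. The release tag, the argo namespace, and the install_argo helper below are assumptions for illustration, not something KubeETL prescribes; check the Argo Workflows quick start guide for the version and manifest that suit your cluster.

```shell
# Release tag to install. The default is only an example; override it
# with the Argo Workflows release you actually want to run.
ARGO_VERSION="${ARGO_VERSION:-v3.5.5}"

# Install manifest attached to that release on GitHub.
ARGO_MANIFEST="https://github.com/argoproj/argo-workflows/releases/download/${ARGO_VERSION}/install.yaml"

# Hypothetical helper: creates the namespace and applies the manifest.
install_argo() {
  kubectl create namespace argo
  kubectl apply -n argo -f "${ARGO_MANIFEST}"
}
```

Calling install_argo against your current kubectl context performs the installation; defining the function first makes it easy to review the manifest URL before anything is applied.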

Examples

See the examples directory for examples of how to use KubeETL. Each example provides a README that explains how to follow along.

Concepts

KubeETL adds a set of Custom Resources (CRs) to your Kubernetes cluster to define and interact with datasets. These are:

Based on these resources KubeETL can simplify your interaction with the data by:

KubeETL leverages Kubernetes to provide all these mechanisms. At its core, KubeETL is a Kubernetes Operator that can interact with Workflows running on Kubernetes.

Features

Currently KubeETL provides the following features:

Roadmap

Currently we have the following main priorities:

If you want to contribute to the evolution of KubeETL, see the next section.

Contributing

We gladly accept contributions to the project. We accept any kind of improvement:

We would also love to hear how you would like to see the project evolve. Feel free to open an issue on GitHub to share your ideas.

Make sure to check out our contributing guide before making a contribution to the project.