DataWorkz-NL / KubeETL

ETL controller for Kubernetes
Apache License 2.0

Feat/design proposal #3

Blokje5 closed 3 years ago

Blokje5 commented 3 years ago

Initial proposal for the design of the KubeETL API. The resources defined here should provide a basis to create reusable ETL workflows.

Blokje5 commented 3 years ago

I made some small updates. Let's align in the call today on the next steps.

ThijsKoot commented 3 years ago

Expanded on Connections, specifically how credentials are specified and what options we have for using the protocol information, i.e. schema validation and connection testing.

Didn't include anything on container lifecycle (whole build/CI process) from my old PR as I think we should keep it out of scope for now. Still really like the idea of this aspect but it's a whole can of (really tasty?) worms that we should open when the time is right. I really didn't think this analogy through, cans of worms aren't desirable.

Haven't had a chance to go through tasks and workflows yet. I'm currently rolling out Argo which serves as a good frame of reference and inspiration.

Blokje5 commented 3 years ago

@ThijsKoot Cans of worms are desirable if you go fishing 😂 .

I liked the changes you made, expanding on the Connection. Orchestration is indeed also (a rather obvious) goal and should be documented. I am not sure why I didn't think of that.

As far as I am concerned we now have a basis to work with. @arnobroekhof If you want you can have a look at it as well, then we can start thinking about the initial implementation stages.

ThijsKoot commented 3 years ago

I've been thinking a bit about credential storage. Obviously credentials should always be stored as secrets, meaning that credential values are always Secret refs. Inline specification of credentials results in a Secret being created. ConfigMap references are currently included in the specification. Here's my question: do we implement a Sensitive property to determine whether a field is stored in a ConfigMap or a Secret, or are Secrets always the right choice? The sane default here would obviously be Secret.
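To make the question concrete, here is a minimal sketch of how such a rule could be resolved per credential field. Everything in it is an assumption for illustration and not part of the proposal: the names `CredentialField`, `StorageDecision` and `resolve_storage`, the `<connection>-<field>` naming of created objects, and the rule that a field marked sensitive may not point at a ConfigMap.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CredentialField:
    """One credential entry on a hypothetical Connection spec."""
    name: str
    inline_value: Optional[str] = None    # value written directly in the spec
    secret_ref: Optional[str] = None      # name of an existing Secret
    config_map_ref: Optional[str] = None  # name of an existing ConfigMap
    sensitive: bool = True                # proposed Sensitive property; defaults to Secret


@dataclass(frozen=True)
class StorageDecision:
    kind: str     # "Secret" or "ConfigMap"
    name: str     # name of the object holding the value
    create: bool  # True when the controller would have to create the object


def resolve_storage(connection: str, field: CredentialField) -> StorageDecision:
    """Decide where the value of a credential field lives (hypothetical rule set)."""
    sources = [field.inline_value, field.secret_ref, field.config_map_ref]
    if sum(source is not None for source in sources) != 1:
        raise ValueError(f"{field.name}: specify exactly one of inline value, Secret ref, ConfigMap ref")

    if field.secret_ref is not None:
        return StorageDecision("Secret", field.secret_ref, create=False)

    if field.config_map_ref is not None:
        # Assumption: a sensitive field must never resolve to a ConfigMap.
        if field.sensitive:
            raise ValueError(f"{field.name}: sensitive fields cannot reference a ConfigMap")
        return StorageDecision("ConfigMap", field.config_map_ref, create=False)

    # Inline value: an object is created, and it is a Secret unless the field opts out.
    kind = "Secret" if field.sensitive else "ConfigMap"
    return StorageDecision(kind, f"{connection}-{field.name}", create=True)
```

With these assumptions, an inline `password` on a connection named `pg` resolves to a newly created Secret, while a field only ends up in a ConfigMap when `sensitive=False` is set explicitly. A "secret-only" variant would simply drop the `sensitive` flag and the ConfigMap branch.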

Blokje5 commented 3 years ago

@ThijsKoot for now I will use a cop-out: an administrator should determine whether a field is sensitive or not. We want connections to be flexible for now, although we could expand the spec in the future. There are valid reasons to keep connections flexible: e.g. let's say we want to support a credential injection mechanism where the connection stores a vault URL and the credentials are injected dynamically.

But there is an argument to be made for the sensitive field as a way to support the "secret-only" approach for now. There are many Kubernetes clusters where Kubernetes Secrets are the only available approach to secrets.

ThijsKoot commented 3 years ago

@Blokje5 Secret-only for now seems good. Iterative approach and all that.