apache / dolphinscheduler

Apache DolphinScheduler is the modern data orchestration platform. Agile to create high performance workflow with low-code
https://dolphinscheduler.apache.org/
Apache License 2.0

[Improvement] Kubernetes - execute complex jobs with sidecar containers etc etc #16685

Open giovannidalloglio opened 1 month ago

giovannidalloglio commented 1 month ago

Description

Hello.

Do you fully support the Kubernetes Job API?

In the documentation (https://dolphinscheduler.apache.org/en-us/docs/3.2.2/guide/task/kubernetes) I only see the option to specify a single container.

But various "enterprise" scenarios require the full set of options (https://kubernetes.io/docs/concepts/workloads/pods/sidecar-containers/).

As an example, in our environment we need a "worker" container and two different sidecar containers (one of them is the "tunnel" to the DB, etc.).

Is it possible to run that kind of job?

SbloodyS commented 1 month ago

cc @Gallardot

Gallardot commented 1 month ago

@giovannidalloglio

Currently, not all capabilities and features of K8S Job are fully supported. At present, only the definition of a single container in spec.template.spec.containers is supported; multiple container definitions are not supported. Perhaps in the future, we can allow users to customize the spec.template of the Job, but this is not supported at the moment.

In my personal understanding, sidecars belong to the general capabilities of the infrastructure and should not be customized by users. Administrators can use tools like Gatekeeper to inject more custom features.

cc @fuchanghai @Mighten @SbloodyS @ruanwenjun

SbloodyS commented 1 month ago

+1

giovannidalloglio commented 3 weeks ago

@SbloodyS

> In my personal understanding, sidecars belong to the general capabilities of the infrastructure and should not be customized by users. Administrators can use tools like Gatekeeper to inject more custom features.

You are generally right, but there are complex situations (a bank, for example) where hundreds of different applications have to share the same cluster, and those cases cannot be managed with a "one size fits all" approach. In those environments, administrators give some guidelines and let developers sort out the rest.

In my case, we use a DB (provided by the cloud vendor) that can be reached only through a sidecar container acting as a proxy, while another sidecar container holds some general settings. My sidecar containers are "always on", but since the Kubernetes Job ends when all "main" containers are done, we have a "main + sidecar" case.
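For reference, the "main + sidecar" shape described above can be expressed with the native sidecar feature from the Kubernetes documentation linked earlier: a sidecar is declared under initContainers with restartPolicy set to Always, and it does not prevent the Job from completing once the main container finishes. The following is only a sketch of the manifest shape, assuming a cluster where that feature is enabled. It is not something the DolphinScheduler Kubernetes task accepts today, and the helper name, container names, and images are hypothetical.

```python
import json


def build_job_with_sidecars(name, main_image, sidecars):
    """Build a Job manifest with one main container and native sidecars.

    Each sidecar is an init container with restartPolicy "Always",
    which is how Kubernetes marks a container as a sidecar.
    """
    init_containers = [
        {"name": sc_name, "image": sc_image, "restartPolicy": "Always"}
        for sc_name, sc_image in sidecars
    ]
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    # Pod-level restart policy required for Job pods.
                    "restartPolicy": "Never",
                    "initContainers": init_containers,
                    "containers": [{"name": "worker", "image": main_image}],
                }
            }
        },
    }


# Hypothetical names and images, for illustration only.
job = build_job_with_sidecars(
    "example-job",
    "example.com/worker:1.0",
    [
        ("db-tunnel", "example.com/db-proxy:1.0"),
        ("settings", "example.com/settings:1.0"),
    ],
)
print(json.dumps(job, indent=2))
```

The printed JSON can be passed to `kubectl apply -f -`, since Kubernetes accepts JSON manifests as well as YAML.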

At the moment we are evaluating DolphinScheduler. If we decide to continue, I will probably implement a variation of the current Kubernetes task (maybe the GUI could offer an "advanced mode" selector and a wide text box that lets the user write their own YAML file, similar to what you did in the "cluster configuration" part). Of course, with enough guidance from you, the outcome could be solid and good enough to be merged into your main repository.

Gallardot commented 3 weeks ago

@giovannidalloglio First, I believe that this scenario makes sense and is a beneficial feature for the K8S task. However, allowing users to write the entire YAML file of a Job can introduce issues such as security risks and excessive flexibility, which can require a lot of work.

Therefore, I propose an advanced mode that allows users to write only the initContainers section. This is a balanced approach: it does not greatly affect security, and it requires only a small number of changes.
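A rough sketch of what such a restricted advanced mode could look like: the platform keeps generating the Job itself and only merges user-written init containers after checking them against an allow-list. Everything here is an assumption for illustration (the function name, the allow-list, and the shape of the generated Job). It is not DolphinScheduler code.

```python
import copy

# Hypothetical allow-list: fields a user may set on each init container.
ALLOWED_FIELDS = {"name", "image", "command", "args", "env", "restartPolicy"}


def merge_init_containers(job, user_init_containers):
    """Return a copy of `job` with validated user-supplied init containers."""
    for container in user_init_containers:
        unknown = set(container) - ALLOWED_FIELDS
        if unknown:
            raise ValueError(f"unsupported fields: {sorted(unknown)}")
        if "name" not in container or "image" not in container:
            raise ValueError("each init container needs a name and an image")
    merged = copy.deepcopy(job)  # leave the generated Job untouched
    pod_spec = merged["spec"]["template"]["spec"]
    pod_spec["initContainers"] = copy.deepcopy(user_init_containers)
    return merged


# Job as the platform might generate it: a single main container.
generated = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "example-job"},
    "spec": {"template": {"spec": {
        "restartPolicy": "Never",
        "containers": [{"name": "worker", "image": "example.com/worker:1.0"}],
    }}},
}

merged = merge_init_containers(
    generated,
    [{"name": "db-tunnel", "image": "example.com/db-proxy:1.0",
      "restartPolicy": "Always"}],
)
```

Rejecting unknown fields keeps settings such as securityContext under the platform's control, which is the security trade-off this proposal aims for.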

However, I still have a question: does storage need to be shared between containers, i.e., do we need to mount volumes?

cc: @SbloodyS @EricGao888 @Radeity @ruanwenjun

giovannidalloglio commented 3 weeks ago

@Gallardot You have a point about "security risks and excessive flexibility". I'm 100% with you on this.

I proposed "write your own full YAML" because there are also other considerations that I did not mention before. For example, we use annotations and labels to allocate costs (we run in a shared cluster with hundreds of applications), and those sit at the top of the YAML. We also need volumes: we install some configuration files and mount them as ConfigMap volumes. And, to be future-proof, maximum flexibility seemed to be the best approach.
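To make those extra requirements concrete, here is a small sketch of where they land in a Job manifest: labels and annotations under metadata (on the Job and on the pod template), and a ConfigMap declared as a volume and mounted into each container. The helper, label keys, ConfigMap name, and mount path are hypothetical.

```python
import copy


def add_metadata_and_config(job, labels, annotations, config_map_name, mount_path):
    """Return a copy of `job` with cost metadata and a mounted ConfigMap volume."""
    result = copy.deepcopy(job)
    template = result["spec"]["template"]
    # Labels and annotations go on both the Job and the pods it creates.
    for target in (result, template):
        metadata = target.setdefault("metadata", {})
        metadata.setdefault("labels", {}).update(labels)
        metadata.setdefault("annotations", {}).update(annotations)
    pod_spec = template["spec"]
    # Declare the ConfigMap as a volume, then mount it into every container.
    pod_spec.setdefault("volumes", []).append(
        {"name": "app-config", "configMap": {"name": config_map_name}}
    )
    for container in pod_spec["containers"]:
        container.setdefault("volumeMounts", []).append(
            {"name": "app-config", "mountPath": mount_path, "readOnly": True}
        )
    return result


job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "example-job"},
    "spec": {"template": {"spec": {
        "restartPolicy": "Never",
        "containers": [{"name": "worker", "image": "example.com/worker:1.0"}],
    }}},
}

# Hypothetical cost-allocation keys and ConfigMap name.
billed = add_metadata_and_config(
    job,
    labels={"cost-center": "team-a"},
    annotations={"example.com/billing-code": "12345"},
    config_map_name="worker-config",
    mount_path="/etc/worker",
)
```

None of these fields fit inside an initContainers-only advanced mode, which is why they are raised here as separate considerations.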

But I understand your concerns. More thinking is needed.