apache / dolphinscheduler

Apache DolphinScheduler is the modern data orchestration platform. Agile to create high performance workflow with low-code
https://dolphinscheduler.apache.org/
Apache License 2.0

[DSIP-63][k8s] Support User-customized K8s YAML Task #16478

Open · Mighten opened this issue 3 weeks ago

Mighten commented 3 weeks ago

Action List: Extension of operations for the k8s YAML task:

Search before asking

Motivation

Supporting user-customized K8s YAML tasks has the following benefits:

In short, by enabling user-customized YAML tasks, DolphinScheduler can better support a wide range of Kubernetes-based workflows and operational requirements.

Design Detail

2.1 Design Overview

The following swimlane diagram shows how the k8s YAML task is embedded into Apache DolphinScheduler:

Figure 2-1(1). Design Overview

  1. User starts a Web page to edit and save K8s YAML Workflow.
  2. UI provides an editor for user to input YAML in Custom Template mode.
  3. API Server encapsulates command and hands it over to Master.
  4. Master splits the workflow DAG and dispatches tasks to Worker.
  5. Worker picks the appropriate task executor and operation. E.g., for k8s Pod YAML, Worker picks YAML Task Executor, and then picks Pod Operation.
  6. Worker reports status to Master.
  7. User reviews k8s YAML task log in the Task Instance Window.

2.2 Frontend Design

The frontend adds support for user-customized k8s YAML tasks while remaining compatible with the original k8s low-code jobs.

Figure 2-2(1). Frontend Design Overview

  1. The Web UI layout

    When the user switches on the Custom Template, the low-code k8s job fields should be hidden and the YAML editor should appear (and vice versa), similar to the JSON Custom Template in the DataX plugin.

    This feature, as shown in Figure 2-2(1), is implemented through the span property of the form fields, which is controlled by reactive variables (such as yamlEditorSpan) in the file dolphinscheduler-ui/src/views/projects/task/components/node/fields/use-k8s.ts.

  2. The Request body

    When the user switches to Custom Template mode, the request body should include only the YAML-related fields (customConfig and yamlContent), and the low-code fields that are hidden in this mode should not be sent.

    This feature is implemented via taskParams in the file dolphinscheduler-ui/src/views/projects/task/components/node/format-data.ts.

  3. i18n/locales

    Apache DolphinScheduler is international software and should support multiple languages.

    The text on the Web UI is retrieved from variables defined in the files dolphinscheduler-ui/src/locales/{en_US, zh_CN}/project.ts. For user-customized k8s YAML tasks, there are three key variables to consider:

    • k8s_custom_template: the label for the switch to enable user-customized k8s YAML tasks.
    • k8s_yaml_template: the label for the text editor used to input user YAML.
    • k8s_yaml_empty_tips: the warning message displayed when a user tries to submit empty YAML.

    This feature is implemented by invoking t('project.node.${variable_name}') (such as t('project.node.k8s_yaml_template')) in the file dolphinscheduler-ui/src/views/projects/task/components/node/fields/use-k8s.ts.

2.3 Backend Design

The backend design describes how the worker executes user-customized k8s YAML tasks. Figure 2-3(1) shows how user-customized k8s YAML Pod tasks relate to the original k8s low-code jobs.

Figure 2-3(1). Backend Design Overview

After the worker validates the task parameters, K8sYamlTaskExecutor is loaded for the current user-customized k8s YAML Pod task. Once the YAML is parsed into a HasMetadata object, its kind field is used to assign a K8sPodOperation to abstractK8sOperation, which then executes the YAML Pod task.
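
As a rough illustration of this parse-and-dispatch step, here is a minimal sketch, assuming the fabric8 kubernetes-client is used; the class and method names below are made up for illustration:

    import io.fabric8.kubernetes.api.model.HasMetadata;
    import io.fabric8.kubernetes.client.utils.Serialization;

    // Sketch: resolve the Kubernetes resource kind from the user-supplied YAML so that
    // the matching operation handler (e.g. K8sPodOperation for "Pod") can be selected.
    public class YamlKindResolverSketch {

        public static String resolveKind(String yamlContent) {
            // fabric8 detects the concrete model class from the "kind" field while
            // unmarshalling, so a Pod manifest comes back as io.fabric8.kubernetes.api.model.Pod.
            HasMetadata metadata = Serialization.unmarshal(yamlContent);
            return metadata.getKind();
        }
    }

The returned kind string ("Pod", "ConfigMap", ...) is what generateOperation() in section 3.1 switches on.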

  1. K8s Task Executors

    Figure 2-3(2). K8s Task Executors

    Three k8s task executor classes are involved, as shown in Figure 2-3(2):

    • AbstractK8sTaskExecutor is an abstract class that represents a generic k8s task executor.
    • K8sTaskExecutor is a concrete class that extends AbstractK8sTaskExecutor to represent the original low-code k8s job executor.
    • K8sYamlTaskExecutor is a concrete class that extends AbstractK8sTaskExecutor to represent the user-customized k8s YAML task executor.
  2. K8s Operation handler

    Figure 2-3(3). K8s Operation Handlers

    Two operation handlers are involved, as shown in Figure 2-3(3); a combined structural sketch of the executors and operation handlers follows this list:

    • AbstractK8sOperation is an interface representing all k8s resource operations.
    • K8sPodOperation is a concrete class that implements AbstractK8sOperation to handle Pod operations.
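
A minimal structural sketch of how these executor and operation classes could fit together (illustrative only; the real signatures in the dolphinscheduler-task-k8s module may differ, and the execute method below is a placeholder):

    // Illustrative hierarchy only: in the real module each class/interface lives in its own file.
    abstract class AbstractK8sTaskExecutor {
        // Shared plumbing for both executors: k8s client setup, log collection, status reporting.
        abstract void run(String taskParams) throws Exception;
    }

    class K8sTaskExecutor extends AbstractK8sTaskExecutor {
        @Override
        void run(String taskParams) {
            // Existing low-code path: build and submit a k8s Job from the structured form fields.
        }
    }

    class K8sYamlTaskExecutor extends AbstractK8sTaskExecutor {
        // Concrete handler chosen by resource kind, e.g. K8sPodOperation when kind == "Pod".
        private AbstractK8sOperation abstractK8sOperation;

        @Override
        void run(String taskParams) {
            // New path: parse the user YAML, pick the operation handler by kind, then execute it.
        }
    }

    // Contract implemented by concrete handlers such as K8sPodOperation.
    interface AbstractK8sOperation {
        void execute(io.fabric8.kubernetes.api.model.HasMetadata resource);
    }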

2.4 Usecase Design

A typical use case for a k8s YAML task includes uploading YAML, bringing the workflow online, and starting the workflow, the same as for k8s low-code jobs, except that the user switches on the Custom Template option and fills in the YAML.

Figure 2-4(1). Usecase Design

  1. The user edits a k8s YAML node in a workflow.
  2. If the Custom Template is activated and the YAML content is not blank, the user may bring the whole workflow online.
  3. Once the workflow is online, the user may start it and review the logs generated during its execution.

Compatibility, Deprecation, and Migration Plan

3.1 Compatibility Plan

The user-customized k8s YAML feature requires only customConfig to be activated. By default, its value is 0, which corresponds to the existing k8s low-code jobs.

The remainder of this section demonstrates the flexibility and compatibility of this design by walking through the example of adding ConfigMap support:

    this.k8sYamlType = K8sYamlType.valueOf(this.metadata.getKind());
    generateOperation();

After parsing with YamlUtils::load, the kind field returned by this.metadata.getKind() will be ConfigMap. Then this.k8sYamlType is determined and used to generate the corresponding operation:

    private void generateOperation() {
        // Pick the concrete operation handler based on the kind parsed from the user YAML.
        switch (k8sYamlType) {
            case Pod:
                abstractK8sOperation = new K8sPodOperation(k8sUtils.getClient());
                break;
            case ConfigMap:
                abstractK8sOperation = new K8sConfigmapsOperation(k8sUtils.getClient());
                break;
            default:
                throw new TaskException(
                        String.format("K8sYamlTaskExecutor does not support type %s", k8sYamlType.name()));
        }
    }

Consequently, generateOperation() sets this.abstractK8sOperation to a new instance of K8sConfigmapsOperation. Next, we can implement K8sConfigmapsOperation to handle ConfigMap operations.
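
For illustration, a ConfigMap handler could look roughly like the sketch below, assuming the fabric8 kubernetes-client; the class shape and the createOrReplace method are hypothetical, not the final AbstractK8sOperation contract:

    import io.fabric8.kubernetes.api.model.ConfigMap;
    import io.fabric8.kubernetes.api.model.HasMetadata;
    import io.fabric8.kubernetes.client.KubernetesClient;

    // Hypothetical sketch of a ConfigMap operation handler, not the final implementation.
    public class K8sConfigmapsOperationSketch {

        private final KubernetesClient client;

        public K8sConfigmapsOperationSketch(KubernetesClient client) {
            this.client = client;
        }

        // Creates the ConfigMap described by the parsed YAML, replacing an existing
        // ConfigMap with the same name in the target namespace if present.
        public void createOrReplace(HasMetadata resource) {
            ConfigMap configMap = (ConfigMap) resource;
            String namespace = configMap.getMetadata().getNamespace();
            client.configMaps()
                    .inNamespace(namespace != null ? namespace : "default")
                    .createOrReplace(configMap);
        }
    }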

3.2 Deprecation Plan

N/A for now, waiting for community opinions.

3.3 Migration Plan

N/A for now, waiting for community opinions.

Test Plan

4.1 Overview

The user-customized k8s YAML task feature allows users to submit YAML tasks to k8s, covering Pods, ConfigMaps, and other resources.

This test plan aims to ensure that the feature functions as expected and meets user requirements.

4.2 Scope

  1. YAML Pod
| Test Case # | Name | Action | Expectation |
| --- | --- | --- | --- |
| 1 | UI Display | Edit YAML, save, and reopen | The YAML content stays up to date. |
| 2 | UI Validation | Try to submit empty YAML | The UI modal dialog intercepts the empty YAML. |
| 3 | Online Workflow | Save the workflow and bring it online | The user successfully brings the workflow online. |
| 4 | Dryrun Workflow | Run the workflow in dry-run mode | The Master successfully dry-runs this task. |
| 5 | Test Workflow | Run the workflow in test mode | The Worker successfully tests this task. |
| 6 | Run Workflow | Run the workflow | The Worker successfully runs this task. |

Code of Conduct

SbloodyS commented 3 weeks ago

cc @Gallardot @ruanwenjun

fuchanghai commented 3 weeks ago

@caishunfeng pls help to add this issue to #14102

SbloodyS commented 3 weeks ago

> @caishunfeng pls help to add this issue to #14102

Done. You're also a DS Committer and have permission to add to it.

Gallardot commented 3 weeks ago

Before discussing this DSIP, I hope everyone can reach a basic consensus. Supporting customization can indeed meet more demand scenarios, but excessive customization can bring more problems.

I see in the design that it allows users to directly create Pods and ConfigMaps, and even supports creating multiple Pods.

Regarding the support for configmap, I have some questions:

  1. Why support configmap? For the same workflow, does it create a configmap for each task instance? Is the content in the configmap different each time? If it is the same, why create it each time? As a configuration resource in k8s, shouldn't configmap be static? As a way to obtain configuration, besides configmap, should secret also be supported?
  2. Should the configmap be mounted to the pod as a file? If so, should PV and PVC be supported?
  3. If it is just to reference the configuration in the configmap, can it be directly referenced through env?

Regarding the support for pod, I have some questions:

  1. How is the name of the pod defined? How can different workflows in the same namespace ensure that pod names do not collide? This is also the case with ConfigMaps.
  2. How is the lifecycle of the pod managed? Will DS delete it after the task ends? How to ensure that DS can definitely delete it?
  3. If the execution strategy of the workflow is parallel, how should the pod be handled?
  4. If multiple pods are created at the same time, are these pods related? Or is it just to run multiple pods concurrently? If it is concurrent, does it support deployments? Does it support StatefulSet? Should DS manage it as a controller of k8s resources? I am afraid this is not what DS should do.
  5. Or more broadly, do you want to support the task of creating helm charts?
  6. How to retrieve the logs of a pod? How to retrieve the logs of multiple pods? If there are multiple containers in a pod, how to retrieve the logs of multiple containers?

If the issues are not adequately addressed, I am afraid I will vote -1 on this DSIP.

fuchanghai commented 3 weeks ago
  1. For each type, we can set a strategy: the first is "ignore if it exists", and the second is "delete first and then add", to cover various scenarios.
  2. Add labels to the pod according to the strategy type (via ObjectMeta.setLabels): for the "ignore if it exists" strategy, use taskCode as the label; for the "delete first and then add" strategy, use taskInstanceId.
  3. Delete the pod according to its label (a short sketch of this labeling approach follows below).
  4. We can also use the taskInstanceId to replace the user-defined pod name (via ObjectMeta.setName).
  5. Perhaps this issue can be targeted at a single pod, without considering multiple pods. In fact, if there are multiple pods in a node, we can give them a label whose value is processInstanceId + taskInstanceId, obtain the pods through the processInstance, and fetch their logs separately.
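
A rough sketch of the labeling and cleanup idea above, assuming the fabric8 kubernetes-client; the label keys and helper class are made up for illustration:

    import java.util.Map;
    import io.fabric8.kubernetes.api.model.Pod;
    import io.fabric8.kubernetes.client.KubernetesClient;

    // Sketch: tag the user-defined pod with task-related labels so DS can find and
    // delete it later, regardless of the name the user gave it in the YAML.
    public class PodLabelSketch {

        public static void labelPod(Pod pod, int taskInstanceId, long processInstanceId) {
            pod.getMetadata().setLabels(Map.of(
                    "ds-task-instance-id", String.valueOf(taskInstanceId),
                    "ds-process-instance-id", String.valueOf(processInstanceId)));
            // Optionally override the user-defined name with the task instance id
            // to avoid name collisions within the same namespace.
            pod.getMetadata().setName("ds-task-" + taskInstanceId);
        }

        public static void deletePodsForTask(KubernetesClient client, String namespace, int taskInstanceId) {
            // Delete by label selector instead of by name, so cleanup works even if
            // the pod name was changed or several pods were created for the task.
            client.pods()
                    .inNamespace(namespace)
                    .withLabel("ds-task-instance-id", String.valueOf(taskInstanceId))
                    .delete();
        }
    }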

@Gallardot cc @EricGao888 @Mighten @SbloodyS WDYT?

SbloodyS commented 3 weeks ago

Totally agreed with @Gallardot

From my personal perspective, since DS is a scheduling system, the current k8s task is mainly used to replace Kubernetes CronJobs. And we have no plans to support k8s Deployment scheduling management, since that maintenance would involve a huge amount of work. So we need to reach a basic consensus.

fuchanghai commented 3 weeks ago

This is indeed too big. At present, the types most commonly used in our company are ConfigMap and Pod; Deployments are only used with Flink. For SaaS-type products, a ConfigMap is usually created by users or when users modify their own configurations. In most cases, the Pod type is the most commonly used. We can open an issue only for Pods and first discuss how to complete the Pod type. @Mighten cc @Gallardot @SbloodyS

fuchanghai commented 3 weeks ago

For the scenario of a pod with multiple containers, I think it is necessary to split the logs by container. When querying the logs, the frontend needs to pass the container name to fetch the logs of that specific container, and it also needs a table to switch between containers when viewing logs. That is a fairly large change, so I hope this issue only considers a single pod with a single container.
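
For reference, retrieving the log of a single container in a pod with the fabric8 client would look roughly like the sketch below; the helper class is made up for illustration:

    import io.fabric8.kubernetes.client.KubernetesClient;

    // Sketch: fetch the log of one specific container inside a pod, which is what the
    // frontend would need to request if multi-container pods were supported.
    public class ContainerLogSketch {

        public static String getContainerLog(KubernetesClient client, String namespace,
                                             String podName, String containerName) {
            return client.pods()
                    .inNamespace(namespace)
                    .withName(podName)
                    .inContainer(containerName)
                    .getLog();
        }
    }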

Gallardot commented 3 weeks ago

> This is indeed too big. At present, the types most commonly used in our company are ConfigMap and Pod; Deployments are only used with Flink. For SaaS-type products, a ConfigMap is usually created by users or when users modify their own configurations. In most cases, the Pod type is the most commonly used. We can open an issue only for Pods and first discuss how to complete the Pod type.
>
> @Mighten cc @Gallardot @SbloodyS

I'm sorry, but I don't agree with this view. Pods are the most commonly used because they are the basic unit of a service workload, but they are also the least used directly, since only early versions of Kubernetes used bare Pods. That's why higher-level workloads like Deployments and StatefulSets were introduced later. Managing the lifecycle of Pods is an important task in Kubernetes, not just creating a Pod.

fuchanghai commented 3 weeks ago

Judging from the current low-code k8s task, it wraps a pod in a Job-type task, which is not much different from a single-pod task.