apache / dolphinscheduler

Apache DolphinScheduler is the modern data orchestration platform: agile for creating high-performance workflows with low code.
https://dolphinscheduler.apache.org/
Apache License 2.0

[Improvement][Docker/K8s] Task support matrix and solution on docker/k8s #5068

Closed: chengshiwen closed this issue 3 years ago

chengshiwen commented 3 years ago

Describe the question

Task support matrix and solution on docker/k8s

In the initial Docker/Kubernetes deployment, some tasks cannot be supported directly, such as Flink (see issue #5065).

We should therefore provide a task support matrix and solutions for Docker/K8s, covering Hadoop (HDFS/YARN), Spark, Flink, Hive, etc.

What are the current deficiencies and the benefits of improvement

Which version of DolphinScheduler: 1.3.x, dev

haydenzhourepo commented 3 years ago

Can you leave a brief introduction to those steps? I want to try it as soon as possible. Thanks. 😄

chengshiwen commented 3 years ago

@haydenzhourepo Please list the tasks you need, and I will reply as soon as possible.

haydenzhourepo commented 3 years ago
  1. Create a custom DS image with the Flink libs.
  2. Configure a Flink cluster.
  3. Submit a Flink streaming job with native K8s in application mode.
  4. Submit a Flink batch job with native K8s in application mode every hour.
  5. Check the status and metrics of those jobs.

chengshiwen commented 3 years ago

At present, Flink tasks do not support Generic CLI mode. This means that application mode (including kubernetes-application) is not supported. This feature will be implemented in the next version (not 1.3.6).
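
To make the distinction concrete, here is a minimal Python sketch, assuming Flink 1.11 or later, of what a Generic CLI command line looks like: the deployment target is passed with `-t` (`--target`), and application mode uses the separate `run-application` action instead of `run`. The helper `generic_cli_command` and the jar path are hypothetical and not part of DolphinScheduler; the sketch only illustrates the kind of command that the 1.3.x Flink task does not yet produce.

```python
from typing import List

# Generic CLI targets (Flink >= 1.11) that are submitted with `run-application`.
APPLICATION_TARGETS = {"yarn-application", "kubernetes-application"}

# Generic CLI targets that are submitted with the ordinary `run` action.
RUN_TARGETS = {"local", "remote", "yarn-per-job", "yarn-session", "kubernetes-session"}


def generic_cli_command(target: str, jar: str, *job_args: str) -> List[str]:
    """Build a hypothetical Flink Generic CLI command line for `target`."""
    if target in APPLICATION_TARGETS:
        action = "run-application"
    elif target in RUN_TARGETS:
        action = "run"
    else:
        # e.g. "yarn-cluster" is a legacy `-m` value, not a Generic CLI target
        raise ValueError(f"unknown Generic CLI target: {target}")
    # `-t` selects the deployment target; program arguments follow the jar.
    return ["flink", action, "-t", target, jar, *job_args]


if __name__ == "__main__":
    # Hypothetical jar path inside a Flink image, referenced with local://
    cmd = generic_cli_command("kubernetes-application", "local:///opt/flink/usrlib/job.jar")
    print(" ".join(cmd))
```

A real `kubernetes-application` submission normally also needs `-D` options such as `kubernetes.cluster-id` and `kubernetes.container.image` before the jar; the sketch omits them.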

haydenzhourepo commented 3 years ago

Which resource managers and submission modes does DS currently support?

chengshiwen commented 3 years ago

Local cluster, standalone cluster, and YARN per-job (yarn-cluster mode).
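
For contrast with Generic CLI mode, the YARN per-job mode mentioned here goes through Flink's older CLI options, where `-m yarn-cluster` requests a per-job cluster and the `-y*` options size it. The sketch below is a hypothetical helper (`yarn_cluster_command`, with made-up defaults), not DolphinScheduler's own code; the flags are those of the Flink 1.x YARN CLI (`-ynm`, `-ys`, `-yjm`, `-ytm`, `-yqu`), plus the general `-p` and `-c` options.

```python
from typing import List, Optional


def yarn_cluster_command(
    jar: str,
    main_class: Optional[str] = None,
    app_name: str = "ds-flink-job",      # hypothetical default name
    slots: int = 1,
    job_manager_memory: str = "1G",
    task_manager_memory: str = "2G",
    parallelism: int = 1,
    queue: Optional[str] = None,
) -> List[str]:
    """Build a hypothetical `flink run -m yarn-cluster` command line."""
    cmd = [
        "flink", "run",
        "-m", "yarn-cluster",          # per-job cluster on YARN (pre-Generic-CLI style)
        "-ynm", app_name,              # application name shown in YARN
        "-ys", str(slots),             # slots per TaskManager
        "-yjm", job_manager_memory,    # JobManager container memory
        "-ytm", task_manager_memory,   # TaskManager container memory
        "-p", str(parallelism),        # job parallelism
    ]
    if queue:
        cmd += ["-yqu", queue]         # target YARN queue
    if main_class:
        cmd += ["-c", main_class]      # entry class inside the jar
    cmd.append(jar)                    # options must come before the jar
    return cmd


if __name__ == "__main__":
    print(" ".join(yarn_cluster_command("job.jar", main_class="com.example.Main")))
```

On Docker/K8s the container running this command must also have a Flink client and the YARN/Hadoop configuration available, which is the extra setup behind the "Indirect Yes" entries.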

chengshiwen commented 3 years ago

DolphinScheduler Support Matrix on Docker/Kubernetes

| Type | Support | Notes |
| --- | --- | --- |
| Shell | Yes | |
| Python2 | Yes | |
| Python3 | Indirect Yes | Refer to FAQ |
| Hadoop2 | Indirect Yes | Refer to FAQ |
| Hadoop3 | Not Sure | Not tested |
| Spark-Local(client) | Indirect Yes | Refer to FAQ |
| Spark-YARN(cluster) | Indirect Yes | Refer to FAQ |
| Spark-Mesos(cluster) | Not Yet | |
| Spark-Standalone(cluster) | Not Yet | |
| Spark-Kubernetes(cluster) | Not Yet | |
| Flink-Local(local>=1.11) | Not Yet | Generic CLI mode is not yet supported |
| Flink-YARN(yarn-cluster) | Indirect Yes | Refer to FAQ |
| Flink-YARN(yarn-session/yarn-per-job/yarn-application>=1.11) | Not Yet | Generic CLI mode is not yet supported |
| Flink-Mesos(default) | Not Yet | |
| Flink-Mesos(remote>=1.11) | Not Yet | Generic CLI mode is not yet supported |
| Flink-Standalone(default) | Not Yet | |
| Flink-Standalone(remote>=1.11) | Not Yet | Generic CLI mode is not yet supported |
| Flink-Kubernetes(default) | Not Yet | |
| Flink-Kubernetes(remote>=1.11) | Not Yet | Generic CLI mode is not yet supported |
| Flink-NativeKubernetes(kubernetes-session/application>=1.11) | Not Yet | Generic CLI mode is not yet supported |
| MapReduce | Indirect Yes | Refer to FAQ |
| Kerberos | Indirect Yes | Refer to FAQ |
| HTTP | Yes | |
| DataX | Indirect Yes | Refer to FAQ |
| Sqoop | Indirect Yes | Refer to FAQ |
| SQL-MySQL | Indirect Yes | Refer to FAQ |
| SQL-PostgreSQL | Yes | |
| SQL-Hive | Indirect Yes | Refer to FAQ |
| SQL-Spark | Indirect Yes | Refer to FAQ |
| SQL-ClickHouse | Indirect Yes | Refer to FAQ |
| SQL-Oracle | Indirect Yes | Refer to FAQ |
| SQL-SQLServer | Indirect Yes | Refer to FAQ |
| SQL-DB2 | Indirect Yes | Refer to FAQ |

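When planning a Docker/K8s deployment it can help to treat the matrix as data, for example to list which task types work out of the box and which need the extra FAQ setup first. The sketch below is purely illustrative and not part of DolphinScheduler: it encodes only an excerpt of the rows, and the helper names `group_by_support` and `needs_extra_setup` are hypothetical. It assumes that "Indirect Yes" means the task works once the FAQ steps are applied.

```python
from typing import Dict, List, Tuple

# Excerpt of the support matrix, in table order: task type -> (support, notes).
SUPPORT_MATRIX: Dict[str, Tuple[str, str]] = {
    "Shell": ("Yes", ""),
    "Python3": ("Indirect Yes", "Refer to FAQ"),
    "Hadoop3": ("Not Sure", "Not tested"),
    "Spark-YARN(cluster)": ("Indirect Yes", "Refer to FAQ"),
    "Spark-Kubernetes(cluster)": ("Not Yet", ""),
    "Flink-YARN(yarn-cluster)": ("Indirect Yes", "Refer to FAQ"),
    "Flink-NativeKubernetes(kubernetes-session/application>=1.11)": (
        "Not Yet", "Generic CLI mode is not yet supported"),
    "HTTP": ("Yes", ""),
    "SQL-MySQL": ("Indirect Yes", "Refer to FAQ"),
    "SQL-PostgreSQL": ("Yes", ""),
}


def group_by_support(matrix: Dict[str, Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group task types by support level; dicts keep insertion (table) order."""
    groups: Dict[str, List[str]] = {}
    for task_type, (support, _notes) in matrix.items():
        groups.setdefault(support, []).append(task_type)
    return groups


def needs_extra_setup(task_type: str) -> bool:
    """Assumption: "Indirect Yes" rows work only after the FAQ steps."""
    support, _notes = SUPPORT_MATRIX[task_type]
    return support == "Indirect Yes"


if __name__ == "__main__":
    for level, tasks in group_by_support(SUPPORT_MATRIX).items():
        print(f"{level}: {', '.join(tasks)}")
```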
chengshiwen commented 3 years ago

Closed by #5158.