Closed kissycn closed 4 months ago
Would you like to implement this feature?
@davidzollo
Yes I will try to complete it. Please assign the task to me.
Please provide full design details before you coding. @kissycn
Very -1 to this plugin.
If you only want to manage the flink job, use flink origin way is more convenient, and ds won't do better then flink portal.
Previously DS has try to support Flink/Spark streaming, but all of these plugins are demo and going to shit, even cannot run hello world now, and missing complete lifecycle management.
Motivation In fact, the motivation for developing this plug-in in DS is very simple. DS is used as the entry point for all engines, and data synchronization, batch processing, and stream processing are uniformly managed by DS.
Problem: FlinkCDC submit tasks is:
bash ${FLINK_CDC_HOME}/bin/flink-cdc.sh mysql-to-doris.yaml
Currently, flink-cdc.sh does not support task cancellation, which requires the use of Flink's API. Therefore, if you want to develop this plug-in in DS, use the FlinkCDC handle shell script to submit tasks, and cancel using: Flink API to cancel.
Solution: As @ruanwenjun said, there are some problems with the lifecycle management of some streaming tasks. From the source code, we know that after the Flink task is submitted, the shell process will always hold, and the Flink task cannot be cancel. For the above problems, I think there may be the following solutions:
After comprehensive consideration, I decided to submit it using Flink Rest API. I will close the issue first and discuss it again if there is any new progress.
Search before asking
Description
Apache DolphinScheduler is a distributed workflow task scheduling system that supports various types of data workflow tasks. Currently, DolphinScheduler supports basic Flink tasks, but is not yet able to directly support FlinkCDC tasks. FlinkCDC is a library commonly used for real-time data synchronization and change data capture (CDC), which is particularly important for modern data architecture. Currently, FlinkCDC3.0 already supports submitting CDC tasks in the Pipeline mode, which will be very suitable for submitting tasks as a Task plug-in for DolphinScheduler.
Use case
In the task submission page, in addition to the dolphinscheduler basic parameters, you can submit the Flink-CDC Pipeline yaml declaration file
Related issues
No response
Are you willing to submit a PR?
Code of Conduct