Task plugin research: propose a solution

Baoqi commented 5 years ago

I came across a new project today. Its code is relatively simple and clearly written, and it implements a basic plugin mechanism, so it may be a useful reference. (It has far less code than StreamSets and is much simpler.)

Reference: https://github.com/harbby/sylph Stream computing platform for bigdata

Sylph loads its plugins with com.github.harbby.gadtry.classloader.DirClassLoader from https://github.com/harbby/gadtry

Plugin load code: https://github.com/harbby/sylph/blob/master/sylph-main/src/main/java/ideal/sylph/main/service/PipelinePluginLoader.java
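To illustrate the general idea of loading plugins from a directory, here is a minimal sketch that uses only the JDK. It is not gadtry's DirClassLoader or sylph's PipelinePluginLoader; it substitutes `java.net.URLClassLoader` and `java.util.ServiceLoader`. The `TaskPlugin` interface, the `DirPluginLoader` class, and the one-directory-of-jars layout are all assumptions made for this example.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

/** Hypothetical plugin contract; not part of sylph or EasyScheduler. */
interface TaskPlugin {
    String name();
}

class DirPluginLoader {
    /** Collects the URLs of all *.jar files directly under pluginDir. */
    static List<URL> jarUrls(Path pluginDir) throws IOException {
        List<URL> urls = new ArrayList<>();
        if (!Files.isDirectory(pluginDir)) {
            return urls; // a missing directory simply means "no plugins"
        }
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(pluginDir, "*.jar")) {
            for (Path jar : jars) {
                urls.add(jar.toUri().toURL());
            }
        }
        return urls;
    }

    /** Loads every TaskPlugin declared via META-INF/services in the jars of pluginDir. */
    static List<TaskPlugin> load(Path pluginDir) throws IOException {
        URL[] urls = jarUrls(pluginDir).toArray(new URL[0]);
        // A dedicated class loader per plugin directory keeps plugin dependencies
        // separate from those of other plugin directories. It is intentionally left
        // open, because the loaded plugin classes keep using it.
        URLClassLoader loader = new URLClassLoader(urls, TaskPlugin.class.getClassLoader());
        List<TaskPlugin> plugins = new ArrayList<>();
        for (TaskPlugin plugin : ServiceLoader.load(TaskPlugin.class, loader)) {
            plugins.add(plugin);
        }
        return plugins;
    }
}
```

With this layout, a plugin jar would list its implementation classes in a `META-INF/services` file named after the `TaskPlugin` interface, and dropping the jar into the plugin directory would be enough to register it.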

Plugin implementation code reference: ClickHouseSink: https://github.com/harbby/sylph/blob/master/sylph-connectors/sylph-clickhouse/src/main/java/ideal/sylph/plugins/clickhouse/ClickHouseSink.java

- ClickHouseSink is declared as a RealTimeSink and is labelled with Name and Description annotations.
- Its parameters are defined in ClickHouseSinkConfig, which extends PluginConfig: jdbcUrl, user, password, query, and bulkSize (bulkSize is an int; the others are strings). Each parameter has a Name and a Description.
- A single plugin jar can define multiple TaskTypes. TaskTypes fall into three categories: Source, Transform, and Sink.
- NOTE: For EasyScheduler, we should also consider I18N, so a plugin's Name and Description should be displayable in multiple languages.
- NOTE: For EasyScheduler, there are many kinds of plugins, such as task types (SQL task, shell task, etc.) and JDBC connector plugins (MySQL, ClickHouse, etc.).
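The annotation-based style described above can be sketched as follows. This is not sylph's code: the `Name` and `Description` annotations are redefined locally so the example is self-contained, and `ClickHouseSinkConfigSketch`, `ConfigDescriber`, and the description strings are hypothetical. Only the parameter names and their types come from the notes above.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

// Local stand-ins for the plugin annotations (assumption: runtime retention).
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.FIELD})
@interface Name { String value(); }

@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.FIELD})
@interface Description { String value(); }

/** Hypothetical config class; the description texts are illustrative only. */
class ClickHouseSinkConfigSketch {
    @Name("jdbcUrl") @Description("JDBC URL of the target database")
    String jdbcUrl;
    @Name("user") @Description("login user")
    String user;
    @Name("password") @Description("login password")
    String password;
    @Name("query") @Description("insert statement to execute")
    String query;
    @Name("bulkSize") @Description("number of rows per batch")
    int bulkSize;
}

class ConfigDescriber {
    /** Returns parameter name -> "type: description" for every field annotated with @Name. */
    static Map<String, String> describe(Class<?> configClass) {
        Map<String, String> out = new TreeMap<>(); // sorted by name for stable output
        for (Field field : configClass.getDeclaredFields()) {
            Name name = field.getAnnotation(Name.class);
            if (name == null) {
                continue; // fields without @Name are not plugin parameters
            }
            Description description = field.getAnnotation(Description.class);
            String text = description == null ? "" : description.value();
            out.put(name.value(), field.getType().getSimpleName() + ": " + text);
        }
        return out;
    }
}
```

A front end could call something like `ConfigDescriber.describe(ClickHouseSinkConfigSketch.class)` to list the parameters of a plugin without instantiating it. For I18N, the annotation values could hold message keys rather than display text, though that is a design choice this sketch does not cover.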

Setting plugin parameters in the front end (I have not compiled or deployed it locally, so this is based only on reading the code). The code is at: https://github.com/harbby/sylph/blob/master/sylph-controller/src/main/webapp/app/js/etl.js#L27

However, it only serializes the plugin's config into a JSON object and shows it in a text box, where the user edits the individual parameters by hand.

- NOTE: For EasyScheduler, we should be able to provide different UI presentations depending on the config type.
- NOTE: For EasyScheduler, we should be able to put configs into different tab groups, with each config on its own tab, such as "basic information", "JDBC information", and "advanced configuration".
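One possible reading of the two notes above, as a rough sketch: pick a form widget from the Java type of each config field, and group parameters by a tab title. The `UiSchema` class, the `Widget` values, and the type-to-widget mapping are all assumptions for illustration; nothing here exists in sylph or EasyScheduler.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical UI schema helper. */
class UiSchema {
    enum Widget { TEXT_INPUT, NUMBER_INPUT, CHECKBOX, SELECT }

    /** Picks a form widget from the Java type of a config field. */
    static Widget widgetFor(Class<?> type) {
        if (type == boolean.class || type == Boolean.class) {
            return Widget.CHECKBOX;
        }
        if (type.isEnum()) {
            return Widget.SELECT; // one option per enum constant
        }
        if (type == int.class || type == long.class || type == double.class
                || Number.class.isAssignableFrom(type)) {
            return Widget.NUMBER_INPUT;
        }
        return Widget.TEXT_INPUT; // strings and anything unrecognised
    }

    /** Groups parameter names by tab title, following the iteration order of the input map. */
    static Map<String, List<String>> groupByTab(Map<String, String> tabOfParam) {
        Map<String, List<String>> tabs = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : tabOfParam.entrySet()) {
            tabs.computeIfAbsent(entry.getValue(), k -> new ArrayList<>()).add(entry.getKey());
        }
        return tabs;
    }
}
```

Under this mapping, a parameter such as bulkSize (an int) would get a numeric input, while jdbcUrl (a string) would get a plain text input.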

EricJoy2048 commented 5 years ago




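1. XXXTask is an implementation class based on AbstractTask and implements the Task-related interfaces. When the system executes the task, it instantiates the class via reflection and then calls its execution method.

2. The configuration.xml file describes the custom configuration parameters of this custom task type. Note that every task also has some common parameters, such as the task name, whether it can be retried, whether a failure should raise an alert, and which resource queue to use. These parameters are system-level; a custom task cannot modify or define them.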




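task_type_name : the name of the custom task type, such as MR task or SPARK task. This name is shown in the list of selectable task types on the left side of the process definition page.

classpath : the path of the custom task's implementation class. The task executor uses this path to instantiate the implementation class via reflection and then runs its run method.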
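The reflective instantiation described for classpath can be sketched like this. The `AbstractTask` shown here is a minimal stand-in with a single `run` method, and `EchoTask` and `TaskLauncher` are hypothetical names; the real base class would carry more than this.

```java
/** Minimal stand-in for the task base class (assumption: a single run method). */
abstract class AbstractTask {
    abstract void run() throws Exception;
}

/** Hypothetical custom task used only for this example. */
class EchoTask extends AbstractTask {
    boolean ran = false;

    @Override
    void run() {
        ran = true;
        System.out.println("EchoTask running");
    }
}

class TaskLauncher {
    /** Instantiates the task class named by classpath through its no-argument constructor. */
    static AbstractTask instantiate(String classpath) throws ReflectiveOperationException {
        Class<?> raw = Class.forName(classpath);
        // asSubclass throws ClassCastException if the class does not extend AbstractTask.
        Class<? extends AbstractTask> taskClass = raw.asSubclass(AbstractTask.class);
        return taskClass.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // In the design above, this string would come from the task type's classpath setting.
        AbstractTask task = instantiate(EchoTask.class.getName());
        task.run();
    }
}
```

Checking the loaded class with `asSubclass` before instantiating it means a misconfigured classpath fails early with a clear error instead of at the first method call.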


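name : the parameter name

value : the default value of the parameter

type : the parameter type, which can be INPUT, INPUT_LIST, SELECT, TEXTAREA, KV, or RADIO

- INPUT: the front end renders the parameter as an input form element.
- INPUT_LIST: the front end first renders one input followed by a "+" button; clicking the button adds more inputs.
- KV: the front end renders two inputs side by side, the left one for the key and the right one for its value; more key-value pairs can be added with the "+" button.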
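As a rough illustration of how such a configuration.xml might be read, the sketch below parses a sample document with the JDK's DOM parser. The original XML layout is not shown in this thread, so the element structure (`task`, `params`, `param` with attributes) is an assumption, as are the sample values and the `TaskTypeConfig` class; only the field names task_type_name, classpath, name, value, and type and the list of type values come from the description above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical in-memory form of one configuration.xml file. */
class TaskTypeConfig {
    enum ParamType { INPUT, INPUT_LIST, SELECT, TEXTAREA, KV, RADIO }

    static final class Param {
        final String name;
        final String defaultValue;
        final ParamType type;

        Param(String name, String defaultValue, ParamType type) {
            this.name = name;
            this.defaultValue = defaultValue;
            this.type = type;
        }
    }

    // Assumed layout; the sample class name and parameters are made up.
    static final String SAMPLE =
            "<task>"
            + "<task_type_name>SPARK</task_type_name>"
            + "<classpath>com.example.SparkTask</classpath>"
            + "<params>"
            + "<param name=\"mainClass\" value=\"\" type=\"INPUT\"/>"
            + "<param name=\"args\" value=\"\" type=\"INPUT_LIST\"/>"
            + "<param name=\"env\" value=\"\" type=\"KV\"/>"
            + "</params>"
            + "</task>";

    final String taskTypeName;
    final String classpath;
    final List<Param> params = new ArrayList<>();

    TaskTypeConfig(String taskTypeName, String classpath) {
        this.taskTypeName = taskTypeName;
        this.classpath = classpath;
    }

    /** Parses the assumed XML layout; an unknown type value fails with IllegalArgumentException. */
    static TaskTypeConfig parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element root = doc.getDocumentElement();
        TaskTypeConfig config = new TaskTypeConfig(text(root, "task_type_name"), text(root, "classpath"));
        NodeList nodes = root.getElementsByTagName("param");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element param = (Element) nodes.item(i);
            config.params.add(new Param(param.getAttribute("name"), param.getAttribute("value"),
                    ParamType.valueOf(param.getAttribute("type"))));
        }
        return config;
    }

    private static String text(Element root, String tag) {
        return root.getElementsByTagName(tag).item(0).getTextContent().trim();
    }
}
```

Modelling type as an enum means a typo in configuration.xml is rejected when the file is loaded, before the front end tries to render the parameter.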




ID | Task type name | Task classpath | Task custom parameters






ID | Task type ID | Task node name | Task parameters




khadgarmage commented 5 years ago

@gaojun2048 +1, I agree with your design. I have a few additional points:

  1. Task plugin development has two audiences. One is users who develop plugins directly for their own business needs, which makes the system more flexible to use. The other is open source enthusiasts who develop third-party plugins, which would give DolphinScheduler a positive push. Thinking further, could there be a plugin market? So, on top of custom task plugins, we need to support importing third-party plugins. Once imported, a third-party plugin follows the same flow as a custom task plugin, so the extra work is small: it only adds an import step to the existing design.
  2. Custom task plugins need not be limited to Java jar packages. A plugin can be a shell script, a Python script, or any executable; the scheduling platform only passes the parameters through, executes, and schedules. This gives developers more choices. For example, a shell script that takes some parameters can be packaged as a plugin, and someone could write an executable in Go, upload it to the server, define its input parameters, and use it as a plugin too.
  3. System upgrades and migrations also need attention: plugins must still work after a migration or upgrade.
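Point 2 above, where the platform only passes parameters through to an executable, can be sketched with `java.lang.ProcessBuilder`. The `ExecutablePluginRunner` class and the `--key=value` argument convention are assumptions for this example; the thread does not specify how parameters would be handed to the executable.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical runner for plugins that are plain executables or scripts. */
class ExecutablePluginRunner {
    /** Builds the command line: the executable followed by one --key=value argument per parameter. */
    static List<String> buildCommand(String executable, Map<String, String> params) {
        List<String> command = new ArrayList<>();
        command.add(executable);
        for (Map.Entry<String, String> entry : params.entrySet()) {
            command.add("--" + entry.getKey() + "=" + entry.getValue());
        }
        return command;
    }

    /** Runs the executable, forwards its output to the current process, and returns its exit code. */
    static int run(String executable, Map<String, String> params)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(executable, params))
                .inheritIO()
                .start();
        return process.waitFor();
    }
}
```

Because each argument is passed as a separate list element rather than through a shell string, parameter values containing spaces or quotes reach the executable unchanged.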


davidzollo commented 4 years ago

For this feature, please refer to https://github.com/apache/incubator-dolphinscheduler/issues/2869

davidzollo commented 4 years ago

