apache / dolphinscheduler

Apache DolphinScheduler is a modern data orchestration platform for creating high-performance, low-code workflows.
https://dolphinscheduler.apache.org/
Apache License 2.0

Lineage data kinship #249

Open boandai opened 5 years ago

boandai commented 5 years ago

If anybody else also needs this feature, please reply "+1", thanks.

Finally, we will decide which version will implement this feature based on the level of interest.

liuminyt commented 5 years ago

+1

liuGangBestSkip commented 5 years ago

+1

thisnew commented 5 years ago

+1

zhanghuidouble commented 5 years ago

+1


wuchch commented 5 years ago

+1

moranrr commented 5 years ago

+1

ITriangle commented 4 years ago

+1

WangyanxuFp commented 4 years ago

+1111111111

coding-now commented 4 years ago

+1

lianayu commented 4 years ago

+1

lipengyu commented 4 years ago

+1

wangsvip commented 4 years ago

+1

simon824 commented 4 years ago

+1

yh2388 commented 4 years ago

+1

7eng commented 4 years ago

+1

feiyalun commented 4 years ago

+1

wen-hemin commented 4 years ago

I plan to implement a data lineage function (table level).

I have already started a discussion on the mailing list.

E.g. (design diagrams attached to the original comment)
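Extracting the dependency (source) and target tables from a SQL node, as described, might look like the minimal sketch below. It is regex-based and purely illustrative; a production implementation would use a real SQL parser (e.g. Calcite or Alibaba Druid's parser) to handle CTEs, subqueries, quoting, and aliases.

```python
import re

def extract_table_lineage(sql: str):
    """Return (source_tables, target_tables) for a simple SQL statement.

    Simplified sketch: only plain INSERT ... SELECT ... FROM/JOIN forms
    are recognized; real SQL needs a proper parser.
    """
    sql = sql.lower()
    # Tables written to: INSERT INTO x / INSERT OVERWRITE TABLE x
    targets = set(re.findall(
        r"insert\s+(?:into|overwrite)\s+(?:table\s+)?([\w.]+)", sql))
    # Tables read from: FROM x, JOIN x (excluding the target itself)
    sources = set(re.findall(r"(?:from|join)\s+([\w.]+)", sql)) - targets
    return sources, targets

src, dst = extract_table_lineage(
    "INSERT OVERWRITE TABLE dw.ads_report "
    "SELECT a.id, b.amount FROM dw.dwd_orders a JOIN dw.dim_user b ON a.uid = b.uid"
)
```

Here `src` contains the two joined source tables and `dst` the insert target, which is exactly the table-level edge the lineage graph needs.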

simon824 commented 4 years ago

Will this design cause loops?


I plan to implement data lineage function (Table level).

The sql node and etl node automatically parse the dependency table and target table.

The frontend controls whether to enable dependency detection through switch.

The master server automatically injects dependent nodes, create dependent nodes based on dependencies.

Rely on the node to set the default number of retries.

Open the node that dependent detection function, no longer need to manually connect.

Already start a discussion in mail list.



wen-hemin commented 4 years ago

It won't create loops. The current workflow does not support circular relationships; this is checked when the workflow is saved.
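The save-time acyclicity check mentioned here is typically a depth-first search for back edges over the task graph. A sketch, assuming a simple adjacency-list representation (not DolphinScheduler's actual workflow model):

```python
def has_cycle(edges):
    """Return True if the directed graph {node: [successors]} has a cycle.

    Three-color DFS: WHITE = unvisited, GRAY = on the current path,
    BLACK = fully explored. A GRAY successor means a back edge, i.e. a cycle.
    """
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in edges}

    def dfs(n):
        color[n] = GRAY
        for m in edges.get(n, []):
            if color.get(m, WHITE) == GRAY:   # back edge -> cycle
                return True
            if color.get(m, WHITE) == WHITE and dfs(m):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and dfs(n) for n in list(edges))
```

Rejecting the save when `has_cycle` returns True is what guarantees the injected dependency edges can never form a loop.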

s751167520 commented 4 years ago

+1. Can lineage be resolved at the table-field (column) level?

wen-hemin commented 4 years ago

The current design addresses lineage between tables (table level).

gzvince commented 4 years ago

+1

wacr2008 commented 4 years ago

+1

kangjunxiang commented 4 years ago

+1

yeyudefeng commented 3 years ago

+111111111111111

jetaime-chen commented 3 years ago

+1

geosmart commented 3 years ago

+1

geosmart commented 3 years ago

Just like Airflow, where tasks have inlets and outlets attributes, the user could define the inlets and outlets manually; after a task finishes, the inlets/outlets would be sent to a metadata server (e.g. Atlas).

How can I implement this in DolphinScheduler? @dailidong
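The Airflow-style idea above could be sketched as follows: the user declares inlets/outlets on a task, and after the task finishes a standard event is POSTed to a metadata server. The URL and payload shape below are hypothetical placeholders, not a real Atlas endpoint.

```python
import json
import urllib.request

def build_lineage_event(task_name, inlets, outlets):
    """Build a lineage event for a finished task, Airflow-style:
    inlets/outlets are dataset names the user declared manually."""
    return {"task": task_name, "inlets": list(inlets), "outlets": list(outlets)}

def send_lineage(event, metadata_url="http://metadata-server/api/lineage"):
    """POST the event to a metadata server (e.g. Atlas).
    Hypothetical endpoint; real code would add auth, retries, timeouts."""
    req = urllib.request.Request(
        metadata_url,
        data=json.dumps(event).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

# After a task finishes, a worker hook could do:
# send_lineage(build_lineage_event("etl_orders", ["ods.orders"], ["dwd.orders"]))
```

Keeping the event format scheduler-agnostic is what lets the same hook target Atlas, Marquez, or any other catalog.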

liangpingliu commented 3 years ago

This is a necessary feature.

dengc367 commented 3 years ago

+1

shitoubiao commented 3 years ago

+1

yh2388 commented 3 years ago

I plan to implement a data lineage function (table level).

  • The sql node and etl node automatically parse the dependency table and target table.
  • The frontend controls whether to enable dependency detection through a switch.
  • The master server automatically injects dependency nodes, creating them based on the dependencies.
  • The dependency nodes get a default number of retries.
  • With dependency detection enabled for a node, manual connections are no longer needed.

I have already started a discussion on the mailing list.

  • Support SQL and ETL nodes automatically analyzing dependencies between tables by parsing the select tables and insert tables in the SQL; other node types can maintain their insert tables manually (if such a need exists).
  • The frontend uses a dependency-detection switch to control whether the master performs dependency resolution during task scheduling.
  • The master server automatically injects dependency nodes: at runtime it generates virtual dependency nodes from the dependency relationships without modifying the workflow definition data, and dependency analysis only applies to workflows brought online with a schedule.
  • The generated dependency nodes get a default failure retry count, e.g. checking every 5 minutes.
  • Once the dependency-resolution switch is turned on, nodes no longer need to be connected manually; the master schedules nodes in dependency order.

E.g. (design diagrams attached to the original comment)

Hi @Rubik-W, I am testing your implementation. How do I set process_definition_json from the frontend? Could you show me a sample?
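The plan quoted above, where the master derives edges from parsed table lineage instead of manual connections, could be sketched like this. The task/table representation is hypothetical, not DolphinScheduler's actual model:

```python
def infer_edges(tasks):
    """Derive workflow edges from table lineage instead of manual wiring.

    tasks: {task_name: {"reads": [tables...], "writes": [tables...]}}
    Returns a set of (upstream, downstream) edges: a task that reads table T
    depends on every task that writes T. Sketch only; the real proposal
    injects virtual dependency nodes at schedule time without modifying
    the workflow definition.
    """
    producers = {}
    for name, t in tasks.items():
        for table in t.get("writes", []):
            producers.setdefault(table, set()).add(name)
    edges = set()
    for name, t in tasks.items():
        for table in t.get("reads", []):
            for up in producers.get(table, ()):
                if up != name:
                    edges.add((up, name))
    return edges
```

Running the result through the save-time cycle check would then guarantee the auto-wired graph is still a valid DAG.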

wen-hemin commented 3 years ago

Hi @Rubik-W, I am testing your implementation. How do I set process_definition_json from the frontend? Could you show me a sample?

When the frontend creates a workflow and the user clicks save, process_definition_json is generated automatically from the task configuration.

BobbySun commented 3 years ago

+1, this feature is badly needed; we were even planning to develop it ourselves.

ljp510016132 commented 3 years ago

+1, urgently needed

YiAnCN commented 3 years ago

+1, this feature is urgently needed

alexcd90 commented 3 years ago

+1, this feature is urgently needed

lyyprean commented 3 years ago

+1 +1 +1, this feature is urgently needed

ricemouse commented 3 years ago

+1

glxfeng commented 3 years ago

+1

ntupapaya commented 3 years ago

support +1

brucemen711 commented 3 years ago

+1. I think this should not be embedded in the DolphinScheduler system itself. We could introduce a new module/plugin, so we could generate standard data lineage and push it to other systems like Apache Atlas, Marquez (WeWork), etc. @dailidong @boandai
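The plugin approach suggested here decouples lineage emission from the scheduler core: the scheduler produces one standard event, and each backend (Atlas, Marquez, ...) gets its own adapter. A minimal sketch, with hypothetical names:

```python
from abc import ABC, abstractmethod

class LineageSink(ABC):
    """Pluggable sink interface so lineage emission lives outside the
    scheduler core. Each backend implements its own adapter."""

    @abstractmethod
    def emit(self, event: dict) -> None: ...

class InMemorySink(LineageSink):
    """Toy sink that just records events; a real AtlasSink or MarquezSink
    would translate the standard event and POST it to that system."""

    def __init__(self):
        self.emitted = []

    def emit(self, event):
        self.emitted.append(event)

def publish(event, sinks):
    """Fan a single standard lineage event out to all configured sinks."""
    for sink in sinks:
        sink.emit(event)
```

New backends then only require a new `LineageSink` implementation, with no change to the scheduler itself.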

irhawks commented 3 years ago

+1

jczk commented 3 years ago

+1

zhujian86 commented 3 years ago

+1

yimaixinchen commented 3 years ago

+1

jon-qj commented 3 years ago

+1

huanzui commented 2 years ago

+1

Level1Accelerator commented 2 years ago

This requirement is very important, +1