MarquezProject / marquez-web

Marquez Web UI

Missing input / out dependencies in lineage graph #68

Closed wslulciuc closed 3 years ago

wslulciuc commented 4 years ago

Below I have captured feedback on how lineage metadata is visualized:

@julienledem

The graph assumes a tree structure rather than a DAG. If there's a diamond (for example, job J1 produces datasets DS1 and DS2, and job J2 consumes datasets DS1 and DS2), then it duplicates those two branches. You can see this live on Willy's branch: https://github.com/MarquezProject/marquez-web/tree/feature/example-food-delivery. J1 is etl_menus and J2 is orders_7_days; you can see the graph downstream from etl_menus being duplicated instead of joining back on orders_7_days.
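To make the diamond case concrete, here is a minimal sketch in Python (not the marquez-web code, which is not shown in this thread). The `JOBS` structure, the function names, and the placeholder dataset names DS1 and DS2 are assumptions for illustration only. It contrasts a naive tree expansion, which copies the downstream job once per path, with a DAG traversal that keys nodes by name and visits each one once.

```python
from collections import defaultdict

# Hypothetical job metadata mirroring the diamond described above.
# DS1 and DS2 are placeholder dataset names, not real Marquez datasets.
JOBS = {
    "etl_menus": {"inputs": [], "outputs": ["DS1", "DS2"]},
    "orders_7_days": {"inputs": ["DS1", "DS2"], "outputs": []},
}


def build_edges(jobs):
    """Return a set of (source, target) edges covering every input and output."""
    edges = set()
    for job, io in jobs.items():
        for dataset in io["inputs"]:
            edges.add((dataset, job))
        for dataset in io["outputs"]:
            edges.add((job, dataset))
    return edges


def children_of(edges):
    """Map each node to its downstream neighbours, in sorted order."""
    children = defaultdict(list)
    for src, dst in sorted(edges):
        children[src].append(dst)
    return children


def expand_as_tree(root, edges):
    """Naive tree expansion: each path gets its own copy of downstream nodes."""
    children = children_of(edges)
    visited = []

    def walk(node):
        visited.append(node)
        for child in children[node]:
            walk(child)

    walk(root)
    return visited


def expand_as_dag(root, edges):
    """DAG traversal: each node is visited once, keyed by its name."""
    children = children_of(edges)
    seen, order, stack = set(), [], [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Reverse so that children are popped in sorted order.
        stack.extend(reversed(children[node]))
    return order


edges = build_edges(JOBS)
print(expand_as_tree("etl_menus", edges))
print(expand_as_dag("etl_menus", edges))
```

In the tree expansion, orders_7_days appears once under each dataset branch; in the DAG traversal both branches join back on a single orders_7_days node.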

@jharris126 (see original feedback on Gitter)

I was wondering if someone can take a look at the screenshot below [1] and tell me if I'm doing something wrong or if this is a bug. Only the first input in my input array for this job is showing up in the lineage chart, even though the dataset shows up in the search and the Marquez Postgres repo seems to have both inputs linked correctly. [1] https://files.gitter.im/marquez-project/community/Luxu/Marquez-Lineage-Issue.png

wslulciuc commented 4 years ago

@grantdfoster Would be great to get your thoughts on the feedback above from @jharris126. Also, as we discussed offline, I opened #79 to update the tree structure used to display the lineage graph.