MarquezProject / marquez-web

Marquez Web UI

Proposal: Data lineage graph #53

Closed wslulciuc closed 4 years ago

wslulciuc commented 4 years ago

The lineage graph maintained by Marquez for datasets and jobs can be a powerful tool when debugging a failing job or tracking down data quality issues. However, the current lineage graph (see below) can be overwhelming:

[Screenshot of the current lineage graph, 2020-01-22]

Displaying the complete lineage graph has benefits, but only when full context is required (maybe we'll eventually provide such an option). Often, a team cares only about the datasets and jobs it owns, and explores upstream or downstream dependencies when needed.
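To make the "owned nodes plus nearby dependencies" idea concrete, here is a minimal sketch. It assumes the lineage is available as a flat list of edges; the `LineageEdge` shape, the `ownedNeighborhood` helper, and the node ids are hypothetical and are not taken from the Marquez API.

```typescript
// Hypothetical edge shape; the real Marquez lineage payload may differ.
interface LineageEdge {
  source: string; // upstream node id
  target: string; // downstream node id
}

// Return the owned nodes plus every node within `hops` edges of them,
// following edges both upstream and downstream.
function ownedNeighborhood(
  edges: LineageEdge[],
  owned: string[],
  hops: number
): string[] {
  const visited: Record<string, boolean> = {};
  owned.forEach((id) => { visited[id] = true; });
  let frontier: string[] = owned.slice();

  for (let step = 0; step < hops; step++) {
    const inFrontier: Record<string, boolean> = {};
    frontier.forEach((id) => { inFrontier[id] = true; });
    const next: string[] = [];
    for (const edge of edges) {
      // Step downstream from a frontier node.
      if (inFrontier[edge.source] && !visited[edge.target]) {
        visited[edge.target] = true;
        next.push(edge.target);
      }
      // Step upstream from a frontier node.
      if (inFrontier[edge.target] && !visited[edge.source]) {
        visited[edge.source] = true;
        next.push(edge.source);
      }
    }
    frontier = next;
  }
  return Object.keys(visited).sort();
}

// Made-up lineage: ds.raw -> job.clean -> ds.clean -> job.report -> ds.report
const neighborhoodEdges: LineageEdge[] = [
  { source: "ds.raw", target: "job.clean" },
  { source: "job.clean", target: "ds.clean" },
  { source: "ds.clean", target: "job.report" },
  { source: "job.report", target: "ds.report" },
];

console.log(ownedNeighborhood(neighborhoodEdges, ["ds.clean"], 1).join(", "));
// prints: ds.clean, job.clean, job.report
```

Raising `hops` widens the view one step at a time, which is one possible way to let a user explore outward from what they own.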

I propose the following as we iterate on the way the UI visualizes lineage metadata:

/cc @julienledem

julienledem commented 4 years ago

Thanks, Willy, for the context. Another aspect is that the current force-directed simulation may not be the best way to lay out the graph. Since the lineage graph is a directed acyclic graph (DAG), layouts designed for DAGs would be better IMO. Here is an example I found interesting: https://github.com/erikbrinkman/d3-dag

Keeping the current design and symbols (squares for datasets and circles for jobs), a left-to-right or top-down layout that reflects the direction of dependencies would be useful, possibly using the bright white/grayed-out convention to represent what's in or out of the namespace of the current job/dataset.
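As a rough illustration of a layout that follows the direction of dependencies, the sketch below assigns each node a column with longest-path layering and flags nodes outside the current namespace for graying out. This is not d3-dag's API; the types, the `layoutLeftToRight` helper, and the sample data are assumptions made for the example, and the input is assumed to be acyclic.

```typescript
// Hypothetical shapes for illustration only.
interface LineageNode {
  id: string;
  kind: "dataset" | "job"; // squares and circles in the current design
  namespace: string;
}
interface LineageEdge {
  source: string; // upstream node id
  target: string; // downstream node id
}
interface PlacedNode extends LineageNode {
  layer: number;   // column index in a left-to-right layout
  dimmed: boolean; // true when outside the namespace being viewed
}

// Longest-path layering: a node sits one column to the right of its
// deepest upstream neighbor, so every edge points left to right.
// Assumes the graph is acyclic; a cycle would recurse forever.
function layoutLeftToRight(
  nodes: LineageNode[],
  edges: LineageEdge[],
  currentNamespace: string
): PlacedNode[] {
  const parents: Record<string, string[]> = {};
  nodes.forEach((n) => { parents[n.id] = []; });
  edges.forEach((e) => {
    const list = parents[e.target];
    if (list) list.push(e.source);
  });

  const layerOf: Record<string, number> = {};
  function layer(id: string): number {
    const known = layerOf[id];
    if (known !== undefined) return known;
    let result = 0;
    for (const p of parents[id] || []) {
      result = Math.max(result, layer(p) + 1);
    }
    layerOf[id] = result;
    return result;
  }

  return nodes.map((n) => ({
    id: n.id,
    kind: n.kind,
    namespace: n.namespace,
    layer: layer(n.id),
    dimmed: n.namespace !== currentNamespace,
  }));
}

// Made-up lineage spanning two namespaces.
const layoutNodes: LineageNode[] = [
  { id: "ds.raw", kind: "dataset", namespace: "ingest" },
  { id: "ds.users", kind: "dataset", namespace: "ingest" },
  { id: "job.clean", kind: "job", namespace: "analytics" },
  { id: "ds.clean", kind: "dataset", namespace: "analytics" },
  { id: "job.report", kind: "job", namespace: "analytics" },
  { id: "ds.report", kind: "dataset", namespace: "analytics" },
];
const layoutEdges: LineageEdge[] = [
  { source: "ds.raw", target: "job.clean" },
  { source: "job.clean", target: "ds.clean" },
  { source: "ds.clean", target: "job.report" },
  { source: "ds.users", target: "job.report" },
  { source: "job.report", target: "ds.report" },
];

layoutLeftToRight(layoutNodes, layoutEdges, "analytics").forEach((n) => {
  console.log(n.id, "layer", n.layer, n.dimmed ? "(grayed out)" : "");
});
```

A renderer could map `layer` to an x coordinate for left-to-right, or to a y coordinate for top-down; ordering nodes within a column to reduce edge crossings is the part a dedicated DAG layout library would handle.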

jhubley commented 4 years ago

Hi Willy and Julien,

Thanks for the comments and ideas. @julienledem - d3-dag is great. I hadn't seen it before. Both @grantdfoster and I agree it's the way to go. We're going to fork one of the examples and start looking at what we need to do to change our graph.

@wslulciuc - regarding the points you make above:

  1. Limited Context: @grantdfoster and I were discussing this and think it actually does make sense for the lineage graph to persist, but not to show everything. Our thinking is that the data shown in the graph would reflect the data shown below it.
  2. Exploration: Agree, there should definitely be a way for a user to explore. Maybe this could be an expand icon on the lineage graph that takes the user to a full-screen view of the graph, where they have options for exploration?
  3. Interactive: Yes, totally agree with this, and I believe this has been our plan from the beginning. This shouldn't be too much work to implement and we'll plan on doing this soon. I'll open an issue for it!

wslulciuc commented 4 years ago

> Exploration: Agree, there should definitely be a way for a user to explore. Maybe this could be an expand icon on the lineage graph that takes the user to a full-screen view of the graph, where they have options for exploration?

For exploration, something like the visual below would be sweet :)

D3 Tree

http://mbostock.github.io/d3/talk/20111018/tree.html

wslulciuc commented 4 years ago

> ... We were thinking this could be datasets/jobs with the most edges.

Yep, exactly 💯
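For reference, picking the datasets/jobs with the most edges could look roughly like the sketch below. The `LineageEdge` shape, the `topByEdgeCount` helper, and the sample ids are hypothetical; ties are broken alphabetically only to keep the result deterministic.

```typescript
// Hypothetical edge shape; the real Marquez lineage payload may differ.
interface LineageEdge {
  source: string; // upstream node id
  target: string; // downstream node id
}

// Rank nodes by how many edges touch them (incoming plus outgoing)
// and keep the first `limit` ids.
function topByEdgeCount(edges: LineageEdge[], limit: number): string[] {
  const degree: Record<string, number> = {};
  for (const e of edges) {
    degree[e.source] = (degree[e.source] || 0) + 1;
    degree[e.target] = (degree[e.target] || 0) + 1;
  }
  return Object.keys(degree)
    .sort((a, b) => {
      const byDegree = (degree[b] || 0) - (degree[a] || 0);
      if (byDegree !== 0) return byDegree;
      return a < b ? -1 : a > b ? 1 : 0; // alphabetical tie-break
    })
    .slice(0, limit);
}

// Made-up lineage in which job.report touches three edges.
const degreeEdges: LineageEdge[] = [
  { source: "ds.raw", target: "job.clean" },
  { source: "job.clean", target: "ds.clean" },
  { source: "ds.clean", target: "job.report" },
  { source: "ds.users", target: "job.report" },
  { source: "job.report", target: "ds.report" },
];

console.log(topByEdgeCount(degreeEdges, 3).join(", "));
// prints: job.report, ds.clean, job.clean
```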

grantdfoster commented 4 years ago

@wslulciuc We've been experimenting with D3 hierarchy, which achieves similar results! We'll look further into what you posted and see if we can integrate it!

image
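One wrinkle with a tree-style view is that a lineage DAG is not a tree: a node can have several upstream parents. A simple way around that, sketched below under assumptions, is to unfold the graph downstream from a chosen root into nested `{ id, children }` objects, which is the default shape `d3.hierarchy()` reads. The `unfoldDownstream` helper and the sample data are hypothetical; nodes reachable by more than one path appear once per path, and `maxDepth` bounds the output.

```typescript
// Hypothetical edge shape; the real Marquez lineage payload may differ.
interface LineageEdge {
  source: string; // upstream node id
  target: string; // downstream node id
}

// Nested shape with a `children` array, as read by d3.hierarchy() by default.
interface TreeDatum {
  id: string;
  children: TreeDatum[];
}

// Unfold the lineage downstream of `rootId` into a tree, at most
// `maxDepth` levels below the root.
function unfoldDownstream(
  edges: LineageEdge[],
  rootId: string,
  maxDepth: number
): TreeDatum {
  const children: TreeDatum[] = [];
  if (maxDepth > 0) {
    for (const e of edges) {
      if (e.source === rootId) {
        children.push(unfoldDownstream(edges, e.target, maxDepth - 1));
      }
    }
  }
  return { id: rootId, children: children };
}

// Made-up lineage: ds.raw -> job.clean -> ds.clean -> job.report
const treeEdges: LineageEdge[] = [
  { source: "ds.raw", target: "job.clean" },
  { source: "job.clean", target: "ds.clean" },
  { source: "ds.clean", target: "job.report" },
];

console.log(JSON.stringify(unfoldDownstream(treeEdges, "ds.raw", 2)));
```

Swapping `source` and `target` in the comparison would unfold upstream instead, so the same helper shape could back both directions of exploration.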

wslulciuc commented 4 years ago

@grantdfoster Very cool! The visual improvements will provide a more intuitive look / feel to the lineage graph. Thanks for the update!

jhubley commented 4 years ago

> > ... We were thinking this could be datasets/jobs with the most edges.
>
> Yep, exactly 💯

Cool, I opened an issue and will have this done in the next day or so.

wslulciuc commented 4 years ago

Great work, @grantdfoster and @jhubley, on getting this feature merged into master! #65