Problem
Today, the Marquez UI displays a lineage graph based on the dataset or job selected from the dataset or job list. While this works well for general use, there are cases where we would like to see all the lineage graphs that belong to a particular namespace, whether or not their datasets and jobs are connected to each other. This would help when looking for datasets or jobs that are dangling (meaning they have no parent or child) or that are disconnected from one another and form multiple separate graphs. Currently, the lineage graph has no way to display this.
Solution
Marquez should have a separate graph mode that collects all the lineage under a given namespace, say namespace abc, and draws all of its dataset and job lineage graphs together. This includes dangling datasets and jobs that are not connected to anything, so that everything belonging to the namespace is displayed.
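To make the idea concrete, here is a minimal sketch of how the nodes of one namespace could be grouped into separate graphs, with dangling nodes kept as single-node graphs. It is only an illustration: the function name `namespace_components`, the node ID strings, and the `(source, target)` edge format are assumptions made for this example, not part of the Marquez API or UI code.

```python
def namespace_components(nodes, edges):
    """Group all nodes of a namespace into connected subgraphs.

    nodes: iterable of node IDs (datasets and jobs) in the namespace.
    edges: iterable of (source, target) pairs between those nodes.
    Returns (components, dangling), where components is a list of sorted
    node ID lists and dangling is a sorted list of nodes with no edges.
    """
    # Every node gets an entry, so nodes without edges are not lost.
    adjacency = {node: set() for node in nodes}
    for source, target in edges:
        # Direction is ignored here: we only care which nodes share a graph.
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)

    seen = set()
    components = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first traversal collects one connected subgraph.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))

    dangling = sorted(node for node, linked in adjacency.items() if not linked)
    return components, dangling


# Hypothetical contents of namespace "abc" (IDs are made up for illustration).
nodes = [
    "dataset:abc.raw",
    "job:abc.clean",
    "dataset:abc.clean",
    "dataset:abc.orphan",
    "job:abc.unused",
]
edges = [
    ("dataset:abc.raw", "job:abc.clean"),
    ("job:abc.clean", "dataset:abc.clean"),
]
components, dangling = namespace_components(nodes, edges)
```

With this sample data, `components` holds three graphs (one connected pipeline and two single-node graphs) and `dangling` lists the orphan dataset and the unused job. A namespace-wide graph mode could render each component side by side and highlight the dangling ones.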
This is based on @howardyoo's proposal, with a visualization added. I think this is a good idea, and it would give a better experience for navigating to lineage and for deletion.