Open Chr96er opened 1 year ago
You bring up a very good point, thank you for reporting this!
I originally envisioned the column lineage feature to focus only on columns that were used to write to others directly. But now that you bring this up, I agree that we should have visibility on filters, groups, etc.
I'll need to look into it a bit to give you a timeline. My first impression is that it'd require a decent refactor of how lineage is currently implemented. But I'd like to have this as well!
Thanks, would be amazing to have that feature!
The current implementation of column lineage extraction does not address real-world applications as far as I can tell.
The two most common applications I can think of are:
1) Impact analysis when changing the definition of columns or considering deletion
2) Understanding the definition of columns
Both of these applications require extracting indirect column dependencies from e.g. WHERE, GROUP BY, IF and CASE WHEN clauses. As far as I can tell this is not possible at the moment (unless I'm missing a configuration option).

Example 1) (this is actually from the ExtractColumnLevelLineage.java examples):

extracts:
when I would like to see:
The * indicates an indirect dependency (of course the output is generated by the user, so zetasql would have to provide some attribute for indirect relationships). Both columns indirectly depend on corpus and title. The definition of comment changes if we make changes to either corpus or title in upstream tables. I would like to be aware of that when I make changes to those columns.

Another two made-up examples:

Example 2) GROUP BY:

where I would like to see:
Example 3) IF:

where I would like to see:
Please add configuration options for detecting these indirect dependencies; otherwise I don't see anyone adopting column-level lineage, which could be extremely powerful.
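
To make the request a bit more concrete, here is a minimal sketch of how an "indirect" attribute on lineage edges could feed the impact-analysis use case. Everything in it is hypothetical: `LineageSketch`, `Edge`, `impactedBy` and the `indirect` flag are illustrative names, not part of zetasql or any existing lineage API, and the column names in the usage note are made up.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch only: none of these types exist in zetasql.
final class LineageSketch {

    /** One lineage edge: `target` depends on `source`. */
    static final class Edge {
        final String target;    // column being written
        final String source;    // column it depends on
        final boolean indirect; // assumed flag: true for WHERE / GROUP BY / IF / CASE WHEN

        Edge(String target, String source, boolean indirect) {
            this.target = target;
            this.source = source;
            this.indirect = indirect;
        }
    }

    /**
     * Impact analysis: returns every column that depends on `changedColumn`,
     * following direct and indirect edges transitively (breadth-first).
     */
    static Set<String> impactedBy(String changedColumn, List<Edge> edges) {
        Set<String> impacted = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(changedColumn);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            for (Edge edge : edges) {
                // Set.add returns false for columns already visited,
                // so each column is queued at most once.
                if (edge.source.equals(current) && impacted.add(edge.target)) {
                    queue.add(edge.target);
                }
            }
        }
        return impacted;
    }
}
```

With edges such as `new Edge("report.total", "orders.amount", false)` and `new Edge("report.total", "orders.status", true)`, calling `impactedBy("orders.status", edges)` reports `report.total` only because the indirect edge is present. If the extractor emits direct edges alone, the same call returns an empty set, which is exactly the gap described above.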