Closed dion-ricky closed 1 year ago
I think we can do two things.
On one hand, add a nice method to ColumnLineageExtractor
that would take your analyzed SELECT
and your output table. It'd build the lineage and return it, since it already has the machinery to do so. Something like this:
public Set<ColumnLineage> extractColumnLevelLineage(
ResolvedQueryStmt statement, String outputTable) {
List<ResolvedOutputColumn> outputColumns = statement.getOutputColumnList();
return extractColumnLevelLineageForOutputColumns(outputTable, outputColumns, statement);
}
On the other hand, I would still make ParentColumnFinder
public. It sounds like it could be generally useful for any case that the lineage extractor doesn't cover.
I'll add these changes in the next couple of days (unless you'd like to submit a PR yourself). Then we'll release it on v0.5.0
, which I'm aiming for early next week.
I have opened a pull request #29 to resolve this issue. Thanks for your recommendation of overloading the extractColumnLevelLineage to ColumnLineageExtractor, I have also added that into the PR.
Thank you very much! I'll review later today.
Merged on #29
Hi,
My company's ETL pipeline heavily utilizes write disposition and table destination feature in BigQuery. So the majority of our query is actually just a select statement. However, the
ColumnLineageExtractor
class provided by the zetasql-toolkit-core doesn't cover our use cases.I'm currently writing a custom column lineage extractor to support select statement when I realized that
ParentColumnFinder
class is not public. I needed the exact functionality that this class does. If it was set to package private by design could you give me suggestion what I should do.Thanks