GoogleCloudPlatform / zetasql-toolkit

The ZetaSQL Toolkit is a library that helps users use ZetaSQL Java API to perform SQL analysis for multiple query engines, including BigQuery and Cloud Spanner.
Apache License 2.0
38 stars 9 forks source link

Consider making ParentColumnFinder public? #28

Closed dion-ricky closed 10 months ago

dion-ricky commented 10 months ago

Hi,

My company's ETL pipeline heavily utilizes write disposition and table destination feature in BigQuery. So the majority of our query is actually just a select statement. However, the ColumnLineageExtractor class provided by the zetasql-toolkit-core doesn't cover our use cases.

I'm currently writing a custom column lineage extractor to support select statement when I realized that ParentColumnFinder class is not public. I needed the exact functionality that this class does. If it was set to package private by design could you give me suggestion what I should do.

Thanks

ppaglilla commented 10 months ago

I think we can do two things.

On one hand, add a nice method to ColumnLineageExtractor that would take your analyzed SELECT and your output table. It'd build the lineage and return it, since it already has the machinery to do so. Something like this:

public Set<ColumnLineage> extractColumnLevelLineage(
    ResolvedQueryStmt statement, String outputTable) {

    List<ResolvedOutputColumn> outputColumns = statement.getOutputColumnList();

    return extractColumnLevelLineageForOutputColumns(outputTable, outputColumns, statement);
}

On the other hand, I would still make ParentColumnFinder public. It sounds like it could be generally useful for any case that the lineage extractor doesn't cover.

I'll add these changes in the next couple of days (unless you'd like to submit a PR yourself). Then we'll release it on v0.5.0, which I'm aiming for early next week.

dion-ricky commented 10 months ago

I have opened a pull request #29 to resolve this issue. Thanks for your recommendation of overloading the extractColumnLevelLineage to ColumnLineageExtractor, I have also added that into the PR.

ppaglilla commented 10 months ago

Thank you very much! I'll review later today.

ppaglilla commented 10 months ago

Merged on #29