The ZetaSQL Toolkit is a library that helps users use the ZetaSQL Java API to perform SQL analysis for multiple query engines, including BigQuery and Cloud Spanner.
Apache License 2.0
Resolved columns no longer have project ID or dataset ID since 0.5.0 #67
Beginning with v0.5.0, resolved column lineage no longer includes the project ID and dataset ID.
To reproduce, run the README example below with v0.4.1 and with v0.5.0 and compare the two outputs: the project ID and dataset ID that are present in 0.4.1 are missing in 0.5.0.
My questions are:
Is this intentional, and if so, what is the goal?
If so, how can I get project id and dataset id using only the table name?
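For context on the second question: the only workaround I can think of is keeping my own lookup from the short table name back to the full reference. Below is a minimal sketch of that idea, assuming every table is registered by its full `project.dataset.table` name and that project IDs contain no dots. `TableRegistry`, `TableRef`, `register` and `lookup` are hypothetical names of my own and are not part of the toolkit.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, NOT part of the ZetaSQL Toolkit: it remembers the full
// names of the tables added to a catalog so that a short table name can be
// mapped back to its project and dataset.
final class TableRegistry {

  /** The three parts of a fully qualified BigQuery table reference. */
  static final class TableRef {
    final String project;
    final String dataset;
    final String table;

    TableRef(String project, String dataset, String table) {
      this.project = project;
      this.dataset = dataset;
      this.table = table;
    }

    @Override
    public String toString() {
      return project + "." + dataset + "." + table;
    }
  }

  // Short table name -> every full reference registered under that name.
  private final Map<String, List<TableRef>> byShortName = new HashMap<>();

  /** Splits "project.dataset.table" and stores it under its short name. */
  TableRef register(String fullName) {
    // Assumption: exactly three dot-separated, non-empty parts.
    String[] parts = fullName.split("\\.");
    if (parts.length != 3
        || parts[0].isEmpty() || parts[1].isEmpty() || parts[2].isEmpty()) {
      throw new IllegalArgumentException(
          "Expected project.dataset.table, got: " + fullName);
    }
    TableRef ref = new TableRef(parts[0], parts[1], parts[2]);
    byShortName.computeIfAbsent(ref.table, k -> new ArrayList<>()).add(ref);
    return ref;
  }

  /** Returns all candidates; more than one means the short name is ambiguous. */
  List<TableRef> lookup(String shortName) {
    return Collections.unmodifiableList(
        byShortName.getOrDefault(shortName, Collections.emptyList()));
  }

  public static void main(String[] args) {
    TableRegistry registry = new TableRegistry();
    registry.register("bigquery-public-data.samples.wikipedia");
    System.out.println(registry.lookup("wikipedia"));
  }
}
```

This only works when short table names are unique across the datasets I register, which is why `lookup` returns a list instead of a single value. I would prefer not to maintain something like this if the information can come from the resolved statement itself.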
BigQuery example:
String query =
    "INSERT INTO `bigquery-public-data.samples.wikipedia` (title) VALUES ('random title');\n"
        + "SELECT title, language FROM `bigquery-public-data.samples.wikipedia` WHERE title = 'random title';";
// Create a BigQueryCatalog
// By default, it will use the BigQuery API with application-default credentials
// to fetch BigQuery resources.
BigQueryCatalog catalog = new BigQueryCatalog(/*bqProjectId=*/"bigquery-public-data");
// Add resources to the catalog
// After a resource is added, it will be available when ZetaSQL performs analysis
catalog.addTable("bigquery-public-data.samples.wikipedia");
// Configure the analyzer options using the BigQuery feature set
AnalyzerOptions options = new AnalyzerOptions();
options.setLanguageOptions(BigQueryLanguageOptions.get());
// Use the ZetaSQLToolkitAnalyzer to run the analyzer
// It returns an iterator over the resulting AnalyzedStatements
ZetaSQLToolkitAnalyzer analyzer = new ZetaSQLToolkitAnalyzer(options);
Iterator<AnalyzedStatement> statementIterator = analyzer.analyzeStatements(query, catalog);
// Use the resulting AnalyzedStatements
statementIterator.forEachRemaining(analyzedStatement -> {
  analyzedStatement.getResolvedStatement().ifPresent(System.out::println);
});
Output for v0.4.1:
Output for v0.5.0: