Open dion-ricky opened 2 months ago
Confirmed automatic quoting name paths on this line: https://github.com/GoogleCloudPlatform/zetasql-toolkit/blob/e196efea5d8c70491dffd95b343e5c23bf9a4b13/zetasql-toolkit-core/src/main/java/com/google/zetasql/toolkit/ZetaSQLToolkitAnalyzer.java#L210
Tried to comment it but it ends up making the parser not recognizing the table name. I'm thinking of other workaround.
Tried this workaround by creating a statement rewriter and adding an alias to all table path expression that doesn't currently have alias. Since BQ automatically created alias by using the table name, we can do the same thing here.
I won't open a PR for this, because I believe the core package can do something better than rewriting the query. Therefore I will keep this issue open for now.
Here's the code that I use to add alias. You can call AddAliasForTableName.rewrite( {{ your query here }} )
before parsing your query.
import com.google.zetasql.Parser;
import com.google.zetasql.parser.ASTNodes;
import com.google.zetasql.parser.ParseTreeVisitor;
import com.google.zetasql.toolkit.ParseTreeUtils;
import com.google.zetasql.toolkit.StatementRewriter;
import com.google.zetasql.toolkit.catalog.bigquery.BigQueryReference;
import com.google.zetasql.toolkit.options.BigQueryLanguageOptions;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;
public class AddAliasForTableName {
private static StatementRewriter.Rewrite createRewrite(int from, int to, String content) {
return new StatementRewriter.Rewrite(from, to, content);
}
private static StatementRewriter.Rewrite createRewrite(ASTNode node, String content) {
return createRewrite(
node.getParseLocationRange().start(), node.getParseLocationRange().end(), content);
}
private static String tableNameFromTablePath(ASTNodes.ASTTablePathExpression expression) {
final String[] tableName = {""};
expression.accept(new ParseTreeVisitor() {
@Override
public void visit(ASTNodes.ASTIdentifier node) {
tableName[0] = String.join(".", node.getIdString());
}
});
return tableName[0];
}
private static boolean shouldAddAlias(ASTNodes.ASTTablePathExpression expression) {
return Objects.isNull(expression.getAlias()) && !Objects.isNull(expression.getPathExpr());
}
private static boolean validTableName(ASTNodes.ASTTablePathExpression expression) {
String tableName = tableNameFromTablePath(expression);
return (BigQueryReference.isQualified(tableName)
&& !BigQueryReference.isWildcardReference(tableName));
}
private static List<StatementRewriter.Rewrite> addAlias(ASTNodes.ASTFromClause fromClause) {
ArrayList<StatementRewriter.Rewrite> rewrites = new ArrayList<>();
HashSet<ASTNodes.ASTTablePathExpression> expressionsToAlias = new HashSet<>();
fromClause.accept(
new ParseTreeVisitor() {
@Override
public void visit(ASTNodes.ASTTablePathExpression node) {
if (shouldAddAlias(node) && validTableName(node)) {
expressionsToAlias.add(node);
}
}
});
expressionsToAlias.forEach(
expr -> {
String tableNameStr = tableNameFromTablePath(expr);
String[] tblNameParts = tableNameStr.split("\\.");
rewrites.add(
createRewrite(
expr, "`" + tableNameStr + "`" + " AS " + tblNameParts[tblNameParts.length - 1]));
});
return rewrites;
}
private static List<StatementRewriter.Rewrite> rewriteSelect(ASTNodes.ASTSelect select) {
ASTNodes.ASTFromClause fromClause = select.getFromClause();
return addAlias(fromClause);
}
public static String rewrite(String query) {
ASTNodes.ASTScript parsedScript = Parser.parseScript(query, BigQueryLanguageOptions.get());
List<ASTNodes.ASTSelect> selects =
ParseTreeUtils.findDescendantSubtreesWithKind(parsedScript, ASTNodes.ASTSelect.class);
List<StatementRewriter.Rewrite> rewrites =
selects.stream()
.flatMap(select -> rewriteSelect(select).stream())
.collect(Collectors.toList());
return StatementRewriter.applyRewrites(query, rewrites);
}
}
When querying an unquoted table reference, BigQuery automatically provided alias for the table name in case insensitive manner. Example:
However, when parsing such query I got exception saying BLOCKS is not a recognized name:
This won't work on quoted table reference, for example BigQuery will refuse to execute this query:
So, I think this error is caused by the toolkit automatically adding quotes to all table reference. I'm not sure about this, haven't look into it in more detail.
Anyone available to check and fix is really appreciated. Thanks.