cloudspannerecosystem / memefish

memefish is the foundation to analyze Spanner SQL
https://cloudspannerecosystem.dev/memefish/
MIT License
76 stars 19 forks source link

Introduce `TypelessStructLiteralArg` and `NewConstructorArg` interfaces #183

Closed makenowjust closed 2 weeks ago

makenowjust commented 2 weeks ago

Close #119 Close #177

~TypelessStructValue (#177) and NewConstructorValue (#119) share their struct shapes. Then, this PR introduces a new struct named AsExpr for expression AS name-kind values and replaces them with it.~

~I believe this is not a hasty generalization because these struct shapes are exactly the same, and they are used for common semantics. However, I'm not sure AsExpr is the best name. If you have a better name, please tell me it!~

~Thank you.~


See https://github.com/cloudspannerecosystem/memefish/pull/183#issuecomment-2441417390.

apstndb commented 2 weeks ago

TypelessStructValue (#177) and NewConstructorValue (#119) share their struct shapes. Then, this PR introduces a new struct named AsExpr for expression AS name-kind values and replaces them with it.

I believe this is not a hasty generalization because these struct shapes are exactly the same, and they are used for common semantics.

I have concerned about there are undocumented differences in NEW constructors and typeless STRUCT literals.

https://github.com/google/zetasql/blob/194cd32b5d766d60e3ca442651d792c7fe54ea74/zetasql/parser/bison_parser.y#L8927-L8942

    | expression "AS" "(" path_expression ")"
      {
        // Do not parenthesize $4 because it is not really a parenthesized
        // path expression. The parentheses are just part of the syntax here.
        $$ = MAKE_NODE(ASTNewConstructorArg, @$, {$1, $4});
      }

I believe it is a syntax for proto extensions and it is not yet available in Cloud Spanner, but I think these semantics are not same. It may be a reason to distinguish between TypelessStructValue and NewConstructorValue.

spanner> SELECT NEW examples.shipping.`Order`("Japan" AS (shipping_address.country));
ERROR: spanner: code = "InvalidArgument", desc = "NEW constructor does not support proto extensions [at 1:50]\\nSELECT NEW examples.shipping.`Order`(\\\"Japan\\\" AS (shipping_address.country))\\n      
apstndb commented 2 weeks ago

This difference is natural because:

makenowjust commented 2 weeks ago

IMHO, when expression AS (path) is needed, we can introduce a new interface NewConstructorArg, and a new struct ExprAsPath, so we can go this way even if STRUCT and NEW syntax are different.

apstndb commented 2 weeks ago

Personally I think it is acceptable to generalize them temporarily, even if their semantics are not same, But I am wondering if this generalization really have benefit.

we can introduce a new interface NewConstructorArg, and a new struct ExprAsPath

I think it can't be a general name because it is expression AS (path_expression), not expression AS path_expression. It seems that (path_expression) is restricted form of generalized_extension_path. In ZetaSQL, new_constructor_arg is really special structure with no other similar syntax, because its syntax is specialized to Protocol Buffers.

apstndb commented 2 weeks ago

I understood. We can add another interface when it is needed. NewConstructorArg won't be concrete struct so it is not needed to define.

apstndb commented 2 weeks ago

However, I'm not sure AsExpr is the best name. If you have a better name, please tell me it! Rename AsExpr to ExprAsName, and use AsAlias

IMHO, ExprAsName doesn't imply the name is optional, and there are other expr AS name structure(e.g. in * REPLACE(expr AS name, ...)). I think it can be confusing so more explicit name, like ExprOptName, may be prefered.

syntax AS ident Rule in ZetaSQL ast node in ZetaSQL Parser func in memefish Name in memefish
AS ident required   opt_as_alias_with_required_as ASTAlias tryParseAsAlias() AsAlias
[AS] ident optional   opt_as_alias ASTAlias tryParseAsAlias() AsAlias
expr none none ASTExpression ExprArg or Expr  
expr AS ident required required (star_replace_item, select_column_expr_with_as_alias, ...) ASTExpressionWithAlias (not used externally?)  
expr [AS ident] required optional expression_with_opt_alias, (expression_using_argument, ...) ASTExpressionWithOptAlias   ExprAsName?
expr [AS] ident optional required      Alias
expr [[AS] ident] optional optional (select_column_expr, ...) concrete nodes    

(There are many variants.)

makenowjust commented 2 weeks ago

Finally, I conclude there is no better name for the struct expression [AS name]. That is, we should not introduce such a struct.

Therefore, we introduce new interfaces TypelessStructLiteralArg and NewConstructorArg. Currently, two interfaces have the same implementations and parsers, but they are separated for future extensibility.