Closed plaflamme closed 3 days ago
Yeah, this looks like a bug. Thanks for reporting.
FYI @VaggelisD
So the fix I proposed in that PR wasn't really complete. The crux of the problem is that ::
is added only for expressions where it can't mess up precedence. It just so happens that JSON_VALUE
is parsed into JSONExtractScalar
in bigquery, which is both a binary expression and a function, and binaries are unsafe to add ::
to because precedence can be changed, so this one's an exception where we fallback to the safer CAST
syntax.
Fixing this generically (i.e. taking other similar expressions into account) is a PITA and after discussing this internally we concluded that we won't address this right now.
@georgesittas that's unfortunate.
FMI: what cases is JSON_VALUE
considered a binary expression? From the BigQuery documentation, it's a function that always returns STRING
.
It's a binary expression in dialects like duckdb, postgres, mysql etc: x -> some_json_path
. This syntax and the function one are represented by the same expression to facilitate transpilation.
This syntax and the function one are represented by the same expression to facilitate transpilation.
I see.
Fixing this generically (i.e. taking other similar expressions into account) is a PITA and after discussing this internally we concluded that we won't address this right now.
Understood. Should I report an issue to sqlglot about this?
I think it's not a sqlglot issue, it's only a sqlmesh one. For now we'll probably keep the current behavior, cc @tobymao.
I think it's not a sqlglot issue, it's only a sqlmesh one
Right, though presumably, some support from sqlglot would be necessary? For example, nodes that are both Binary
and Func
(such as JSONExtractScalar
) could expose what they transpile to in different dialects, e.g.:
class BinaryOrFunc(Binary, Func):
"""Nodes that either transpile to binary expressions or functions depending on the dialect"""
def is_func(self, dialect) -> bool:
"""True when this will always transpile to a function call in the specified dialect"""
pass
Perhaps this isn't how you'd want to solve this though, not sure...
There could be some new abstraction there to help, sure. But it wouldn't really be useful for SQLGlot, and so there's a risk of it being refactored/removed in the future without a clear indication of its impact. So ideally, since the formatter lives in SQLMesh land, a possible solution should be self-contained without creating such implicit dependencies between the two packages. And currently, a solution in SQLMesh would kinda be a PITA to maintain. Possible solutions we discussed involved hardcoding dialects (e.g. a mapping of dialect to "castable expressions"), which is also a smell, since this mapping can become out-of-date.
So the tradeoff here was that: for these rare cases, it should be ok to have a small discrepancy in how the formatter behaves. This is already true today, e.g. binary and unary expressions aren't cast using the ::
syntax.
Yeah, that makes sense, I appreciate not wanting to bloat sqlglot for sqlmesh purposes. That said, since sqlglot is a transpiler, it does seem like it could be useful to know what an ambiguous expression will transpile to. Maybe there's only one use-case (formatting SQL), but perhaps having an open issue on sqlglot repo would surface other use cases?
We generally avoid keeping open-ended open issues in sqlglot, because we use the issue list to as an indication of what's actively being developed. If you want you can open an issue, cross-reference this SQLMesh issue and then we can close it as not planned, so that it's documented.
Bumping from
0.129.0
to0.133.0
causes the following diffs in our models:rewritten to:
Can we avoid this? We much prefer the shorthand casts than the normal BigQuery syntax.