smoothml closed this issue 5 months ago
Ah, that's annoying. Thanks for looking into it @DaveDeepl. I wonder whether we could catch this specific exception and ignore it. I'll raise an issue in the sqlglot library about it.
I have raised an issue with sqlglot: tobymao/sqlglot#3188
When parsing the query with sqlglot, the joins list contains an entry with a kind=ARRAY field. For example, parsing the query in my original post with sqlglot.parse_one gives:
Select(
  expressions=[
    Alias(
      this=Sum(
        this=Literal(this=1, is_string=False)),
      alias=Identifier(this=impressions, quoted=False)),
    Column(
      this=Identifier(this=city, quoted=False)),
    Column(
      this=Identifier(this=browser, quoted=False))],
  from=From(
    this=Subquery(
      this=Select(
        expressions=[
          Alias(
            this=Array(
              expressions=[
                Literal(this=Istanbul, is_string=True),
                Literal(this=Berlin, is_string=True),
                Literal(this=Bobruisk, is_string=True)]),
            alias=Identifier(this=cities, quoted=False)),
          Alias(
            this=Array(
              expressions=[
                Literal(this=Firefox, is_string=True),
                Literal(this=Chrome, is_string=True),
                Literal(this=Chrome, is_string=True)]),
            alias=Identifier(this=browsers, quoted=False))]))),
  joins=[
    Join(
      this=Table(
        this=Identifier(this=cities, quoted=False),
        alias=TableAlias(
          this=Identifier(this=city, quoted=False))),
      kind=ARRAY), # <-- THIS LINE
    Join(
      this=Table(
        this=Identifier(this=browsers, quoted=False),
        alias=TableAlias(
          this=Identifier(this=browser, quoted=False))))],
  group=Group(
    expressions=[
      Column(
        this=Identifier(this=city, quoted=False)),
      Column(
        this=Identifier(this=browser, quoted=False))]))
To fix this bug I think we just need to exclude Join objects of this type in the helpers.validate_all_input_mocks_for_query_provided function.
I haven't had time to check yet, but it could be that we just need to make the helpers.get_source_tables function more robust. Right now it uses sqlglot's root.traverse() and loops over the sources. We could check that each source is not of kind ARRAY before treating it as a table. Maybe that already solves the problem.
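The check proposed in the two comments above could look roughly like the sketch below. It deliberately avoids the real sqlglot and sql-mock APIs: JoinStub and names_requiring_mocks are hypothetical names that only mirror the name and kind fields visible in the dump above, so treat it as an illustration of the filtering idea rather than a patch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JoinStub:
    """Hypothetical stand-in for a parsed join; this is NOT sqlglot's Join class."""
    name: str                   # joined table or column name
    kind: Optional[str] = None  # e.g. "ARRAY", or None when no kind is set


def names_requiring_mocks(joins: List[JoinStub]) -> List[str]:
    """Return the joined names that would still need an input mock.

    Joins whose kind is ARRAY (case-insensitive) are skipped.
    """
    return [j.name for j in joins if (j.kind or "").upper() != "ARRAY"]


# Joins as they appear in the dump above: only the first carries kind=ARRAY.
dump_joins = [JoinStub("cities", "ARRAY"), JoinStub("browsers")]
remaining = names_requiring_mocks(dump_joins)
print(remaining)
```

With the joins taken from the dump, the sketch still reports browsers, because only the first Join there has kind=ARRAY. If the real parse tree has the same shape, a check on kind alone might not cover every ARRAY JOIN argument, which would be worth verifying against sqlglot before relying on it.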
I'm not sure that assumption holds as it's certainly possible to combine joins:
SELECT
    sum(1) AS impressions,
    city,
    browser
FROM
(
    SELECT
        ['Istanbul', 'Berlin', 'Bobruisk'] AS cities,
        ['Firefox', 'Chrome', 'Chrome'] AS browsers
)
ARRAY JOIN
    cities AS city,
    browsers AS browser
JOIN (SELECT 'Istanbul' AS city, 'foo' AS bar) cte USING city
GROUP BY
    city,
    browser
SETTINGS joined_subquery_requires_alias=0
Note, in this example trying to add the column cte.bar to the output resulted in an error even though the result is filtered correctly, so perhaps this is just a bad thing to do in ClickHouse. I'm also not sure I've actually seen this done. I think there are two options:
ARRAY JOIN in isolation.
I would be in favour of option 1. If it becomes a problem we can always change it.
As an aside, it seems you can now achieve the same result using Tuple as described at the bottom of this page.
Consider the following example:
When testing this, sql-mock will fail the test with the following error:
The arguments to the ARRAY JOIN operation should not need to be mocked as they are columns in the sub-query.