kgyrtkirk opened this issue 4 months ago
Ideally we should first patch this hole by explaining the situation to the user, possibly by adding a check to DruidSqlValidator (or something like that) to ensure that when approx is enabled, only COUNT aggregates can have isDistinct enabled, and by telling the user in the error to disable approx.
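A minimal sketch of the proposed check, assuming a hypothetical, simplified representation of the query's aggregate calls. AggCall, ApproxDistinctValidationError and validate_distinct_aggregates are illustrative names, not Druid or Calcite APIs; the proposal is to do the real check in DruidSqlValidator on Calcite's own node types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggCall:
    """Hypothetical, simplified stand-in for one aggregate call in a query."""
    name: str          # aggregate function name, e.g. "COUNT" or "SUM"
    is_distinct: bool  # True for AGG(DISTINCT ...)


class ApproxDistinctValidationError(Exception):
    """Hypothetical error type whose message carries a hint for the user."""


def validate_distinct_aggregates(calls, use_approximate_count_distinct):
    """Allow DISTINCT only on COUNT while approximate count distinct is enabled.

    Returns None if the aggregates are acceptable; raises otherwise.
    """
    if not use_approximate_count_distinct:
        return  # nothing to check: approximate mode is off
    for call in calls:
        if call.is_distinct and call.name.upper() != "COUNT":
            raise ApproxDistinctValidationError(
                f"{call.name.upper()}(DISTINCT ...) cannot be planned while "
                "useApproximateCountDistinct is enabled; only COUNT(DISTINCT x) is "
                "supported in that mode. Set useApproximateCountDistinct to false "
                "to run this query."
            )


# Usage: models the query from this issue, select sum(distinct added) from wikipedia
try:
    validate_distinct_aggregates([AggCall("SUM", True)], use_approximate_count_distinct=True)
except ApproxDistinctValidationError as err:
    print(err)
```

Note that this naive name-based rule would also reject STRING_AGG(DISTINCT ...), which is exactly the complication raised later in the thread.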
@kgyrtkirk do you mean that if there is a DISTINCT inside COUNT and useApproximateCountDistinct = true, SqlValidatorImpl should throw a CalciteContextException with a message like "we should disable approx"?
Is that your expected result?
Something like that: if useApproximateCountDistinct is enabled, only COUNT(DISTINCT x) will work as expected; any other AGG(DISTINCT ...) will produce the misleading exception above about not being able to convert the plan, which leaves the user without much of a clue :D
I think the fix could be to rewrite COUNT(DISTINCT) into the sketch functions explicitly at compilation time and leave the distinct conversion rules enabled at all times.
But giving a better error with a hint would give users who bump into this a chance to use the feature that is already present :)
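A rough sketch of that compile-time rewrite, again on a hypothetical simplified model rather than Calcite's plan nodes. Targeting Druid's APPROX_COUNT_DISTINCT SQL function is an assumption (the actual rewrite could choose a different sketch-based function), and the column names page and added are only illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggCall:
    """Hypothetical, simplified stand-in for one aggregate call in a query."""
    name: str          # aggregate function name
    arg: str           # column being aggregated
    is_distinct: bool  # True for AGG(DISTINCT ...)


def rewrite_count_distinct(calls, use_approximate_count_distinct):
    """Rewrite COUNT(DISTINCT x) into a sketch-based aggregate up front.

    All other calls are returned unchanged, so DISTINCT on SUM, STRING_AGG, ...
    is still left to the regular (always enabled) distinct conversion rules.
    """
    if not use_approximate_count_distinct:
        return list(calls)
    rewritten = []
    for call in calls:
        if call.is_distinct and call.name.upper() == "COUNT":
            # The sketch deduplicates on its own, so the DISTINCT flag is dropped.
            rewritten.append(AggCall("APPROX_COUNT_DISTINCT", call.arg, False))
        else:
            rewritten.append(call)
    return rewritten


calls = [AggCall("COUNT", "page", True), AggCall("SUM", "added", True)]
print(rewrite_count_distinct(calls, use_approximate_count_distinct=True))
```

In this model only COUNT(DISTINCT x) is touched, so the distinct conversion rules would no longer need to be switched off in approximate mode, and sum(distinct added) would take the same path regardless of the flag.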
@kgyrtkirk next time you can assign the task to me, but this is good to learn from as well.
@AlbericByte: sorry, I hadn't seen a PR or further comments for it... next time I'll ask you about it.
Right now that PR only restricts it for window functions.
However, it turned out that the situation is a bit more complicated than I expected when I filed this issue: we can't just ban DISTINCT usage for non-COUNT(DISTINCT) cases, as there are aggregators like STRING_AGG that can do the distinct aggregation by themselves when they are translated to native later...
We were talking about introducing some annotation markers so that the compiler can verify this and restrict it that way.
But I'll reach out to you next time to avoid such misunderstandings. Thank you for letting me know that you were not expecting the above!
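A sketch of how such an annotation marker could work, assuming a hypothetical registry in which each aggregator declares whether it can handle DISTINCT itself. AggregatorSpec, handles_distinct, REGISTRY and unsupported_distinct_calls are made-up names for illustration, not existing Druid code. The check looks at the marker instead of hard-coding COUNT.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggregatorSpec:
    """Hypothetical aggregator metadata; handles_distinct plays the role of the marker."""
    name: str
    handles_distinct: bool = False


# Illustrative registry: STRING_AGG is marked because it can deduplicate by itself
# when translated to native; COUNT(DISTINCT x) is covered by the sketch-based path.
REGISTRY = {
    spec.name: spec
    for spec in (
        AggregatorSpec("COUNT", handles_distinct=True),
        AggregatorSpec("STRING_AGG", handles_distinct=True),
        AggregatorSpec("SUM"),
        AggregatorSpec("AVG"),
    )
}


def unsupported_distinct_calls(calls, use_approximate_count_distinct):
    """Return names of DISTINCT aggregates that would fail to plan in approximate mode.

    calls is a list of (name, is_distinct) pairs; unknown aggregators are unmarked.
    """
    if not use_approximate_count_distinct:
        return []
    bad = []
    for name, is_distinct in calls:
        spec = REGISTRY.get(name.upper(), AggregatorSpec(name.upper()))
        if is_distinct and not spec.handles_distinct:
            bad.append(spec.name)
    return bad


print(unsupported_distinct_calls([("SUM", True), ("string_agg", True)], True))
```

With the marker in place, STRING_AGG(DISTINCT ...) passes while SUM(DISTINCT ...) is still reported, so the error with the "disable approx" hint fires only for aggregators that really cannot plan.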
useApproximateCountDistinct is supposed to enable a special mode that handles COUNT(DISTINCT x) with sketches.
For example, select sum(distinct added) from wikipedia will fail to plan if useApproximateCountDistinct is enabled.
quidem test: