Closed pdobacz closed 2 years ago
I don't understand why we need to collect all aggregators in bulk from a query tree. When (or if) we will need to support pre-processing subqueries, we will need to treat pre-processing, anonymizing, post-processing aggregators differently. For now, I think we should make
reference
always anonymize the bottom most subqueries and worry about pre-processing sensitive data later (once we fully support all the clause in a single query).
This might be an answer to the question asked in the PR description. I'm not sure we need this at all. The question is, should queries like those added in the tests, be valid or not, given current implementation of reference
?
I'm guessing, this particular validation pass we're discussing here, aims to protect the aggregation hooks from some pathological queries - are we OK with letting them through?
The question is, should queries like those added in the tests, be valid or not, given current implementation of reference?
I don't think they should be added yet, because reference
is not yet designed to handle these cases, so they would be testing whatever behavior (correct or not) currently exists.
After sleeping on it, I think, the added test cases are testing correct behavior being: "Post-aggregation hooks are being protected from queries containing multiple LCF aggregators. Instead of producing (possibly) nonsense results, they reject the query and produce a meaningful error". The behavior was incomplete (didn't descend to subqueries/joins), and the change fills this gap.
Is this reasoning correct?
The hooks also do a execution time check by calling AggregationContext.lowCountIndex
which allows only 0 or 1 matches.
The hooks also do a execution time check by calling
AggregationContext.lowCountIndex
which allows only 0 or 1 matches.
oh, so do we need this check validateSingleLowCount
here at all :) ?
We need it if we want to fail early during analysis. Otherwise we have to wait until all the buckets have been built during execution which can take a while.
ok since this seems to require more discussion, I'm splitting of the validateSingleLowCount
out to a separate branch and keeping only truly minor tweaks here
A loose collection of tweaks I gathered during past several days.
If the one about collecting aggregators is not worth it for any reason I'm missing, please let me know, we'll drop it