Open dandanlen opened 3 years ago
Nice find!
Each grouping expression generates noise layers, which should apply only to those buckets containing it. It seems the Cloak always applies these noise layers globally, to all the buckets, returning different noise values for a bucket which should not be affected by unrelated grouping expressions.
Not sure if this is a bug or a feature... either way I don't think it adds any privacy benefit since it can be worked around by decomposing the
group by grouping sets
query into multiplegroup by
statements.The following were run against https://demo.aircloak.com/data_sources/NAV_W1_TENANT_WS001.
☝️ This is equivalent to just
group by 2
and I get the same result whichever way I write the query statement. So far, so good. Link☝️ This is equivalent to the union of
group by 2
andgroup by 3
and I get the results I expect: The results contain the same two rows as in the query above (with differentgrouping_id
s). LinkNow if instead I add a column with the
bucket
functions something weird happens.☝️ The two rows for the "Quantity" grouping are still there, but the numbers are totally different. It looks like the mere fact of including another group has added extra noise to the unrelated column.
In other examples, this can even cause a column to be suppressed or removed entirely from the result set.
TLDR;
group by grouping sets
ought to be equivalent to the union of the individualgroup by ${subgroup}
queries. But in some cases adding an extra (and unrelated) grouping set can change results from other grouping sets.