sebastian closed this 7 years ago
I don't exactly understand what you mean here. The current behaviour is intentional. Where is the need for the returned anonymizer state and what value is returned multiple times?
Take the `avg` implementation. It relies on the `sum` anonymizer followed by the `count` anonymizer. Each of these makes use of the anonymization state and each of them updates it. Since neither returns it, they both start with the same anonymization state.
In other words: the `sum` anonymizer needs to return the updated anonymization state, which is then used as the input to the `count` anonymizer.
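
To make the difference concrete, here is a minimal Python sketch, not the project's actual code. It assumes the anonymization state can be modelled as a PRNG seed, and the names `noise`, `anon_sum`, `anon_count`, `avg_shared` and `avg_chained` are hypothetical. It contrasts both building blocks starting from the same state with the state being threaded from one to the next.

```python
import random
from typing import List, Tuple

# Assumption: the anonymization state is modelled as a plain PRNG seed.
State = int


def noise(state: State) -> Tuple[float, State]:
    """Draw one Gaussian noise value and return it with the advanced state."""
    rng = random.Random(state)
    value = rng.gauss(0.0, 1.0)
    next_state = rng.getrandbits(64)
    return value, next_state


def anon_sum(values: List[float], state: State) -> Tuple[float, State]:
    """Hypothetical sum building block: noisy sum plus the updated state."""
    n, state = noise(state)
    return sum(values) + n, state


def anon_count(values: List[float], state: State) -> Tuple[float, State]:
    """Hypothetical count building block: noisy count plus the updated state."""
    n, state = noise(state)
    return len(values) + n, state


def avg_shared(values: List[float], state: State) -> Tuple[float, float]:
    """Both blocks start from the same state, so they draw the same noise."""
    s, _ = anon_sum(values, state)
    c, _ = anon_count(values, state)
    return s, c


def avg_chained(values: List[float], state: State) -> Tuple[float, float]:
    """The state returned by sum is fed into count, so the draws differ."""
    s, state = anon_sum(values, state)
    c, state = anon_count(values, state)
    return s, c


values = [3.0, 5.0, 8.0]
s1, c1 = avg_shared(values, 42)
s2, c2 = avg_chained(values, 42)
print("shared :", s1 - sum(values), c1 - len(values))
print("chained:", s2 - sum(values), c2 - len(values))
```

In the shared variant the two printed noise values coincide; in the chained variant they do not. Which of the two is desirable is exactly what the rest of this thread is about.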
That is intentional. The analyst can compute the sum and count separately, so it makes sense that the `avg` is the reported sum over the reported count. Furthermore, by chaining the state you leak more information, as you now report the noise twice for the same bucket, reducing its range.
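
As a small illustration of that argument (a sketch only; `reported_avg` is a hypothetical name, not a function from the codebase): if `avg` is defined purely in terms of values the analyst could already request, it reveals nothing beyond them.

```python
def reported_avg(reported_sum: float, reported_count: float) -> float:
    """Average derived only from the already-anonymized sum and count."""
    if reported_count == 0:
        raise ValueError("reported count is zero; average is undefined")
    # No fresh noise is drawn here: the result is a pure function of the
    # two values the analyst could have queried separately.
    return reported_sum / reported_count


example = reported_avg(16.5, 3.0)
```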
> The analyst can compute the sum and count separately
Good point
Well, good points all around :)
We have more complex anonymization steps which rely on other anonymization building blocks. Examples are `avg` (using `sum` and `count`) and `stddev` (using `avg`).
The individual building blocks use and update the anonymization state, but the updated anonymizer isn't returned. As a result, we are generating the same random values multiple times.
This fix should include a statement in the user changelog that results might differ from answers given to the same queries in the past.