Anonymization building blocks should return updated anonymizer

Aircloak / aircloak

This repository contains the Aircloak Air frontend as well as the code for our Cloak query and anonymization platform


Anonymization building blocks should return updated anonymizer #1256

Closed sebastian closed 7 years ago

sebastian commented 7 years ago

We have more complex anonymization steps that rely on other anonymization building blocks. Examples are avg (which uses sum and count) and stddev (which uses avg).

The individual building blocks use and update the anonymization state, but the updated anonymizer isn't returned. As a result, we generate the same random values multiple times.

This fix should include a statement in the user changelog that results might differ from answers given to the same queries in the past.

cristianberneanu commented 7 years ago

I don't exactly understand what you mean here. The current behaviour is intentional. Where is the need for the returned anonymizer state and what value is returned multiple times?

sebastian commented 7 years ago

Take the avg implementation. It relies on the sum anonymizer followed by the count anonymizer. Each of these makes use of the anonymization state and each of them updates it. Since neither returns it, they both start with the same anonymization state.

In other words: the sum anonymizer needs to return the updated anonymization state which is then used as the input to the count anonymizer.
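
To make the two behaviours concrete, here is a minimal sketch. It is not the Cloak code: `Anonymizer`, `draw_noise`, `noisy_sum`, `noisy_count`, `avg_shared` and `avg_chained` are hypothetical names, and the seeded Gaussian noise is only an assumption standing in for whatever the real anonymization state does.

```python
import random
from typing import NamedTuple, Sequence, Tuple


class Anonymizer(NamedTuple):
    """Hypothetical anonymization state: just a seed for the noise generator."""
    seed: int


def draw_noise(anonymizer: Anonymizer, sigma: float) -> Tuple[float, Anonymizer]:
    """Draw one noise value and return it together with the advanced state."""
    rng = random.Random(anonymizer.seed)
    noise = rng.gauss(0.0, sigma)
    # The next state is derived from the same generator, so it is deterministic.
    return noise, Anonymizer(rng.getrandbits(64))


def noisy_sum(anonymizer: Anonymizer, values: Sequence[float]) -> Tuple[float, Anonymizer]:
    noise, next_state = draw_noise(anonymizer, 1.0)
    return sum(values) + noise, next_state


def noisy_count(anonymizer: Anonymizer, values: Sequence[float]) -> Tuple[float, Anonymizer]:
    noise, next_state = draw_noise(anonymizer, 1.0)
    return len(values) + noise, next_state


def avg_shared(anonymizer: Anonymizer, values: Sequence[float]) -> float:
    """Both blocks start from the same state, so they draw the same noise."""
    total, _ = noisy_sum(anonymizer, values)
    count, _ = noisy_count(anonymizer, values)
    return total / count


def avg_chained(anonymizer: Anonymizer, values: Sequence[float]) -> Tuple[float, Anonymizer]:
    """The state returned by sum is the input to count, so the noise differs."""
    total, anonymizer = noisy_sum(anonymizer, values)
    count, anonymizer = noisy_count(anonymizer, values)
    return total / count, anonymizer


if __name__ == "__main__":
    state = Anonymizer(seed=42)
    values = [1.0] * 100
    print("shared state: ", avg_shared(state, values))
    print("chained state:", avg_chained(state, values)[0])
```

In `avg_shared` the updated state is dropped, which is the behaviour described above; `avg_chained` threads it through, which is what this issue proposes.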

cristianberneanu commented 7 years ago

That is intentional. The analyst can compute the sum and count separately, so it makes sense that the avg is the reported sum over the reported count. Furthermore, by chaining the state you leak more information, as you now report the noise twice for the same bucket, reducing its range.
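
A rough simulation of the leak, under stated assumptions: the noise is modelled as zero-mean Gaussian, the "advanced" state is modelled as simply a different seed, and `report`, `estimate_same_state` and `estimate_chained` are hypothetical helpers, not anything from the Cloak code. It compares how well an analyst can estimate the true count of a bucket from two reports when both use the same state versus when the second uses a chained state.

```python
import random
import statistics

TRUE_COUNT = 100   # the value the noise is supposed to hide
SIGMA = 2.0        # assumed noise standard deviation


def report(seed: int) -> float:
    """One noisy report of the bucket's count, fully determined by the seed."""
    return TRUE_COUNT + random.Random(seed).gauss(0.0, SIGMA)


def estimate_same_state(bucket_seed: int) -> float:
    """Both reports use the same state: identical noise, averaging gains nothing."""
    return (report(bucket_seed) + report(bucket_seed)) / 2


def estimate_chained(bucket_seed: int) -> float:
    """The second report uses a different state (an offset seed stands in for it)."""
    return (report(bucket_seed) + report(bucket_seed + 1_000_003)) / 2


# Estimation error over many hypothetical buckets, one seed per bucket.
errors_same = [estimate_same_state(s) - TRUE_COUNT for s in range(2000)]
errors_chained = [estimate_chained(s) - TRUE_COUNT for s in range(2000)]

print("error spread, same state:", statistics.pstdev(errors_same))
print("error spread, chained:   ", statistics.pstdev(errors_chained))
```

For independent noise of equal variance, averaging two reports shrinks the standard deviation by a factor of 1/sqrt(2), so the chained variant should show the smaller spread.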

sebastian commented 7 years ago

> The analyst can compute the sum and count separately

Good point

Well, good points all around :)