Devographics / Monorepo

Monorepo containing the State of JS apps

Handling privacy issues regarding cross-referencing #330

Open SachaG opened 11 months ago

SachaG commented 11 months ago

Problem

You know Alice lives in Madagascar and has taken the survey. You want to figure out their salary.

Since Alice is the only respondent from Madagascar, you can easily find out the salary by either filtering the salary chart to country = madagascar, or cross-referencing country vs salary.

Solution

We implement a dataset cutoff so that any result with fewer than n respondents (by default, n=10) is zeroed out and accompanied by an "insufficient data" message.
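A minimal sketch of what such a cutoff could look like. The `Bucket` shape, the `hasInsufficientData` flag, and the `applyCutoff` name are assumptions made for illustration, not the monorepo's actual API, and the sample numbers are made up.

```typescript
// Hypothetical bucket shape; field names are assumptions for illustration.
interface Bucket {
  id: string;
  count: number;
  percentageSurvey: number;
  hasInsufficientData?: boolean;
}

const DEFAULT_CUTOFF = 10;

// Zero out every bucket with fewer than `cutoff` respondents and flag it
// so the UI can show an "insufficient data" message.
function applyCutoff(buckets: Bucket[], cutoff: number = DEFAULT_CUTOFF): Bucket[] {
  return buckets.map((bucket) =>
    bucket.count < cutoff
      ? { ...bucket, count: 0, percentageSurvey: 0, hasInsufficientData: true }
      : bucket
  );
}

// Made-up sample data: Madagascar has a single respondent.
const salaryByCountry: Bucket[] = [
  { id: "france", count: 240, percentageSurvey: 12 },
  { id: "madagascar", count: 1, percentageSurvey: 0.05 },
];
const safeBuckets = applyCutoff(salaryByCountry);
```

Here the `madagascar` bucket comes back with its count and percentage set to 0 and the flag set, while the `france` bucket is returned unchanged.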

Conditions

We apply this cutoff when:

Note that when a facet is active but no filter is, we don't need to apply the cutoff to parent-level bucket properties such as count and percentageSurvey. We do, however, need to zero out averageByFacet and percentilesByFacet, since they can leak cross-referenced facet data.
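The facet-only case could be sketched as follows. This assumes the facet-derived aggregates are zeroed only on parent buckets that fall below the cutoff, and that `percentilesByFacet` is a map of percentile names to values. The `ParentBucket` shape and the `zeroFacetAggregates` name are hypothetical, and the sample values are made up.

```typescript
// Hypothetical parent-level bucket; only the property names count,
// percentageSurvey, averageByFacet and percentilesByFacet come from the issue.
interface ParentBucket {
  id: string;
  count: number;
  percentageSurvey: number;
  averageByFacet?: number;
  percentilesByFacet?: Record<string, number>;
}

// Facet active, no filter: leave count and percentageSurvey untouched,
// but zero the facet-derived aggregates for buckets below the cutoff.
function zeroFacetAggregates(buckets: ParentBucket[], cutoff: number = 10): ParentBucket[] {
  return buckets.map((bucket) => {
    if (bucket.count >= cutoff) return bucket;
    const zeroedPercentiles: Record<string, number> = {};
    for (const key of Object.keys(bucket.percentilesByFacet ?? {})) {
      zeroedPercentiles[key] = 0;
    }
    return { ...bucket, averageByFacet: 0, percentilesByFacet: zeroedPercentiles };
  });
}

// Made-up sample: country buckets faceted by salary.
const facetedByCountry: ParentBucket[] = [
  { id: "france", count: 240, percentageSurvey: 12, averageByFacet: 52000, percentilesByFacet: { p50: 50000 } },
  { id: "madagascar", count: 1, percentageSurvey: 0.05, averageByFacet: 18000, percentilesByFacet: { p50: 18000 } },
];
const safeFaceted = zeroFacetAggregates(facetedByCountry);
```

The `madagascar` bucket keeps its raw count of 1, which is not sensitive on its own, but no longer exposes the average or median salary of its single respondent.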

Exceptions

We do not apply the cutoff for "raw" results with no filters or facets applied. In other words, it's ok to show that Madagascar only has 1 respondent as long as that info is not filtered or cross-referenced with any other.
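As a rough illustration of that exception, a guard like the one below could decide whether any cutoff logic needs to run at all. `QueryOptions` and `shouldApplyCutoff` are hypothetical names; which properties actually get zeroed still depends on whether a filter, a facet, or both are active.

```typescript
// Hypothetical query options; the real query arguments may differ.
interface QueryOptions {
  filters?: Record<string, unknown>;
  facet?: string;
}

// Raw results (no filters, no facet) are exempt from the cutoff.
function shouldApplyCutoff(options: QueryOptions): boolean {
  const hasFilters =
    options.filters !== undefined && Object.keys(options.filters).length > 0;
  const hasFacet = options.facet !== undefined && options.facet !== "";
  return hasFilters || hasFacet;
}

const rawQuery = shouldApplyCutoff({});
const filteredQuery = shouldApplyCutoff({ filters: { country: "madagascar" } });
const facetedQuery = shouldApplyCutoff({ facet: "country" });
```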

Gotchas

We need to make sure there is no way to distinguish between empty buckets and zeroed-out buckets. For example, it's not enough to replace the Madagascar bucket with an "insufficient data" bucket. We also need to create fake "insufficient data" buckets for the countries that are not present at all in the dataset. Otherwise it becomes possible to identify the real country by elimination.
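One way to sketch this padding step, assuming the full list of possible option ids (here, countries) is known ahead of time. `CountryBucket`, `padWithPlaceholders`, and the sample ids are illustrative assumptions.

```typescript
// Hypothetical bucket shape for this example.
interface CountryBucket {
  id: string;
  count: number;
  hasInsufficientData: boolean;
}

// Return one bucket per known option. Options that are absent from the
// results and options below the cutoff produce identical placeholders,
// so the real one cannot be identified by elimination.
function padWithPlaceholders(
  buckets: CountryBucket[],
  allOptionIds: string[],
  cutoff: number = 10
): CountryBucket[] {
  const index: Record<string, CountryBucket | undefined> = {};
  for (const bucket of buckets) {
    index[bucket.id] = bucket;
  }
  return allOptionIds.map((id) => {
    const bucket = index[id];
    if (bucket !== undefined && bucket.count >= cutoff) return bucket;
    return { id, count: 0, hasInsufficientData: true };
  });
}

// Made-up sample: "comoros" has no respondents, "madagascar" has one.
const padded = padWithPlaceholders(
  [
    { id: "france", count: 240, hasInsufficientData: false },
    { id: "madagascar", count: 1, hasInsufficientData: false },
  ],
  ["france", "madagascar", "comoros"]
);
```

After padding, the `madagascar` and `comoros` buckets differ only by their id, so the response no longer reveals which of them actually had a respondent.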

Limitations

At the moment, a query that does not match any data at all returns an empty array instead of zeroed-out buckets.