dotnet run --project src/OpenDiffix.CLI/ -- --aid-columns customers.id --low-threshold 2 --low-mean-gap 0 --layer-noise-sd 0 --low-layer-sd 0 --outlier-count 1 1 --top-count 6 6 -f data/data.sqlite -q "SELECT company_name, count(*), count(distinct id) as count_distinct FROM customers GROUP BY 1 ORDER BY 1 ;"
"company_name","count","count_distinct"
"Alpha Centauri Inc.",60,60
"Beta Centauri Ltd",98,98
"Gamma Centauri GmbH",42,42
"Outlier Inc.",2,2
pg_diffix
prepare matching anon params (user must have anonymizing access, but be SUPERUSER)
UPDATE pg_settings SET setting = 0 WHERE name = 'pg_diffix.noise_sigma';
UPDATE pg_settings SET setting = 1 WHERE name = 'pg_diffix.outlier_count_min'; UPDATE pg_settings SET setting = 1 WHERE name = 'pg_diffix.outlier_count_max'; UPDATE pg_settings SET setting = 6 WHERE name = 'pg_diffix.top_count_min'; UPDATE pg_settings SET setting = 6 WHERE name = 'pg_diffix.top_count_max';
query:
SELECT company_name, count(*), count(distinct id) as count_distinct FROM customers GROUP BY 1 ORDER BY 1 ;
company_name | count | count_distinct
---------------------+-------+----------------
Alpha Centauri Inc. | 60 | 60
Beta Centauri Ltd | 98 | 98
Gamma Centauri GmbH | 42 | 42
Outlier Inc. | |
(4 rows)
I think that this is the issue which initially caused the prop tests from https://github.com/diffix/reference/pull/302 to fail.
Example query which diverges:
reference
pg_diffix
prepare matching anon params (user must have anonymizing access, but be SUPERUSER)
query:
(NOTE the difference in the last row 2 vs Null)
The pg_diffix used uses a workaround for #134, contained in this
piotr/spike-property-tests
branch(NOTE 2 the top_count and outlier_count param values cause the flattening to be noiseless)