Aircloak / aircloak

This repository contains the Aircloak Air frontend as well as the code for our Cloak query and anonymization platform
2 stars 0 forks source link

Should we support IS NOT DISTINCT FROM? #2621

Open sebastian opened 6 years ago

sebastian commented 6 years ago

The following query is written by Tableau:

SELECT CAST("loan"."emp_title" AS TEXT) AS "emp_title",
  CAST("loan"."grade" AS TEXT) AS "grade"
FROM "loan"
  INNER JOIN (
  SELECT CAST("loan"."emp_title" AS TEXT) AS "emp_title"
  FROM "loan"
  WHERE (SUBSTR(CAST(LOWER(CAST(CAST("loan"."emp_title" AS TEXT) AS TEXT)) AS TEXT), 1, LENGTH(CAST('911' AS TEXT))) = CAST('911' AS TEXT))
  GROUP BY 1
  HAVING (COUNT(CAST("loan"."emp_title" AS TEXT)) = 0)
) "t0" ON (CAST("loan"."emp_title" AS TEXT) IS NOT DISTINCT FROM "t0"."emp_title")
GROUP BY 1,
  2
obrok commented 6 years ago

It should be possible to either support natively for postgres/mysql or emulate in-DB with IS (NOT) NULL. In either case we don't pay any performance price if the operator is not used, so I think we should support it, given a tool exists that generates such queries.

obrok commented 6 years ago

A possible problem is that a IS NOT DISTINCT b is equivalent to a = b OR (a IS NULL AND b IS NULL) which is currently forbidden. Even if we judge this to be allowable from the privacy standpoint that means extra work to support it in all the places where we do something with equalities, like when generating noise layers.

sebastian commented 6 years ago

Low priority. Move away from current milestone.