Aircloak / aircloak

This repository contains the Aircloak Air frontend as well as the code for our Cloak query and anonymization platform
2 stars 0 forks source link

Simple OR (without LED) #4222

Open yoid2000 opened 4 years ago

yoid2000 commented 4 years ago

It seems to me we can implement a safe simple OR with restrictions along the lines of what we have for simple CASE.

The basic allowed structure would be:

(col1 = val1 OR col2 = val2 OR ...) AND
(col3 = val3 OR col3 = val3 OR ...) AND
...

In other words, groups of OR'd conditions (OR groups) separated by AND

The restrictions would be as follows:

  1. Only simple expressions:
    • col = val
    • col BETWEEN val1 AND val2
    • col <> val
  2. No isolating columns.
  3. Only constants from shadow tables.
    • In the case of the range, we would require that the range encompass at least one shadow constant.
  4. A given simple expression cannot appear in every OR group.
    • Note that there is some danger that semantically identical expressions can be made with multiple of the above, for instance age = 10 and age BETWEEN 10 and 11 or gender = 'M' and gender <>F`. We would need to recognize this.
  5. The simple expressions must be clear (meaning no operations preceding the expression)

In the implementation, noise layers would be seeded from inspection --- no floating necessary. The range expression would have a single static noise layer, the others would have two noise layers (static and uid).

The reason for the fourth restriction is that we want to prevent expressions of the sort:

a OR (b AND c)

where a, b, etc. are the simple expressions.

The reason for this is that (b AND c AND ...) serves as pseudo-identifier even when the constants in b, c, etc. are shadow constants. If we allow an expression to exist in every OR group, then an attacker can generate the above expression with:

(a OR b) AND (a OR c)

Similarly x or (a and b and c) can be made with (x or a) and (x or b) and (x or c), and so on.

By contrast, if we prevent a given simple expression from being in every OR group, then I don't see how an attack can be formed. In other words, I don't see how an attacker can make two queries that have identical conditions except for the pseudo-identifier.

To prevent semantically identical conditions using BETWEEN (i.e. age = 10 and age BETWEEN 10 and 11), we can have a rule that if a BETWEEN expression, when applied to the shadow table, returns a single value, then we either reject the query or rewrite the BETWEEN as the equivalent equality.

To prevent semantically identical conditions using <> (i.e. gender = 'M' and gender <>F`), we can have a rule that if there are only two values in the shadow table, then the negand is re-written to the corresponding equality.

cristianberneanu commented 4 years ago

Similarly x or (a and b and c) can be made with (x or a) and (x or b) and (x or c), and so on.

If the conditions are normalized, why does this matter?

To prevent semantically identical conditions using BETWEEN (i.e. age = 10 and age BETWEEN 10 and 11), we can have a rule that if a BETWEEN expression, when applied to the shadow table, returns a single value, then we either reject the query or rewrite the BETWEEN as the equivalent equality.

This seems like a poor heuristic, especially with the low number of entries in the shadow table.

yoid2000 commented 4 years ago

If the conditions are normalized, why does this matter?

I'm not sure if I'm being clear. My point is that we want to disallow x or (a and b and c), so that means we need to disallow it in whatever form it appears in.

If what we do is normalize AND'd OR groups into OR'd AND groups, and then disallow any expression that results in an OR'd AND group, then that would be fine.

yoid2000 commented 4 years ago

This seems like a poor heuristic, especially with the low number of entries in the shadow table.

Perhaps so. We could work on it a bit if we choose to go this route.

At this point I'm very worried about LED performance (as well as the difficulty of implementing). @sebastian is talking about implementing it through emulation, which is going to kill performance and may end up being a non-starter. I'd really like to find a workable alternative...