Aircloak / aircloak

This repository contains the Aircloak Air frontend as well as the code for our Cloak query and anonymization platform
2 stars 0 forks source link

first derivative difference attack on negative conditions #1745

Closed yoid2000 closed 7 years ago

yoid2000 commented 7 years ago

Well, crap.

I found an attack that, after all, is going to require us to do probing and condition dropping.

I call it the first derivative difference attack because the attack exploits differences in differences.

So for instance say I do these two:

SELECT salary, count(*)
FROM employees
WHERE dept = 'CS' AND gender <> 'F'
GROUP BY salary
SELECT salary, count(*)
FROM employees
WHERE dept = 'CS'
GROUP BY salary

The lone woman in the CS dept is the victim, who is excluded from the first and may be included in some bucket in the second.

Now the idea was that the gender <> 'F' layer forces a potential difference between the two queries for any given bucket. This is true, but the problem is that the difference between each pair of queries is the same except for the one bucket where the woman. In other words, if the woman has a salary of 100K, then for every bucket other than 100K there may be a difference of 1 between the two queries, but for the 100K bucket, there may be a difference of 2.

I ran this attack, and for about 30% or so of the cases you get a definitive answer. For the other 70%, you aren't sure either because the true answer is in a low-counted bucket, or the noise just happens to cause no difference in the difference. However, you know when you aren't sure, so rather than make an incorrect guess you just make no guess at all.

The only solution I can think of is, again, to check for a low-count effect for each clause, and exclude the clause and its corresponding noise layer. In this way, the affect of the gender <> 'F' is nil for all buckets, equally.

I recommend we do this for negative conditions before the challenge. Frankly I could not find any cases of the attack for a positive condition, so even though I recognize the possibility, I don't see a need to implement it right away.

cristianberneanu commented 7 years ago

I don't get a few things: You say:

I found an attack that, after all, is going to require us to do probing and condition dropping.

Don't we do this already? What we don't do is probe each group in the output. In this case, the gender <> 'F' condition will pass the global probe and be included in the output.

So you mean that we need to do probing per each bucket?

yoid2000 commented 7 years ago

We do some probing, but we currently don't do any condition dropping.

From my understanding, we currently probe only when we have a "complex" negative condition. (A complex negative condition is one with math or something. Versus a simple condition which is just col <> val and nothing more.) The reason for the probing is cause we don't know what the semantics of the condition are, so we want to look at what the database produces to tell us that.

My understanding of how it works is: If the cloak gets this:

select count(*)
from table
where col1 = x and col2 + 1 <> 4

Then the cloak does something like this:

select uid, col2
from table
where col1 = x and col2 + 1 = 4
limit 10

and then observes the contents of col2 and use it to make a noise layer.

Please correct me to the extent that I misunderstand.

Anyway, what we need to additionally do is check if col2 is LCF and if it is, we want to effectively drop col2 + 1 <> 4 from the query, and as well drop the noise layer it would have had.

obrok commented 7 years ago

Anyway, what we need to additionally do is check if col2 is LCF and if it is, we want to effectively drop col2 + 1 <> 4 from the query, and as well drop the noise layer it would have had.

This is what we do currently, but as you said - for complex clauses only. Simple clauses like the one in the attack are considered to be "handled" by noise layers.

I think another difference between what we have now and what you're describing is that we don't produce a noise layer for the complex clauses when they are found "safe".

cristianberneanu commented 7 years ago

As @obrok said, we don't produce noise layers for the complex conditions (so col2 is not actually selected for the probe). We also don't do any grouping in the probe, we only do a global select.

It is not clear to me what does "check if col2 is LCF" means. The LCF is done per bucket, so that means we need to do grouping?

yoid2000 commented 7 years ago

This is what we do currently, but as you said - for complex clauses only. Simple clauses like the one in the attack are considered to be "handled" by noise layers.

Really? We do LCF on complex negative conditions and drop the condition if LCF shows low-count?

yoid2000 commented 7 years ago

We also don't do any grouping in the probe, we only do a global select.

This is good. It should be a global select, not per group. (Well, the latter would work too, but no reason to do it that way.)

cristianberneanu commented 7 years ago

Yes, we drop the complex condition if it fails the lcf test. But in this case, I don't think the condition would be dropped as there could be enough global IDs for it to pass.

yoid2000 commented 7 years ago

@cristianberneanu what do you mean by "in this case"?

If you mean this example:

SELECT salary, count(*)
FROM employees
WHERE dept = 'CS' AND gender <> 'F'
GROUP BY salary

There there is only one female, so the inverse condition (gencder = 'F') would indeed fail LCF.

Or do you mean something else?

yoid2000 commented 7 years ago

In any event, we need both noise layers and LCF. If the negative condition fails LCF, then the condition is dropped and no noise layer. If the negative condition passes LCF, then we need to make a noise layer for it.

obrok commented 7 years ago

In any event, we need both noise layers and LCF. If the negative condition fails LCF, then the condition is dropped and no noise layer. If the negative condition passes LCF, then we need to make a noise layer for it.

OK, what about complex conditions? For example a + b <> c - should we add "basic" noise layers for a, b, c, in this case? By a "basic" noise layer for x here I mean the same we would add if the condition was x = constant.

yoid2000 commented 7 years ago

What would the noise layer look like for a + b = c? At the moment I don't know how we do that.

obrok commented 7 years ago

It would be three basic noise layers for a, b, and c.

yoid2000 commented 7 years ago

So yes we should do what we would do for the corresponding equality (=) variant of the condition, but with the :<> atom added. If I understand it, that is how we currently deal with the simple negative conditions, yes?

cristianberneanu commented 7 years ago

There there is only one female, so the inverse condition (gencder = 'F') would indeed fail LCF.

My brain is rusty after the vacation, so I had trouble understanding some things :/.

I was thinking of the case in which there are enough females in the CS department for the condition to pass LCF, but there is only one female in the 100k salary range. But this is not what the attack presumes, right? The attack requires it to be only one female in the CS department, otherwise it wouldn't work, correct?

yoid2000 commented 7 years ago

The attack requires it to be only one female in the CS department, otherwise it wouldn't work, correct?

Yes, that is correct.

obrok commented 7 years ago

@yoid2000 a couple questions:

  1. For implementation reasons it would be easier to include the noise layer that would be generated for a <> thing when checking if a <> thing is LCF. Do you see any problems with this?
  2. Do we create the same layer for complex and simple clauses? That is we scrap the concept of including the right side of <> in the layer?
yoid2000 commented 7 years ago
  1. For implementation reasons it would be easier to include the noise layer that would be generated for a <> thing when checking if a <> thing is LCF. Do you see any problems with this?

I don't see a problem. Is this true, though, for both simple and complex clauses?

  1. Do we create the same layer for complex and simple clauses? That is we scrap the concept of including the right side of <> in the layer?

There must be something I don't understand. First, by way of terminology, I would consider col + 1 = val to be a complex clause. But in this case I would expect the noise layer to include a value (in this case, val - 1). I wasn't really aware that there are any cases where we don't include the right side of a comparator...

obrok commented 7 years ago

There must be something I don't understand. First, by way of terminology, I would consider col + 1 = val to be a complex clause. But in this case I would expect the noise layer to include a value (in this case, val - 1). I wasn't really aware that there are any cases where we don't include the right side of a comparator...

We never include the right side for positive clauses. In this case the base of the noise layer would be just be the name of the column. We do include it for simple negative clauses, to guard against the averaging attack of submitting many bogus values as the right side of <>. I thought we could not include it once we're checking LCF for all negative clauses.

yoid2000 commented 7 years ago

Ah, ok, but for positive clauses, we include the value from the floated column, which (for simple clauses anyway) is equivalent to including the right side.

So what I'm supposing is that, when you transform the <> into = for the purpose of the probe query, then you also float the column and use the returned value in lieu of the right side of the clause. Does this make sense? If not maybe we could do by voice or chat.