obrok opened this issue 5 years ago
Hi @obrok. I'm missing something. Can you be more specific about how the attack works? (Among other things, I don't know where the N of `{N, infinity}` comes from...)
Well, let's say that 99 is the maximum int for simplicity. Then you issue this query:
SELECT COUNT(*), (winpoints + 50) % 1
FROM games
GROUP BY 2
You will get two rows, maybe like so:
count | %
-------+------
1000 | 0
1000 | NULL
Next you issue this query:
SELECT COUNT(*), (winpoints + 51) % 1
FROM games
GROUP BY 2
And you get something like this:
count | %
-------+------
970 | 0
1033 | NULL
The counts won't necessarily add up to the same total, because of noise. However, this way you have obtained the noisy count of `winpoints BETWEEN 0 AND 49` and the noisy count of `winpoints BETWEEN 0 AND 48` (the first row in both cases), which should be forbidden.
@yoid2000 I made a more concrete version of the attack against the actual implementation we have. Ignoring a bug I found, I'm still pretty convinced that one can do what I claimed.
We look at this query (currently allowed by the cloak):
SELECT COUNT(*), (value * 99999999999999999 + 999999999999999999 + 923372036854775881) % 1
FROM integers
GROUP BY 2
count | %
--------+---
751 | 0
108243 |
(2 rows)
The maximum long integer is `2^63 - 1 = 9223372036854775807`. So the count in the first row is effectively `COUNT(*) WHERE value <= 73`, because `73 * 99999999999999999 + 999999999999999999 + 923372036854775881 = 9223372036854775807 = 2^63 - 1`. The count in the second row is `COUNT(*) WHERE value > 73`. Both of these queries should be disallowed.
To compare, we can look at this query (note the last additive constant is one bigger):
SELECT COUNT(*), (value * 99999999999999999 + 999999999999999999 + 923372036854775882) % 1
FROM integers
GROUP BY 2
count | %
--------+---
701 | 0
115687 |
(2 rows)
In this case we have `COUNT(*) WHERE value < 73` and `COUNT(*) WHERE value >= 73`, because `73 * 99999999999999999 + 999999999999999999 + 923372036854775882 = 9223372036854775808 = 2^63`. This is equivalent to `COUNT(*) WHERE value <= 72` and `COUNT(*) WHERE value > 72`, because the underlying type is integer.
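The boundary arithmetic in the two queries can be checked directly in Python:

```python
# Verify the boundary arithmetic from the two queries above.
MAX_LONG = 2 ** 63 - 1   # 9223372036854775807

v1 = 73 * 99999999999999999 + 999999999999999999 + 923372036854775881
v2 = 73 * 99999999999999999 + 999999999999999999 + 923372036854775882

print(v1 == MAX_LONG)       # True: value = 73 is the largest non-overflowing input
print(v2 == MAX_LONG + 1)   # True: the +1 constant moves the cut-off down to 72
```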
I'm kinda worried that this invalidates the whole approach we just spent upwards of a month on.
@obrok what is the reason for having sums of large numbers instead of a single larger number? I gather that the larger number would be rejected somewhere in the pipeline? (Either by the cloak or by the DB?)
@yoid2000 Yes, there's a limit of `10^18` on numeric constants, including on constant expressions, so this is how the attack has to be built up at the moment. Note that `column + 123 + 123` is equivalent to `(column + 123) + 123` and not to `column + (123 + 123)`, so it's not treated as a constant expression (`123 + 123`) and is therefore allowed even if the sum of the constants exceeds the `10^18` limit.
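A minimal sketch of why left-association slips past such a limit (a hypothetical validator written for illustration, not the cloak's actual code): it folds and range-checks only fully-constant subtrees, so constants split across operations involving the column are never summed.

```python
# Hypothetical validator: folds and range-checks only fully-constant subtrees.
LIMIT = 10 ** 18
BIG = 999999999999999999          # individually within the limit

def fold(expr):
    """Evaluate a constant subtree; expr is an int or ('+', left, right)."""
    if isinstance(expr, int):
        return expr
    _, l, r = expr
    return fold(l) + fold(r)

def check(expr):
    """Return True if expr is fully constant; raise ValueError if a
    constant subtree exceeds the limit. 'col' marks the column reference."""
    if expr == 'col':
        return False
    if isinstance(expr, int):
        if abs(expr) > LIMIT:
            raise ValueError("constant out of range")
        return True
    _, l, r = expr
    lc, rc = check(l), check(r)
    if lc and rc and abs(fold(expr)) > LIMIT:
        raise ValueError("constant expression out of range")
    return lc and rc

# (col + BIG) + BIG: no constant-only subtree sums past the limit -> accepted.
check(('+', ('+', 'col', BIG), BIG))

# col + (BIG + BIG): the right subtree is constant-only and folds past the limit.
try:
    check(('+', 'col', ('+', BIG, BIG)))
except ValueError as e:
    print(e)
```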
I'm missing something. Is there some circumstance where `(column + 123) + 123` and `column + (123 + 123)` return different values?
No, math should work as expected :). However:
SELECT (value + 999999999999999999) + 999999999999999999 FROM integers;
is allowed by the cloak, while
SELECT value + (999999999999999999 + 999999999999999999) FROM integers;
gives
ERROR: Constant expression is out of valid range: numeric values have to be inside the interval [-10^18, 10^18].
The error was detected at line 1, column 36:
1: SELECT value + (999999999999999999 + 999999999999999999) FROM integers;
Is that an inconsistency in the cloak that should be fixed? I'm still missing the importance of why you mention the difference in execution order....
> Is that an inconsistency in the cloak that should be fixed? I'm still missing the importance of why you mention the difference in execution order....
It's just a sidenote, ignore it. It's in case you encountered this case while playing with the attack.
This looks to me like a somewhat different issue and not one invalidating the overflow protection changes. If I understand correctly, the problem here is that using this trick, one can bypass the range restrictions, right? Basically, you can issue queries that separate data into unbounded and unsnapped ranges.
I think there are similar tricks possible with other math functions, `sqrt` for example:
SELECT sqrt(column - 5), count(*) FROM table GROUP BY 1
All the rows with `NULL` values in `column`, or with values less than 5, will be grouped in the same bucket.
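A quick simulation of that grouping (assuming a safe `sqrt` variant that returns `NULL` for `NULL` or negative input):

```python
import math
from collections import Counter

def safe_sqrt(x):
    """Assumed safe variant: None (SQL NULL) for NULL or negative input."""
    return None if x is None or x < 0 else math.sqrt(x)

# GROUP BY sqrt(column - 5): NULLs and every value below 5 share one bucket.
rows = [None, 1, 3, 4, 5, 9, 14, 30]
buckets = Counter(safe_sqrt(None if v is None else v - 5) for v in rows)
print(buckets[None])   # 4: the rows None, 1, 3, 4 all collapse into the NULL bucket
```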
We already have some restrictions for grouping by implied ranges. Maybe we should review them and modify them so these queries are not allowed anymore? For example, we can forbid grouping by any value that can overflow or underflow. Or we can snap the constants used in a grouping expression. Or restrict the amount of math possible in grouping expressions.
@cristianberneanu I understood @obrok's attack to be general to the case where safe functions return NULL, while I understand what you just wrote to be specific to columns that have NULL values. If so, then these are two different things and I'm not sure we should worry about the latter...
No, they are identical. `sqrt(x)` returns `NULL` when `x` is `NULL` or negative. `(x + 9999999999999999999) % 1` returns `NULL` when `x` is `NULL` or the operation overflows.
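The equivalence can be made explicit in a small sketch (assumed semantics for the safe variants, not the cloak's actual implementation):

```python
import math

MAX_LONG = 2 ** 63 - 1

def safe_sqrt(x):
    # NULL for NULL input or a domain error (negative argument)
    return None if x is None or x < 0 else math.sqrt(x)

def safe_add_mod1(x, c):
    # NULL for NULL input or a range error (64-bit overflow)
    return None if x is None or x + c > MAX_LONG else (x + c) % 1

# Both discontinuities surface as the same value: NULL.
print(safe_sqrt(-1))                # None (domain edge)
print(safe_add_mod1(1, MAX_LONG))   # None (overflow edge)
print(safe_add_mod1(1, 41))         # 0   (well inside the range)
```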
@cristianberneanu ok thanks for the explanation.
Hmmmmm .... let me think about it .... sheesh, this never ends :(
So it seems to me that a safe function can be categorized as a discontinuous function. If we simply classify it that way, and then apply the same restrictions to safe functions that we apply to discontinuous functions, does that take care of it?
> So it seems to me that a safe function can be categorized as a discontinuous function. If we simply classify it that way, and then apply the same restrictions to safe functions that we apply to discontinuous functions, does that take care of it?
I think @obrok can properly answer this, as he has the most experience with the math restrictions, but it seems reasonable to me, since a discontinuity in the output of the expression is what causes the problem.
> So it seems to me that a safe function can be categorized as a discontinuous function. If we simply classify it that way, and then apply the same restrictions to safe functions that we apply to discontinuous functions, does that take care of it?
Did a quick check of this, doesn't seem to help (with the restrictions as-is, just adding the classification). Details:

- There is no `discontinuous` function category, but we have a `restricted` category; functions like `%`, `round`, `trunc`, etc. are in this category.
- I added the `restricted` category to all the math operators.
- The only restriction from the `restricted` category that would matter here is that an expression can have at most 5 restricted operations in it.
- That is not enough to prevent `(column + constant) % 1`. If `column` is big enough so that `(column * constant) % 1` or `(column ^ constant) % 1` works here, then at least some bad ranges can be extracted.

Furthermore, it feels to me like the cap on constant size of `10^18` is a bit arbitrary, and something we would like to remove at some point, so I can see us accidentally enabling even the full `(column + constant) % 1` in the future.

Ok. Any thoughts on how to address this?
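To illustrate the gap, here is a sketch (using a hypothetical expression representation, not the cloak's) that counts restricted operations in the attack expression:

```python
# Expressions are leaves (column name or int) or (op, arg, ...) tuples.
RESTRICTED = {'%', '/', 'round', 'trunc'}

def count_restricted(expr):
    """Count restricted operations anywhere in the expression tree."""
    if not isinstance(expr, tuple):
        return 0
    op, *args = expr
    return int(op in RESTRICTED) + sum(count_restricted(a) for a in args)

# (value * c1 + c2 + c3) % 1 -- the shape of the attack -- contains a single
# restricted operation, so a cap of 5 never triggers.
attack = ('%', ('+', ('+', ('*', 'value', 99999999999999999),
                      999999999999999999), 923372036854775881), 1)
print(count_restricted(attack))   # 1
```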
> Any thoughts on how to address this?
None yet, but allow me to insert a broader rant into this discussion, because I think it is relevant. Note that the rant is not angry, I'm just wrestling with what a solution might look like and this is where my thoughts go.
Over and over we bump into the problem that we basically want to give the user a "pixelated" or "snapped" view of the data, so that exact values cannot be observed, but if there are enough distinct users in a bucket, then that bucket can be observed. However, we're continuously trying to achieve this "from the top". What I mean by that is that all the operations in our pipeline (like math, let's say) act on the raw data, but we try to impose limits on how the pipeline is constructed so that somehow the result is appropriately bucketized. The opposite approach would be "bottom up": bucketize the raw data first, and only then apply any operations, so that the results are guaranteed to be bucketized at every stage.
I realize I've already raised this. I forgot the name of the anonymization technique that basically only does this, but I remember you pointed out that it's out there, it's bad, and it's not what we're doing. Nevertheless, it seems to me that this is the root of the problem we're facing in this issue and a couple similar ones, and unless we directly address that, there will always be some hole, we just won't always be clever enough to find it.
First regarding the rant, you are of course correct: if we could figure out how to produce bucketized raw data first, then I guess we could just publish the bucketized data and the analyst could do whatever they want with it. But I don't know how to even approach this.
Regarding this issue, I assume that we need to do something along the lines of what @cristianberneanu suggested: we identify when safe functions are required, and then put additional restrictions to prevent their use in attacks.
My general understanding was that the use of safe functions is relatively rare, at least for the overflow case? In other words, we only use safe functions when we think there is some danger of overflow?
As for `sqrt`, wasn't it the case before that we simply disallowed any math within the `sqrt`? Something like that?
> As for `sqrt`, wasn't it the case before that we simply disallowed any math within the `sqrt`? Something like that?
Doesn't seem we have any such restriction anymore. For example this is allowed by the cloak (but currently crashes due to a bug in overflow protection):
SELECT COUNT(*), sqrt(age * 99999999999999999 * 2 % 1) + 1
FROM demographics
GROUP BY 2
Which means, so far as I know, that we used the new safe feature to give us more utility on `sqrt`, but that it's backfired and we need to roll back to the earlier restrictions.
Ok, so we used to have a "potentially crashing" category for `/`, `%`, and `sqrt`, where no math would be allowed in the arguments to such operations (just the second argument in the case of `/` and `%`). However, if this restriction were to be applied to all the mathematical operators, then it would basically mean a limit of one mathematical operator per expression. Something like `column * 2 + 1` would be rejected.
I guess we could apply this restriction only if the analysis determines the expression is unsafe. Although "only" is a big word here, difficult to say how often that would happen in practical applications.
In any case I don't think it makes sense to go back to the restriction on `sqrt` et al. unless we entirely scrap the bound analysis mechanism. My reasoning is that if bound analysis (with some extension we make) cannot prevent this case, then it's useless.
My understanding of how things proceeded here is as follows:

1. We had "potentially crashing" restrictions on `sqrt`, `/`, and `%`.
2. We implemented bound analysis and safe variants to protect against overflow.
3. We used the safe variants to relax some restrictions.

This third step, however, is strictly speaking unnecessary for the purpose of protecting against overflow. We could just as well have kept the "potentially crashing" restrictions. This doesn't mean that the bound analysis is "useless" per se, because it can still be used against certain types of overflow attacks (those that use multiplication or exponents, etc.).
Anyway, now we find that the safe variants can be attacked. So we have the following choices:

1. Go back to the old "potentially crashing" restrictions.
2. Apply those restrictions selectively, only when bound analysis determines that an expression is unsafe.
3. Design some new mechanism to prevent the attack.
You guys know better than me, but my guess/hope would be that doing number 2 would not be too hard, since we already have the crashing restrictions code and we just need to apply it selectively. But maybe testing becomes a bitch or something...I don't know.
Yapee you must have some intuition as to how frequently the restrictions would have to be applied, even understanding that that intuition could be way off.
In any event, at the moment I'd much rather do 1 than 3, because 1 simply restores the old status quo, and 3 will take time and more effort when we really need to move on to the Telefonica-requested features. I'd rather do 2 than 1 assuming that applying the restrictions would be relatively rare. The main downside I see is that the restrictions become even more confusing because sometimes they apply and sometimes they don't...
Thoughts?
> Thoughts?
The overall approach of going back to the restrictions and relaxing them when bound analysis indicates it's OK seems reasonable enough...
However, it doesn't seem that the restrictions as they were before removal would prevent this particular attack. This is because the expression used is of the form `(a * b + c) % 1`, while the restriction only rejects expressions where the divisor contains both a constant and a column. We would probably need to tighten the restriction to disallow any math in both arguments of `/` and `%`, but that seems very oppressive.
Perhaps the problem is that both integer division and modulo are in fact implicit ranges, because they create buckets in a non-continuous manner?
Maybe the rule should be that no discontinuous operations can be nested in other discontinuous operations, and the safe variants are discontinuous? So `a % b % c` would always be rejected, `a * b / c` would be rejected if the `/` divides integers and the `*` can overflow, etc.
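An unconditional version of that rule can be sketched as follows (hypothetical AST representation; the integer-division and can-overflow conditions from the rule are left out for brevity):

```python
# Assume the safe math variants are classified as discontinuous too.
DISCONTINUOUS = {'%', '/'}

def nests_discontinuous(expr, inside=False):
    """True if a discontinuous op occurs inside another discontinuous op.
    expr is a leaf (str/int) or an (op, arg, ...) tuple."""
    if not isinstance(expr, tuple):
        return False
    op, *args = expr
    if op in DISCONTINUOUS and inside:
        return True
    return any(nests_discontinuous(a, inside or op in DISCONTINUOUS) for a in args)

print(nests_discontinuous(('%', ('%', 'a', 'b'), 'c')))   # True:  a % b % c rejected
print(nests_discontinuous(('+', ('*', 'a', 2), 1)))       # False: a * 2 + 1 allowed
```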
> Perhaps the problem is that both integer division and modulo are in fact implicit ranges, because they create buckets in a non-continuous manner?
My understanding is somewhat similar: at the core of the issue is the bypass of range restrictions (since groups and ranges are conceptually similar) using discontinuous functions, something against which we already have partial protection, since we don't allow functions like `trunc` or `floor` either in filters or in groups. Until now, we ignored the fact that math operators are discontinuous at the boundaries of the input/output space. Before, the discontinuity manifested as a crash; now it manifests as a `NULL` output, since the result is either not defined or out of range.
> Maybe the rule should be that no discontinuous operations can be nested in other discontinuous operations and the safe variants are discontinuous? So `a % b % c` would always be rejected, `a * b / c` would be rejected if the `/` divides integers and the `*` can overflow, etc.
It looks to me like the attack can be implemented with a single discontinuous function (i.e. `sqrt(column - constant)`). Also, not all inputs can result in discontinuities: `1/x` or `1%x` is discontinuous when `x <= 0`, while `x/10` and `x%10` are never discontinuous.
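That observation suggests an interval-based check on the divisor; a minimal sketch (assumed bounds representation, and covering only the division-by-zero case):

```python
def divisor_can_be_zero(lo, hi):
    """A division or modulo is discontinuous if the divisor's
    interval [lo, hi] can contain zero."""
    return lo <= 0 <= hi

print(divisor_can_be_zero(-10**6, 10**6))   # True:  1/x with x ranging over 0
print(divisor_can_be_zero(10, 10))          # False: x/10 is never discontinuous
```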
Some potential approaches:

- We could allow the query, but then censor the buckets that contain discontinuities: for example, in the case of `sqrt(x)` the `NULL` bucket would always be censored.
- We could reject expressions with discontinuities in filters and groups, unless we can guarantee that the input range leads to continuous output. We could request additional filters to be added, or we could add the extra filters automatically.
Hi, back from holiday.
> We could allow the query, but then censor the buckets that contain discontinuities: for example, in the case of `sqrt(x)` the `NULL` bucket would always be censored.
This is a really interesting idea. The way safe functions are currently implemented, can we distinguish between an output that is `NULL` because the underlying data is `NULL`, versus because the function triggered the safety output?
> This is a really interesting idea. The way safe functions are currently implemented, can we distinguish between an output that is `NULL` because the underlying data is `NULL`, versus because the function triggered the safety output?
No, we can't. The only special value in a database column is `NULL`; we don't have different placeholders for raised exceptions, etc.
Ok, but presumably we would filter NULL output only when safe functions are used, and not filter otherwise?
Yes, either we censor the NULL bucket if a safe function is aggregated over, or we need to add extra filters that exclude data that might raise exceptions. I don't have the details figured out.
> We could allow the query, but then censor the buckets that contain discontinuities: for example, in the case of `sqrt(x)` the `NULL` bucket would always be censored.
I don't think this works against the original attack described. Notice that both buckets in the attack description are problematic: the one with `NULL` and the one with `0`.
While thinking about the implementation of the `safe_*` functions that will return `NULL` instead of overflowing, I started thinking about the following attack sketch:

Note that the noise in this function doesn't depend on the constant `9999999999999999999`. Note also that this is currently disallowed on a technicality (constants above `10^18` are disallowed); however, I'm assuming there is some way to work around that, perhaps by simply using `*` instead of `+`, depending on the data. Now let's assume we have the `safe_mul` function implemented, so that the raw data from the database will have 0 if `winpoints + large_constant` is below the overflow bound and `NULL` otherwise. It seems to me that one can run this query, varying the large constant by 1, and, by looking at the proportion of `NULL` to `0` in the result, get counts for all ranges of `winpoints` of the form `{N, infinity}`, which should be prohibited by the range restrictions.

@yoid2000 WDYT?