Attack that exploits DB crash due to out-of-range overflow

obrok commented 5 years ago

The following query crashes (on the default dev database - data dependent):

SELECT AVG(2 ^ (64 * accounts.customer_id)) FROM accounts

Error:

ERROR 22003 (numeric_value_out_of_range): value out of range: overflow

I think this is a potential attack vector in the same vein as division by 0 in the database is - an attacker could deduce if a specific value matches some condition or not by setting the 64 value above so that it only overflows when computed for that specific customer_id (or a larger one, but perhaps those can be excluded with some other condition).

Ping @yoid2000

yoid2000 commented 5 years ago

I'm not immediately seeing the attack. Can you give an example? (I hope there isn't one, because it isn't so clear to me how to prevent overflows...)

obrok commented 5 years ago

Let's say the attacker knows that the victim is the oldest out of the patients in Cracow at 60 years old. Let's further assume that (2 ^ (64 * 60)) causes the overflow while (2 ^ (64 * 59)) does not. Then the attacker just compares:

SELECT AVG(2 ^ (64 * age)) FROM table WHERE city = 'Cracow' AND disease = 'cancer'
SELECT AVG(2 ^ (64 * age)) FROM table WHERE city = 'Cracow'

If both queries crash, then the attacker knows that the victim has cancer.

yoid2000 commented 5 years ago

I see. So the attacker selects values of X and Y for (X^(Y*val)) to get the overflow at just the right place.

Ok, seems like a real attack to me....good catch

Any solutions come to mind? One halfbaked and complex-sounding idea would be to

set a noisy threshold NT somewhere between the overflow value and say 0.9 * overflow, then
compute the minimum column value Cmin that would cause the threshold to be reached, then
set a condition WHERE column < Cmin, then
do low-effect detection on the condition to see if anything exceeds it, and if so, reject the query

obrok commented 5 years ago

set a noisy threshold NT somewhere between the overflow value and say 0.9 * overflow, then

You mean for every expression? It seems one would need to do this whole thing for every subexpression to be sure, for example for (X ^ (Y * val)) do one check for (Y * val) and another for (X ^ (Y * val)).

yoid2000 commented 5 years ago

You mean for every expression? It seems one would need to do this whole thing for every subexpression to be sure, for example for (X ^ (Y val)) do one check for (Y val) and another for (X ^ (Y * val)).

Sheesh. Ok.

I don't have any good ideas :(

yoid2000 commented 5 years ago

Man math kills us in so many ways.

Here is another thought. For each column, we store two numbers, one a bit higher than the true max, and one a bit lower than the true min (or in both cases the same as the max/min if there are > LCF instances). Then with any math expression, we first run the expression on the DB.

So, if the attacker has AVG(2 ^ (64 * age)) in the expression, and the highest age in the DB is 105, and so our test_max=109, then we would try:

SELECT AVG(2 ^ (64 * 109)) on the DB. (and a similar thing for the min)

If there is no overflow, then we regard the thing as safe and run the query. If there is an overflow, we supply the appropriate error message.

obrok commented 5 years ago

Man math kills us in so many ways.

Well, math is Turing-complete, so that's hardly surprising.

Here is another thought...

Makes sense, although I bet this can be circumvented by a clever expression that maxes out for non-max values of age. It could be tightened somewhat by storing 100 or so values between noisy_max and noisy_mean. I guess our expression limit will prevent too complicated expressions, but I'm not at all confident it's impossible to construct an expression of 5 mathematical operators that has a very sharp peak at a specified point.

yoid2000 commented 5 years ago

Ok I changed the title to make it sound more scary.

cristianberneanu commented 5 years ago

This is a problem we encountered in various forms and discussed about before, but with no elegant solution in sight.

Here is another thought. For each column, we store two numbers, one a bit higher than the true max, and one a bit lower than the true min (or in both cases the same as the max/min if there are > LCF instances). Then with any math expression, we first run the expression on the DB.

Doesn't look too sound to me either. An expression like 1/(col - X) maximizes around X.

It seems to me that the root of the issue is that some offloaded math operations raise an exception on error instead of returning NULL, like all our cloak-emulated functions do.

We could try to check the input in the offloaded SQL code. An expression like 1/x would become 1/NULLIF(x==0, x).

Another option would be to implement our own math operators inside the database, using the backend's custom language for doing that. If we go that route, at some point we can also start implementing various anonymizer steps in it, offloading more work to the database at the cost of increased complexity in our integration code.

yoid2000 commented 5 years ago

Is it currently the case that we don't check for divide-by-zero? I had supposed that we do.

cristianberneanu commented 5 years ago

We only check in the cloak, offloaded expressions are executed in the database.

yoid2000 commented 5 years ago

ok, but I see that the cloak checks for expressions that could cause divide-by-zero and reject such queries before accessing the database?

yoid2000 commented 5 years ago

Here is another thought. For each column, we store two numbers, one a bit higher than the true max, and one a bit lower than the true min (or in both cases the same as the max/min if there are > LCF instances). Then with any math expression, we first run the expression on the DB.

Doesn't look too sound to me either. An expression like 1/(col - X) maximizes around X.

@cristianberneanu it seems that the cloak doesn't allow an expression like 1/(col-X) in the first place. So an expression that peaks at some value in the middle would have to be created some other way. Maybe @obrok is right that there is some way to do it (with 5 operators) but it isn't obvious what that is.

Which suggests to me that the idea of trying the math expression in the DB on max and min values is worth doing. That eliminates at least the low-hanging fruit of attacking a user with a high value.

cristianberneanu commented 5 years ago

@cristianberneanu it seems that the cloak doesn't allow an expression like 1/(col-X) in the first place

But it does allow 1/col which peaks around 0.

yoid2000 commented 5 years ago

But it does allow 1/col which peaks around 0.

I guess we couldn't see how that expression could be utilized in a divide-by-zero attack.

cristianberneanu commented 5 years ago

I guess we couldn't see how that expression could be utilized in a divide-by-zero attack.

But isn't it better to implement the protection using NULLIF clauses for operations like / or sqrt? That way, we can also eliminate the current restrictions that disallow any kind of math on the input columns for these functions.

yoid2000 commented 5 years ago

But isn't it better to implement the protection using NULLIF clauses for operations like / or sqrt? That way, we can also eliminate the current restrictions that disallow any kind of math on the input columns for these functions.

Maybe, but I'm concerned now that if we allow more freedom in division, then someone can more easily build an overflow attack by allowing the division to get quite close to zero, but not exactly zero, so the NULLIF is not triggered...

cristianberneanu commented 5 years ago

The NULLIF condition doesn't have to be value = 0. It can also be something like value <= EPSILON AND value >= -EPSILON, which should protect against both overflow and division by 0.

yoid2000 commented 5 years ago

@cristianberneanu good point. Looks challenging but off hand don't see why it couldn't be made to work.

obrok commented 5 years ago

A problem I foresee with the NULLIF approach is that it only works for the functions where we implement it. We would basically find ourselves in the situation of having to predict every possible error that can be caused by such operations. On the cloak side we catch all errors and return a null in that case, which gives us a better coverage/effort ratio.

yoid2000 commented 5 years ago

Ok, there is a dead simple and extremely powerful version of this attack (with help from @obrok )

Here is the basic thing:

select count(*)
from accounts
where lastname = 'foo' and
      2^(10000.01 * cli_district_id) = 123.12

If there is someone in the system with a matching lastname, then an overflow takes place. If there is nobody with that lastname, then the query passes, and the result is a count of 0. The above query passes, and the following query fails:

select count(*)
from accounts
where lastname = 'Zamora' and
      2^(10000.01 * cli_district_id) = 123.12

Note that the smallest cli_district_id is 1, so this query fails even when cli_district_id=1 (which is the case for 'Zamora'). The above query tells us that there is 1 or more Zamora in the database. But if you happen to know there is exactly one (which is the case here), then you can test for any additional condition you want. If the condition is False, then the query passes, and if True, then it fails.

So for instance, this query fails:

select count(*)
from accounts
where lastname = 'Zamora' and
      gender = 'Female' and
      2^(10000.01 * cli_district_id) = 123.12

so we know that Zamora is a female. Or, this query fails:

select count(*)
from accounts
where lastname = 'Zamora' and
      birthdate = '1996-11-17' and
      2^(10000.01 * cli_district_id) = 123.12

and now we know Zamora's birthday. (and note that we can do a simple search to narrow this down, like this fails:

select count(*)
from accounts
where lastname = 'Zamora' and
      year(birthdate) = 1996 and
      2^(10000.01 * cli_district_id) = 123.12

and we know Zamora's birth year. Then we find month, then day.

yoid2000 commented 5 years ago

@cristianberneanu @obrok

How would you use NULLIF to prevent for instance the following query from overflowing? I don't see it. (Or for that matter any solution other than that of testing the dangerous expressions with min and max column values....)

select count(*)
from accounts
where lastname = 'Zamora' and
      year(birthdate) = 1996 and
      2^(10000.01 * cli_district_id) = 123.12

cristianberneanu commented 5 years ago

First of all, NULLIF doesn't work exactly how I thought it would and using CASE seems like a better fit.

Secondly, protecting against overflow will be harder than protecting against divide by 0 or taking the square root of a negative number, as it can happen in a lot more scenarios.

In this case, the expression 2^(10000.01 * cli_district_id) has to be offloaded as case when log(2, 1E+308) >= 10000.01 * cli_district_id then 2^(10000.01 * cli_district_id) else NULL end.

obrok commented 5 years ago

Unfortunately, this is not at all limited to ^ - basically any operation can overflow or underflow in some way. For example these two queries:

select count(*)
from accounts
where lastname = 'Zamora' and
      9223372036854775807 + cli_district_id = 0

select count(*)
from accounts
where lastname = 'foo' and
      9223372036854775807 + cli_district_id = 0

Also reveal the existence of a Zamora.

cristianberneanu commented 5 years ago

Some alternative tricks we can try:

Force usage of numeric types in all mathematical expressions in order to increase the range of possible values.

Create custom math functions in database scripting language that catch all exceptions. Something like:

CREATE OR REPLACE FUNCTION safe_pow(a double, b double)
  RETURNS double AS $$
BEGIN
  RETURN a ^ b;
EXCEPTION
  WHEN numeric_value_out_of_range THEN
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

obrok commented 5 years ago

Create custom math functions in database scripting language that catch all exceptions. Something like:

Yeah, I think this approach has a much better chance of covering us fully. One downside is that we'd need to be have write privileges for databases even when analyst tables are disabled.

yoid2000 commented 5 years ago

Two questions though.

What does it do to performance?
Are there DBs where we can't do this?

yoid2000 commented 5 years ago

Force usage of numeric types in all mathematical expressions in order to increase the range of possible values.

I don't see how this helps, because no matter how big we make the range, the attacker can still find a way to exceed it.

Regarding the custom math function, I gather the way it would work is that, once a custom function returns NULL, the whole expression would behave as though the original data were NULL? Something like that?

yoid2000 commented 5 years ago

@cristianberneanu @obrok here's another question. To your knowledge, are there any expressions that can reliably cause a measurable delay in the computation? If so, then catching exceptions won't be good enough.

For instance, the following query (run on postgres, not the cloak) takes about 1sec if the first two expressions are True, and 100ms otherwise. This is not a valid query in the cloak, but maybe there are valid queries that have the same conditional delay behavior???

select count(*)
from accounts
where lastname = 'Zamora' and
      birthdate = '1996-11-17' and
      cli_district_id IN (select distinct cli_district_id FROM transactions)

yoid2000 commented 5 years ago

Ah, I have one. The following query takes noticeably longer if the conditions of the first join are True:

select count(*)
from 

(select uid
 from accounts
 where lastname = 'Zamora' and
      birthdate = '1996-11-16')t1
join
(select distinct uid
 from transactions)t2
 on t1.uid = t2.uid

cristianberneanu commented 5 years ago

What does it do to performance?

I am not sure, but I expect it to be small, under 10%, way better than emulating in any case.

Are there DBs where we can't do this?

We won't be able to do this on mongo. But it seems that the mongo function don't raise exceptions, they either return null, MAX_VALUE or NaN.

@obrok is right that there is some way to do it (with 5 operators) but it isn't obvious what that is.

This expression peaks around 50, is allowed and crashes on max value: 2^(1100 - abs(age - 50))

cristianberneanu commented 5 years ago

I don't see how this helps, because no matter how big we make the range, the attacker can still find a way to exceed it.

The attacker will need bigger and bigger constants in order to overflow some of the functions.

Regarding the custom math function, I gather the way it would work is that, once a custom function returns NULL, the whole expression would behave as though the original data were NULL? Something like that?

Yes. In database logic, a NULL in the input produces a NULL output (usually, the exception being some special, NULL-aware functions).

To your knowledge, are there any expressions that can reliably cause a measurable delay in the computation? If so, then catching exceptions won't be good enough.

We don't (yet) allow functions that take a variable amount of time to run (like sleep) and neither do we allow subqueries in the WHERE clause. Not sure what happens in your second example ... maybe PostgreSQL executes the JOIN sequentially instead of concurrently?

yoid2000 commented 5 years ago

Not sure what happens in your second example ... maybe PostgreSQL executes the JOIN sequentially instead of concurrently?

That seems to be the case. The attack doesn't work if I swap the order of the joins.

Note, however, that this is not true for the overflow attack. Whether the overflowing condition is first or last, the attack still works. I presume that postgres is doing something in its optimizer/planner to make this happen, but I don't know. Though it makes sense: put the simplest condition first...

cristianberneanu commented 5 years ago

Though it makes sense: put the simplest condition first...

Probably is more related to the condition using an index or not: a complex condition might not use a column index, even if one exists.

yoid2000 commented 5 years ago

@obrok @cristianberneanu

For timing attacks, I think what we need in general is a way to force the DB to execute everything. Any thoughts on this? (Even my version of the overflow attack from above would not work if we were forcing the DB to do everything, though in the case of overflow I don't think that point matters because where are likely other ways of exploiting overflow.)

yoid2000 commented 5 years ago

Just fyi, if my version of the overflow attack is modified as follows, then it fails. For once, math working for us!

select count(*)
from
(select case when lastname = 'Zamora' then 1 else 0 end as con1,
      case when birthdate = '1996-11-16' then 1 else 0 end as con2,
      case when 2^(1000.01 * cli_district_id) = 123.12 then 1 else 0 end as con3
from accounts)t
where (con1 * con2 * con3) = 1

(though to be clear @cristianberneanu 's solution is certainly more comprehensive and better. )

yoid2000 commented 5 years ago

For the timing attack, I tried the following fix, which forces each join expression to return at least one row (returning a NULL value in this case). Doesn't help, because postgres is apparently smart enough to recognize that it can ignore a NULL output.

select count(*)
from 

(select NULL as uid
 UNION
 select uid
 from accounts
 where lastname = 'Zamora' and
      birthdate = '1996-11-16')t1
join
(select NULL as uid
 UNION
 select distinct uid
 from transactions)t2
 on t1.uid = t2.uid

This following version does work. Much more complex however because it means that the cloak needs to know of values that will fail to meet the join condition.

select count(*)
from 

(select 1829384756473 as uid
 UNION
 select uid
 from accounts
 where lastname = 'Zamora' and
      birthdate = '1996-11-16')t1
join
(select 48592203948 as uid
 UNION
 select distinct uid
 from transactions)t2
 on t1.uid = t2.uid

yoid2000 commented 5 years ago

This following version does work. Much more complex however because it means that the cloak needs to know of values that will fail to meet the join condition.

Actually, not sure it is all that complex, because one could pick a large random number and with very high probability it will be unique. And if it isn't then only one row affected...

yoid2000 commented 5 years ago

Getting back to the overflow problem, I don't see offhand why we can't implement a checker in the cloak. It would be a piece of software that searches for a peak in the math function output (a kind of binary search) and indicates whether the value hitting that peak causes an overflow. We run every math snippet through that and reject queries for which a reasonable value causes an overflow...

obrok commented 5 years ago

It would be a piece of software that searches for a peak in the math function output (a kind of binary search)

It's not a binary search. Binary search will only work for monotonic functions.

yoid2000 commented 5 years ago

It's not a binary search. Binary search will only work for monotonic functions.

That's why I said "a kind of binary search" ;)

But anyway something that finds one or a few peaks....

obrok commented 5 years ago

"a kind of binary search"

IMO that's a misleading way to talk about it. A binary search only works in very specific conditions, most functions don't satisfy them.

yoid2000 commented 5 years ago

ok fair enough.

On Fri, Mar 15, 2019 at 4:20 PM Paweł Obrok notifications@github.com wrote:

"a kind of binary search"

IMO that's a misleading way to talk about it. A binary search only works in very specific conditions, most functions don't satisfy them.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Aircloak/aircloak/issues/3654#issuecomment-473327564, or mute the thread https://github.com/notifications/unsubscribe-auth/ACD-qS5BFIqjSi9kp-nhitKhzqBsekL1ks5vW7o9gaJpZM4bpF-H .

sebastian commented 5 years ago

These will have to be addressed in some form and will delay the upcoming release.

cristianberneanu commented 5 years ago

That's why I said "a kind of binary search" ;) But anyway something that finds one or a few peaks....

This could be solved in a general way by computing the inflection points of the graph (the points where the first order derivative is zero) for the input range [min_value, max_value]. But that would require from us the capability to compute the derivative of an arbitrary expression and solve the resulting equation. This sounds complicated. There is also the case of expressions containing 2 or more columns. I think catching all exceptions in the database might be simpler.

These will have to be addressed in some form and will delay the upcoming release.

Which ones? The division by 0 and/or overflow issues? Or the timing attack issue? That latter will be much harder to solve.

sebastian commented 5 years ago

Note: I moved this out of the current release and into a service release immediately following it. The reasoning being that it's unclear how much work is required for these fixes to be made, and we do not want to delay the current release indefinitely.

sebastian commented 5 years ago

Regarding implementing user defined functions, @obrok writes:

Yeah, I think this approach has a much better chance of covering us fully. One downside is that we'd need to be have write privileges for databases even when analyst tables are disabled.

I think it's possible to create temporary functions? Maybe these can be created without write permissions?

sebastian commented 5 years ago

This issue here now describes multiple distinct attacks:

overflow
divide by null
timing based attacks

As such it no longer works as an implementation issue. Going forward, please let's create fresh issues when finding other orthogonal problems!

I'll create a separate issue for the overflow, div null, and timing attack and close this one. It looks like our best bet at the moment is to create used defined functions in the DB, which would likely address overflow as well as div by null, but I am creating them as separate issues all the same in case we come up with different fixes.

Aircloak / aircloak

Attack that exploits DB crash due to out-of-range overflow #3654