ethereum / consensus-specs

Ethereum Proof-of-Stake Consensus Specifications

Avoid creating "computation" overflows when not necessary #2136

Open franck44 opened 3 years ago

franck44 commented 3 years ago

Avoid creating "computation" overflows when not necessary

Overview of the Issue

In some parts of the code of the Eth2.0 specs, computations are performed on bounded integers (e.g. uint64) and may result in underflows/overflows. For instance, if a is a uint64, computing 3 * a may result in an overflow, as the result may not fit in 64 bits.

Overflows can cause blocks to be rejected:

State transitions that cause a uint64 overflow or underflow are also considered invalid. [beacon chain transition function].

As a result, a block can be rejected even if it is actually well-formed and not responsible for the overflows. (A related question is whether the previous statement applies only to uint64 or to any underflow/overflow.)

Avoiding spurious under/overflows

However, it is possible to rewrite some parts of the specs in a way that avoids creating spurious overflows. Spurious overflows can appear because a computation of a property of a given value is implemented using unsafe operations, but the property could be checked without these unsafe operations.

Example: assume a and b are uint64. The following code may result in an overflow when a >= MAX_UINT64 - 1:

if (b >= 2)
  if (a + 2 <= b)  // possible overflow: a + 2 may not fit in a uint64
    ...

This code snippet can be rewritten in a safer equivalent version:

if (b >= 2)
  if (a <= b - 2)  // b >= 2 guarantees b - 2 >= 0: no underflow, no overflow
    ...
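To make the difference concrete, here is a minimal runnable sketch (an illustration with a hand-rolled checked type, not the spec's actual uint64): with checked arithmetic, the original guard raises on overflow while the rewritten one evaluates without any exception.

MAX_UINT64 = 2**64 - 1

class uint64(int):
    # Checked 64-bit unsigned integer: construction fails outside [0, MAX_UINT64].
    def __new__(cls, value):
        if not 0 <= value <= MAX_UINT64:
            raise OverflowError(f"{value} does not fit in uint64")
        return super().__new__(cls, value)

    def __add__(self, other):
        return uint64(int(self) + int(other))

    def __sub__(self, other):
        return uint64(int(self) - int(other))

a = uint64(MAX_UINT64 - 1)
b = uint64(7)

# Unsafe form: a + 2 overflows before the comparison is even evaluated.
try:
    _ = a + 2 <= b
except OverflowError as e:
    print("unsafe form raised:", e)

# Safe form: the guard b >= 2 ensures b - 2 cannot underflow.
if b >= 2:
    print("safe form:", a <= b - 2)  # False, computed without any exception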

Occurrences of this issue in parts of the Eth2.0 specs

The code for process_justification_and_finalization may be made safer by avoiding spurious overflows.

The following piece of code may be rewritten in a safer, non-overflow-prone version:

# Process finalizations
if get_current_epoch(state) <= GENESIS_EPOCH + 1:
    return

# After this point, get_current_epoch(state) > GENESIS_EPOCH + 1, i.e.
# current_epoch >= 2, and we can use this fact to evaluate the following
# conditions without any overflow.
bits = state.justification_bits
# The 2nd/3rd/4th most recent epochs are justified, the 2nd using the 4th as source
if all(bits[1:4]) and old_previous_justified_checkpoint.epoch + 3 == current_epoch:
    state.finalized_checkpoint = old_previous_justified_checkpoint
# The 2nd/3rd most recent epochs are justified, the 2nd using the 3rd as source
if all(bits[1:3]) and old_previous_justified_checkpoint.epoch + 2 == current_epoch:
    state.finalized_checkpoint = old_previous_justified_checkpoint
# The 1st/2nd/3rd most recent epochs are justified, the 1st using the 3rd as source
if all(bits[0:3]) and old_current_justified_checkpoint.epoch + 2 == current_epoch:
    state.finalized_checkpoint = old_current_justified_checkpoint
# The 1st/2nd most recent epochs are justified, the 1st using the 2nd as source
if all(bits[0:2]) and old_current_justified_checkpoint.epoch + 1 == current_epoch:
    state.finalized_checkpoint = old_current_justified_checkpoint

The last three if statements can be rewritten as the corresponding checkpoint's epoch == current_epoch - k, with k = 1 or 2 (using a subtraction rather than an addition to avoid any overflow; since current_epoch >= 2 at this point, the subtraction cannot underflow). The first condition, old_previous_justified_checkpoint.epoch + 3 == current_epoch, falls into two cases: if current_epoch < 3, it is necessarily false (its left-hand side is at least 3); if current_epoch >= 3, it amounts to old_previous_justified_checkpoint.epoch == current_epoch - 3.

As a result, it is equivalent to:

if all(bits[1:4]) and current_epoch >= 3 and old_previous_justified_checkpoint.epoch == current_epoch - 3:
    state.finalized_checkpoint = old_previous_justified_checkpoint
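Putting the pieces together, the finalization checks could then read as follows (a sketch of the proposed rewrite, reusing the identifiers of the spec snippet above; every subtraction is safe because current_epoch >= 2 after the early return):

# Sketch of the overflow-free rewrite; current_epoch >= 2 holds here.
if all(bits[1:4]) and current_epoch >= 3 and old_previous_justified_checkpoint.epoch == current_epoch - 3:
    state.finalized_checkpoint = old_previous_justified_checkpoint
if all(bits[1:3]) and old_previous_justified_checkpoint.epoch == current_epoch - 2:
    state.finalized_checkpoint = old_previous_justified_checkpoint
if all(bits[0:3]) and old_current_justified_checkpoint.epoch == current_epoch - 2:
    state.finalized_checkpoint = old_current_justified_checkpoint
if all(bits[0:2]) and old_current_justified_checkpoint.epoch == current_epoch - 1:
    state.finalized_checkpoint = old_current_justified_checkpoint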

Another occurrence of possible overflows is in the computation of the supermajority check in the same function:

if get_attesting_balance(state, matching_target_attestations) * 3 >= get_total_active_balance(state) * 2:
...

The values get_attesting_balance(state, matching_target_attestations) and get_total_active_balance(state) are uint64, and multiplying them by 3 and 2 may generate an overflow exception. This issue can be mitigated by converting the values to uint128 and performing the comparison in that type.
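For instance, assuming a checked uint128 type is available (SSZ does define uint128), the operands can be widened before multiplying; a sketch:

# Sketch: widen both balances to uint128 before multiplying, so each
# product stays below 3 * 2**64 < 2**66 and cannot overflow the wider type.
attesting = uint128(get_attesting_balance(state, matching_target_attestations))
total = uint128(get_total_active_balance(state))
if attesting * 3 >= total * 2:
    ...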

Formal verification that the previously proposed patches are correct

The formal proof that the previous changes mitigate the overflow issues described above can be found in the Dafny-verified version of process_justification_and_finalization.

How the problem was discovered

This issue was discovered as part of the project formally verifying the Eth2.0 specification in Dafny.

vbuterin commented 3 years ago

One challenge is that the spec is not solely optimizing for mathematical cleanness; it's optimizing for human readability too (and perhaps even more importantly!). "if (a <= b - 2) // b - 2 >= 0" is the exact opposite of that goal!

One idea that could achieve formal and human friendliness at the same time is having explicit arithmetically safe operations implemented as functions, eg. in a separate context I advocated for a_times_b_div_c(a, b, c) as that's a common case that currently requires ugly hacks to avoid overflows.
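For illustration, such a helper might look like the following sketch (hypothetical, not an existing spec function; it fixes the (a * b) // c reading):

MAX_UINT64 = 2**64 - 1

def a_times_b_div_c(a: int, b: int, c: int) -> int:
    # Compute (a * b) // c exactly and check that the result fits in a
    # uint64. Python's unbounded integers make the intermediate product
    # safe; a fixed-width implementation would use a wider type instead.
    if c == 0:
        raise ValueError("division by zero")
    result = (a * b) // c
    if result > MAX_UINT64:
        raise OverflowError("result does not fit in uint64")
    return result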

That said, in many of the cases above, we don't actually need to worry because many of those variables have bounds far below 2**64 that we can rely on, eg. we can safely assume slot and epoch numbers will be under 2**40, we can safely assume that any balance total will be under ~2**58, etc. Perhaps we should formalize these tighter bounds in cases where we know them?
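For illustration, such tighter bounds could be written down as named constants plus checks (hypothetical names, not spec-defined):

EPOCH_BOUND = 2**40          # assumed upper bound on slot/epoch numbers
TOTAL_BALANCE_BOUND = 2**58  # assumed upper bound on any balance total

def check_assumed_bounds(epoch: int, total_balance: int) -> None:
    # Under these bounds, e.g. total_balance * 3 < 3 * 2**58 < 2**60,
    # which comfortably fits in a uint64.
    assert epoch < EPOCH_BOUND
    assert total_balance < TOTAL_BALANCE_BOUND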

franck44 commented 3 years ago

I am unsure about the readability issues. That, for mathematical integers, the truth value of a + 2 <= b is the same as that of a <= b - 2 seems reasonable, and the two forms are equally readable. However, in a programming context where integers are represented by fixed-size bit vectors, like uint64, things are different. What is known is a fact about b, namely b >= 2, and using that fact we can guarantee that the subtraction b - 2 is safe in the domain uint64 (whereas a + 2 cannot be guaranteed to be safe unless we have a precondition a <= MAX_UINT64 - 2).

If the specs were using mathematical integers, it would be fine to use a + 2. However, because they prescribe a type, e.g. uint64, it seems important to account for the bounds and make sure the code is safe (w.r.t. under/overflows).

One idea that could achieve formal and human friendliness at the same time is having explicit arithmetically safe operations implemented as functions, eg. in a separate context I advocated for a_times_b_div_c(a, b, c) as that's a common case that currently requires ugly hacks to avoid overflows.

I am unsure it solves the problem. Readability-wise, a_times_b_div_c(a, b, c) does not really tell whether it computes (a * b) / c or a * (b / c). This strategy would also require an ad hoc solution for each arithmetic formula. Problems related to overflow/division by zero are hard to circumvent. A common solution is to add preconditions to functions which clearly describe the set of input values they accept to operate properly. To guarantee the absence of problems, we then have to prove that every call to a function satisfies its preconditions. Proving the absence of under/overflows usually requires semantic-level analysis (e.g. abstract interpretation or other static analysis) rather than new/alternative syntax.
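For illustration, the precondition approach could look like this (a sketch; in Dafny the precondition would be a requires clause discharged by the prover rather than a runtime assert):

def checked_sub(b: int, k: int) -> int:
    # Precondition: b >= k, to be established by every caller; under it,
    # the subtraction cannot underflow in uint64.
    assert b >= k
    return b - k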

That said, in many of the cases above, we don't actually need to worry because many of those variables have bounds far below 2**64 that we can rely on, eg. we can safely assume slot and epoch numbers will be under 2**40, we can safely assume that any balance total will be under ~2**58, etc. Perhaps we should formalize these tighter bounds in cases where we know them?

Yes, I believe that, but it may only hold under optimistic circumstances, where no one is malicious. The problem is not solely to formalise the bounds in the theoretical foundations, but to prove that in the implementations the program variables also satisfy their type bounds. It may be doable to prove that in a given context all the operations are safe, but that is usually hard, as it is an inter-procedural verification objective which requires taking into consideration the interactions between different functions (and the flow of values between them).

Another point is that we may not be able to prevent attackers from writing code that calls some functions with unforeseen input (for instance, if one can tamper with the memory, alter some stored values, and set them to MAX_UINT64 - 1 even though in theory this value is not supposed to occur). Finally, the current specs rely on correct handling of exceptions (which may not be as simple as it sounds), so we may try to avoid/minimise their possible occurrences.

The changes proposed above aim to rule out any possibility of overflow in any context (the facts that the program typechecks and that b - 2 is evaluated only when b >= 2 are enough to prove it). They are simple to implement, can be formally proved as local properties of process_justif_and_final, and the result is much safer and simpler code (arguably more readable, as no under/overflow can occur in this code snippet, so there is no need to handle any overflow exception).

dankrad commented 3 years ago

It's an interesting idea to attach range assumptions to each variable that have to be checked, and then clients can hopefully drop most of these checks by proving from other assumptions that they can never be violated. This might make it much easier to make safe statements?

franck44 commented 3 years ago

It's an interesting idea to attach range assumptions to each variable that have to be checked, and then clients can hopefully drop most of these checks by proving from other assumptions that they can never be violated.

It is not a new idea; this is common in formal verification. Note that we do not add range assumptions to input variables, but rather use their types to derive and attach (provably correct) range constraints to other variables. And indeed, once you have proved that the range constraints cannot be violated, you do not need runtime checks.
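As an illustrative flavour of deriving constraints from types, using the spec's fixed-size justification bits (JUSTIFICATION_BITS_LENGTH is the spec constant; the helper itself is hypothetical):

JUSTIFICATION_BITS_LENGTH = 4  # fixed size of state.justification_bits in the spec

def shift_justification_bits(bits: list) -> list:
    # The length fact comes from the type of justification_bits, not from an
    # assumption on the caller; from it, the result provably has the same size.
    assert len(bits) == JUSTIFICATION_BITS_LENGTH
    return [False] + bits[:JUSTIFICATION_BITS_LENGTH - 1]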

This might make it much easier to make safe statements?

Yes. For the example of the safe process_justif_and_final, we are able to prove (it is actually a mechanised proof) a strong guarantee: no overflow can happen during the computation of the result (we actually prove more safety properties, including that all the justification_bits vectors have the same size).