Open bernardpaulus opened 7 years ago
First -- thanks for submitting! This is a niche project, and I haven't really promoted it in any way. But still, it's been quiet!
And thanks for pointing out the flaw!
@bernardpaulus -- Reading through code again, and wondering if I am missing something. Do you subtract 1 from the prime twice? once before calling nonzero_random()
and once inside it? Is this on purpose?
I guess it seems from Wikipedia that this is on purpose. So it seems that my initial code was incorrect in two ways, that partially cancelled each other out. By not adding 1, I allowed a coefficient to be 0. By not subtracting 1, when I fixed the first problem I would introduce the problem of allowing a coefficient to be equal to the prime, rather than 1 less than the prime.
If my understanding is incorrect, please let me know. Otherwise, ignore. ;)
Thanks!
Hello Fletcher!
Not too fast! I actually modified the wikipedia article to gain more visibility on this and quicker feedback and I'm regularly checking it for changes.
Now you said you looked up wikipedia to check my change, I realized I was wrong and reverted the edit.
=> more details follow
In reading from Shamir's 1979 paper (copied from OCR'd image, so forgive "typos"):
The coefficients a~..... ak_~ in q(x) are randomly chosen from a uniform distribution over the integers in [0, p), and the values D~..... Dn are computed modulo p.
The key is "[0,p)", which would imply that the coefficients can be 0, but cannot be equal to the prime.
Correct?
It looks as though some of this has been discussed here:
So, yes, the idea was to stop generating random numbers in the interval [0, 255], but instead generate them on [1, 256]
To be more clear about the change, we moved from:
coef[i] = rand() % 256; // [0, 255]
to
coef[i] = rand() % 255 + 1; // [1, 255]
the same applies for arc4random_uniform.
I think this is required to avoid weak polynomials. Indeed, if you take wikipedia's example:
(a0 = 1234 ; a1 = 166; a2 = 94)
And, if, by chance a2 is 0 instead, the above is equivalent to:
splitting phase
(a0 = 1234 ; a1 = 166 ; a2 = 0 )
f(x) = 1234 + 166 x
Ds = [(1, 1400), (2, 1566), (3, 1732), (4, 1898), (5, 2064), (6, 2230)]
recovery phase
now, only two shares, say (x0 = 1, y0 =1400) and (x1 = 2, y1 = 1566), are sufficient to recover the secret:
Lagrange interpolation with two datapoints, check the wikipedia article for the formulas.
l0(x) = (x - x1) / (x0 - x1) = (x - 2) / (1 - 2) = -x + 2
l1(x) = (x - x0) / (x1 - x0) = (x - 1) / (2 - 1) = x - 1
L(x) = y0 * l0(x) + y1 * l1(x) = 1400 * ( -x + 2 ) + 1566 ( x - 1 )
= -1400x + 2800 + 1566 x - 1566 = 166x + 1234
we recovered our polynomial with only two shares, when 3 are supposed to be required. Let's recover the secret:
L(0) = 0x + 1234 = 1234
It works similarly if a1 is 0 instead of a2 in the example. Also, in the general case, each factor equal to 0 reduces by one the number of shares required to find out the secret. This is a matter of degrees of freedom of the polynomial.
Caveat: I don't have enough time to transpose this to finite field arithmetic right now.
possible attack
Given an attacker that attempts to recover the secret (= a password) with k - 1 shares instead of the k required and who as access to an oracle (= a login form) to verify if his guess is true, the attacker therefore has a non-negligible probability (= better than brute force) of recovering the secret (= being able to log in) by just attempting a recovery with his k - 1 shares.
Given:
and uniformly distributed coefficients in [0, p-1] = [0, 256] The probability of him recovering the password by attempting a recovery with k - 1 keys is thus:
P(p, k) = 1 - ((p-1) / p) ** (k-1) // NOT(the probability that no coefficient is 0)
For the above parameters, the attacker has the following probability to recover the shares with only two shares:
P(257, 3) = 0.008 // rounded up, 0.8%
Which is better than brute-forcing.
To prevent anyone to have a non-negligible chance of recovering the secret with less than the k required shares, we have to ensure the underlying polynomial has the required degrees of freedom, e.g. that no coefficient is equal to 0.
Cryptostatis' answer is in line with the above (just upvoted it)
Hum... the simplest would be trying it out with the code.
Will do that tonight :)
I am not a mathematician or cryptologist (just a physician with a college background in engineering.)
But, if this is truly a flaw in Shamir's scheme, wouldn't there be published criticism discussing this somewhere? There may be, but I haven't found it yet. My search is limited to google, and not very refined since I am not particularly experienced in focused searches of the literature in this field. I have found other descriptions of weaknesses, but these seem to refer to limitations of how this can be used, not actually weaknesses in the methodology or strength of "encryption." They all seem to involve dishonest shareholders, not outsiders with no information about the shares.
EDIT: For completeness, I'll add that it's possible I am misunderstanding something in the paper.
Another page with a purported flaw in Shamir's scheme that seems to not be true (based on comments):
https://lardbucket.org/blog/archives/2007/10/30/a-flaw-in-shamirs-secret-sharing-method/
I don't claim to understand finite fields well enough to understand all the implications of them on Shamir's scheme, but I think that is an important part of this from what I'm reading.
As a first test, I tweaked the code to always generate 0
for the first coefficient, and random numbers for the rest. It still requires the specified number of shares to reconstruct the key in my tests.
Conversely, when I set all of the coefficients to 0
, then it only requires a single share (they are all identical) to recover the key. When I set all coefficients to 1
, it again requires all the shares to recover the key.
I'm not saying this, by itself, disproves your theory. Just trying to add data.
Here's a challenge.... ;)
I encrypted a short phrase using a modified version of the code. It is set to require 4 of 4 shares to reconstruct the secret. I modified the source to force one of the coefficients to be 0 (I won't tell you which one....).
I also included a similar example (different phrase) using the regular non-zero coefficients. You have a 50/50 chance of guessing which is which.
Here are 3 of the 4 required shares - example A:
0104AA312146142D8BB6
0204AA6B95CDE66B1170
0304AAA84BBE717861D0
And, example B:
0104AA3FA2C366FEECD4
0204AA0DA2E673BF0535
0304AAAB3DFD0884E081
Of course, the object is to determine the phrase, showing your work to prove it's not just using psychology to determine what i might have picked as phrases.
In reading through your comment and trying to understand it, I have a question/comment.
I understand your comments through the splitting phase.
But it seems that you make an assumption in your recovery phase, namely you assume that the x^3 coefficient is 0
, but you only know that because you set it to 0
in advance. Therefore you use the formulas to solve for a 2nd degree polynomial, and are given two data points. Therefore, of course you found the right answer. But you only know this because you are both the dealer and attacker.
What would happen if you (as attacker), appropriately used the equations for a 3rd degree polynomial with the same two data points? Would you still get an answer? Or instead, as I suspect, would you get a family of equally likely answers?
Would the same thing not also be true if the attacker chose to fix the largest coefficient to another set value, e.g. 5
? If 5
happened to be correct, then they could solve with a smaller number of keys.
So, if an attacker accurately guesses the first coefficient, then they could solve the problem with one fewer key than nominally required. But this would seem to be the same as a brute force attack, no? The attacker would have no way of knowing whether the guess about the first coefficient was correct.
Am I understanding this correctly? If not, what am I missing?
Thanks!!
Hello,
A quick answer, because I didn't had the time and won't have more time this weekend:
Coming back to you on that next week :)
@bernardpaulus The suspense is killing me! ;)
Hi there.
It is my understanding that if your coefficients can be zero, then the security of the algorithm is not reduced. Even if you know k - 1 points on the curve, all values for s are equally likely. As a matter of fact, if coefficients are not allowed to be zero, then some values of s are not possible. Take, for example, p = 3 and k = 2. With the point (1, 1) and the knowledge that a1 is nonzero, it must be that s is 0 or 2. Otherwise, 1 = 1 + a1 which implies a1 = 0. This is exaggerated with such a small prime, but you get the picture.
However, if it is the case that some coefficients will be zero (or any other known value), then the security of the algorithm is crippled. An attacker needs to know fewer shares to guess the secret. If there's one known coefficient value (zero or otherwise), then once the attacker has k - 1 shares, they only need to solve some k - 1 linear systems of equations to get as many possible secrets.
@ferrerrajc That is my understanding as well. In this case, 0 is equally as likely as any other coefficient (except for the intentionally crippled examples I posted.)
Hi,
Sorry for not getting back at you earlier! @ferrerrajc the likeliness argument is very interesting, but could you specify the size of the finite field in your example? (P in wikipedia's notation)
Cheers :)
Hey @bernardpaulus,
In my example, I used p = 3 for my prime. So the secret can only take one of the values 0, 1, or 2. I wanted a very small prime and polynomial to demonstrate the effect of limiting coefficient values.
I pushed a change to develop that restores the full range for the random numbers (e.g. 0 to 256, with prime of 257).
Based on everything I have been able to read, I believe this is correct and the range stated in Shamir's paper.
If I have misrepresented the original paper, then I definitely want to know and fix it.
If there is widely accepted proof that Shamir was wrong, and 0 should not be allowed, I am happy to change that as well.
Otherwise, individuals can modify their own version to remove 0 at the risk of actually decreasing the security of their implementation slightly (by taking the number of possibilities from 257 to 256).
The generated coefficients can be 0. Though improbable, if all coefficients are 0, all the shares would be the secret.
In general, the fact that the random generator can produce 0 introduces the risk of lower-order polynomials and therefore a risk of a lower number of shares needed to recover the share than required.