C2SP / wycheproof

Project Wycheproof tests crypto libraries against known attacks.

"Acceptable" inputs are difficult to assert on (missing flags?) #59

Open davidben opened 6 years ago

davidben commented 6 years ago

Wycheproof defines a tri-state result value, with "acceptable" alongside "valid" and "invalid". It would be nice to programmatically dispatch on these so implementations can test that they accept or reject them as they expect.

It looks like the flags field could be used here, if that was not already the intent. I started looking at using them in BoringSSL, and it's almost what we'd need, but some tests are under-annotated. The model I'm thinking of is that our GetWycheproofResult function would take a list of flags we consider acceptable. If any (all?) of a test case's flags fall in that set, we'd map "acceptable" to "valid", otherwise "invalid".
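
As a sketch of that model (in Java, with hypothetical names; BoringSSL's actual GetWycheproofResult is C++):

```java
import java.util.Set;

// Hypothetical sketch of the dispatch described above: "valid" and
// "invalid" map directly, while "acceptable" maps to valid only if the
// test case carries a flag the caller has opted into.
final class WycheproofResult {
  static boolean isValid(String result, Set<String> caseFlags,
                         Set<String> acceptableFlags) {
    switch (result) {
      case "valid":
        return true;
      case "invalid":
        return false;
      case "acceptable":
        // The "any (all?)" question from above; this sketch treats one
        // matching flag as sufficient.
        return caseFlags.stream().anyMatch(acceptableFlags::contains);
      default:
        throw new IllegalArgumentException("unknown result: " + result);
    }
  }
}
```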

Is this close to what you all had in mind? If so, these are the "acceptable" inputs that lack flags:

alex commented 6 years ago

Would you accept a PR to x25519_test.json that marks any of the cases described as "public key with low order" as "Small public key"? Does a PR just edit that file directly?

thaidn commented 6 years ago

@bleichen

bleichen commented 6 years ago

I'm working on adding more flags. Since flags are relatively new, not all the generation code has them.

Not sure yet whether the status of some test vectors should be changed. I.e., many of the test vectors with weak parameters (e.g., small IV sizes) currently have status "acceptable" because some libraries already reject weak parameters. It might be possible to ignore the fact that some parameters are weak, simply mark such test vectors as "valid", and then let the tester decide whether a failure is due to weak parameters, based on the information in the header of the test group.
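
A sketch of what that tester-side decision could look like (assumptions: the keySize field from the RSA test group headers, and a cutoff that is purely the library's own policy):

```java
// Hypothetical policy for the scheme floated above: weak-parameter
// vectors are marked "valid", and the harness excuses a rejection based
// on the test group's header (e.g. the keySize field in the RSA files)
// and the library's own cutoff.
final class WeakParameterPolicy {
  private final int minRsaKeyBits;

  WeakParameterPolicy(int minRsaKeyBits) {
    this.minRsaKeyBits = minRsaKeyBits;
  }

  // A rejected vector is excused only if its group's parameters fall
  // below the library's cutoff; otherwise the rejection is a failure.
  boolean rejectionExcused(int groupKeySizeBits) {
    return groupKeySizeBits < minRsaKeyBits;
  }
}
```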


davidben commented 6 years ago

For BoringSSL, I think we're interested in:

  • Excluding curves we don't support; the split up files are great, thanks!
  • X25519 should compute the right answer in all cases, but there is a return value that corresponds to the all-zero output. We're fine with the other "acceptable" inputs. (Per the formulation in RFC 7748.)
  • We don't allow explicit curve encodings in public keys. These are forbidden by RFC 5480.
  • Our ASN.1 parsers are generally strict.
  • We support compressed coordinates.
  • We are strict in the DigestInfo encoding in RSASSA-PKCS1-v1_5.
  • We accept all specified IV lengths of AES-GCM, and leave RSA key size limits for the caller to enforce. (The joys of being a low-level library with existing users... 😢)

Having weak parameters, at least for things like RSA keys, filtered out by either looking at the value or just splitting into files makes sense. I agree that weak things count more as "acceptable" than "valid", but I expect different libraries to have different cutoffs for this, based on their needs, so it may not be useful as a programmatically-checked status code on the test. It does make sense as something that could programmatically go either way, though; the main reason I'm interested in programmatically forcing the other "acceptable" inputs one way or another is that the other cases, like bad ASN.1, are quite interesting to assert on.

bleichen commented 6 years ago

On Wed, Jul 11, 2018 at 7:26 PM, David Benjamin wrote:

For BoringSSL, I think we're interested in:

  • Excluding curves we don't support; the split up files are great, thanks!

That was the idea. Also the mixed files are becoming too large, so they will eventually be reduced to test cases not covered in other files.

  • X25519 should compute the right answer in all cases, but there is a return value that corresponds to the all-zero output. We're fine with the other "acceptable" inputs. (Per the formulation in RFC 7748.)

Not sure I understand this.
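
For reference, a sketch of the check davidben appears to mean: RFC 7748, Section 6.1 allows implementations to check whether the X25519 output (the shared secret) is all zeros, which is what a low-order public key produces, and BoringSSL reports this through the function's return value.

```java
// Sketch of the RFC 7748, Section 6.1 all-zero check. The OR-accumulator
// avoids an early exit, so the comparison itself leaks no timing
// information about the secret.
final class X25519Check {
  static boolean isAllZeros(byte[] sharedSecret) {
    int acc = 0;
    for (byte b : sharedSecret) {
      acc |= b;
    }
    return acc == 0;
  }
}
```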

  • We don't allow explicit curve encodings in public keys. These are forbidden by RFC 5480.

Others do, however, so it is necessary to test with such encodings. E.g., one case was BouncyCastle, where it was possible to encode a public EC key with an incorrect order. One of the schemes (ECDHC) would check that the curves for the public and private key are the same by checking that a, b, p, ... match, but would then reduce the private key modulo the order passed in with the public key. Since the main goal is to find bugs that are exploitable, such test vectors typically contain the shared ECDH result computed over the curve from the private key. Getting a different value is the most concerning test result.
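
A sketch of how a harness can grade such vectors (the interface and names here are hypothetical): rejecting the malformed key is fine, and matching the expected shared secret is fine, but silently computing a different secret is the signal that the library mixed parameters from the two keys.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;

// Hypothetical interface to the ECDH implementation under test.
interface EcdhComputation {
  byte[] computeSharedSecret(byte[] publicKeyDer, byte[] privateKey)
      throws GeneralSecurityException;
}

final class EcdhGrader {
  enum Grade { REJECTED_OK, MATCHED_OK, WRONG_RESULT }

  static Grade grade(EcdhComputation lib, byte[] publicKeyDer,
                     byte[] privateKey, byte[] expectedShared) {
    final byte[] actual;
    try {
      actual = lib.computeSharedSecret(publicKeyDer, privateKey);
    } catch (GeneralSecurityException e) {
      return Grade.REJECTED_OK; // rejecting a bad encoding is acceptable
    }
    return Arrays.equals(actual, expectedShared)
        ? Grade.MATCHED_OK
        : Grade.WRONG_RESULT; // the exploitable-looking outcome
  }
}
```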

  • Our ASN.1 parsers are generally strict.

Any test case that is not DER encoded is at most "acceptable". Making all BER-encoded inputs invalid seems a bit premature, since there are still a lot of libraries that are less strict. The annotations are still weak; that will be improved.
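
To illustrate the gap (a generic example, not a specific Wycheproof vector): BER admits several encodings of the same value where DER admits exactly one, e.g. the constructed form of an OCTET STRING.

```java
// The 4-byte string 01 02 03 04 as an OCTET STRING. DER allows only the
// primitive form; BER also allows the constructed form (tag 0x24 =
// 0x04 | 0x20), which a strict DER parser must reject.
final class BerVsDer {
  static final byte[] DER_PRIMITIVE = {0x04, 0x04, 0x01, 0x02, 0x03, 0x04};
  static final byte[] BER_CONSTRUCTED = {
    0x24, 0x08,             // constructed OCTET STRING, 8 content bytes
    0x04, 0x02, 0x01, 0x02, // first primitive segment
    0x04, 0x02, 0x03, 0x04  // second primitive segment
  };
}
```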

  • We support compressed coordinates.

Compressed and uncompressed coordinates should probably be mixed in the same files. Whether an implementation is vulnerable to attacks using public-key points not on the curve depends on what formats the receiver accepts, not on what the sender generates.

  • We are strict in the DigestInfo encoding in RSASSA-PKCS1-v1_5.

Well, everyone should.
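
One reason strictness is cheap here: the verifier never has to parse the DigestInfo at all. It can build the single expected DER encoding and compare byte-for-byte, the approach usually recommended against Bleichenbacher-style forgeries. A sketch for SHA-256 (the 19-byte prefix is the standard DigestInfo header from RFC 8017):

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

final class StrictDigestInfo {
  // DER DigestInfo header for SHA-256:
  // SEQUENCE { SEQUENCE { OID 2.16.840.1.101.3.4.2.1, NULL }, OCTET STRING (32) }
  private static final byte[] SHA256_PREFIX = {
    0x30, 0x31, 0x30, 0x0d, 0x06, 0x09, 0x60, (byte) 0x86,
    0x48, 0x01, 0x65, 0x03, 0x04, 0x02, 0x01, 0x05, 0x00,
    0x04, 0x20
  };

  // Compare the recovered DigestInfo against the single expected DER
  // encoding; any BER variant fails the byte-for-byte comparison.
  static boolean matches(byte[] digestInfo, byte[] message)
      throws NoSuchAlgorithmException {
    byte[] hash = MessageDigest.getInstance("SHA-256").digest(message);
    byte[] expected = new byte[SHA256_PREFIX.length + hash.length];
    System.arraycopy(SHA256_PREFIX, 0, expected, 0, SHA256_PREFIX.length);
    System.arraycopy(hash, 0, expected, SHA256_PREFIX.length, hash.length);
    return Arrays.equals(digestInfo, expected);
  }
}
```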

  • We accept all specified IV lengths of AES-GCM

I hope not IV length 0.

  • and leave RSA key size limits for the caller to enforce. (The joys of being a low-level library with existing users... 😢)

Which seems completely reasonable to me. I don't think it is the task of the library maintainer to enforce new security limits. Most libraries will have to support existing users with existing keys. If restrictions are made, then key generation is a good place to start. E.g., it isn't a problem if 1024-bit RSA keys are supported by a library, but it is a problem if 1024 bits is the default (or even any default).

Having weak parameters, at least for things like RSA keys, filtered out by either looking at the value or just splitting into files makes sense. I agree that weak things count more as "acceptable" than "valid", but I expect different libraries to have different cutoffs for this, based on their needs, so it may not be useful as a programmatically-checked status code on the test. It does make sense as something that could programmatically go either way, though; the main reason I'm interested in programmatically forcing the other "acceptable" inputs one way or another is that the other cases, like bad ASN.1, are quite interesting to assert on.

Finding good cut-offs is indeed a problem. Since the parameter sizes are also in the headers of the test groups, it might be possible to drop this distinction and just require that the tester decide whether a failing test vector is due to weak parameters not supported by the library.


davidben commented 6 years ago

Others do however. So it is necessary to test with such encodings.

Oh, certainly! Sorry, that was probably unclear. I was just listing the things I would like to be able to differentiate via flags or checking parameters or some other mechanism.

NeilMadden commented 2 years ago

Chiming in late to this issue, but if I can selfishly add my own wish, it would be a way to distinguish between test cases (for signatures) that are needed to achieve strong unforgeability and those needed only for existential unforgeability.

I am seeing some test failures for lax parsing of ASN.1 DER for RSA signatures in Java (in particular, allowing a constructed/concatenated OCTET STRING in the DigestInfo structure). While it would be good for the cryptography provider to fix this, from our point of view we don't need canonical/unique signatures, so I don't think this presents an exploitable issue in our scenario. For example, we also support ECDSA signatures, which are malleable anyway. (Please correct me if I am misunderstanding this issue!)