tarcieri opened 9 years ago
What about the ECC implementations makes them problematic? I understand they were ported from curve25519-donna and the corresponding ed25519 C++ implementation. Those are very well respected implementations, in major use pretty much everywhere. They have test vectors, and they pass them. There aren't any problematic Weierstrass-form (e.g. the NIST curves) implementations, which is good: implementations of those are at best highly susceptible to side-channel attacks and at worst, well, you know. Quality implementations of them are practically unicorn-status. So?
However, I like the idea here.
Since they're pure Rust, and not leveraging a tool like Nadeko for assembly generation, they're potentially vulnerable to data-dependent timings, since LLVM will still insert branches into code that appears on its surface not to contain data-dependent branches.
The donna implementations you reference contain a reference C version, but also include x86-64-specific assembly, which is needed for constant-time operation (or as close as you can get to constant-time operation on x86).
Oh, right. Well, we can put assembly in C files which are compiled by the gcc crate, as is done with AES-NI. This crate's maintainer has already stated they don't wish to use Nadeko until it's more mature (which I happen to agree with). Maybe I'll try out such an implementation (C-asm) today for the ECC... though I'd planned on attempting AES-GCM a la http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/communications-ia-galois-counter-mode-paper.pdf today.
Side note: while the unpredictable LLVM output is surely worth bypassing, the implementations should still be very nearly constant-time, I would think; such a property depends more on the algorithm than on the compiler, no? As such it's probably not a "danger" so much as an "improvement" (open to being wrong about this, in which case I'll try to fix it asap).
Inline assembly is fine, but architecture specific, so you would wind up with "trustworthy" ASM-backed implementations for specific architectures, and pure Rust "experimental" implementations elsewhere.
I think this is an important distinction, because if you can only provide constant time behavior on certain platforms, it would be nice if that were signaled in some way to users of the library. It would be nice if people had to opt into the "experimental" implementations in some way so they know exactly what they're getting.
How about adding something like a tabular report that doubles as a checklist? For every cipher or hash algorithm, several items would need to be checked:
Until items 1, 2, and 3 are done, nobody should ever use this code in production. Until all of them are done, the module should be discouraged and disabled by default.
I totally support the goal of this issue: separating out the more reviewed implementations from the less reviewed ones. Right now, though, practically everything in the library would fall into the less reviewed category, so if we separated out the insufficiently reviewed algorithms, we'd basically have to put the whole library there. My feeling is that while this is a blocker for a potential 1.0 release, I don't want to take action on it right now (although if someone wants to start figuring out the machinery to make this happen, that would be very cool). I hope the README and the very low version number convey that Rust-Crypto is not as mature as a project like OpenSSL.
@tarcieri It's hard, but not impossible, to write algorithms that are constant-time in Rust. The tricky part is actually proving that. I have some ideas about how to do it: submit a large number of random inputs while tracking all the instructions that are used to process each input. If all of the inputs result in the same set of instructions, I think it's fair to say the implementation is fixed-time. Implementing those checks is non-trivial, but I have coded up a very basic prototype implementation that I used to verify a few of the implementations on an older version of Rust. There are many challenges in making such a strategy workable, but I think it can be done. Certainly an important conversation to have.
@genodeftest: I like your checklist and I think that's a good model for what we should have for a potential 1.0 release at some point. Right now, nothing would meet those standards, and I hope the README and the low version number convey that to any potential users.
@DaGenix I doubt any ordinary compiler (including LLVM/rustc) for ordinary languages is able to produce constant-time code. You would have to make sure that your implementation will be compiled to constant-time code no matter which compiler version, architecture, or optimization level is used. In practice you have to use bare assembler code to do that.
@genodeftest One of the big, giant challenges for rust-crypto is proving that it is fixed-time for some subset of its algorithms. If that can't be done, it's never really going to be a viable alternative to something like OpenSSL. So, I think it's kinda necessary to assume that it can be done.
My current strategy is what I outlined above: feed in random inputs and monitor the instructions that are executed, checking that they are the same each time. The next part of the plan would be to build rust-crypto as a dylib and then run those tests for a reasonable set of compiler version / architecture / OS / optimization level combinations. For each combination that passed, the SHA256 of the dylib would be recorded. Then, through some as-yet non-existent feature in cargo, we would check that the resulting rust-crypto dylib has a hash equal to one of those verified SHA256 hashes whenever rust-crypto is built in release mode as a dependency of another project. If it doesn't, we would throw an error.
There are a bunch of challenges to make that type of scheme work:
So, maybe it's doable and maybe it's not, but I hope it's doable.
Re: branch prediction - I don't think this is a major issue. For the instruction stream to be independent of the input data (which is what I'm hoping it's possible to prove, at least for some algorithms), all of the branches that are taken must be independent of secret data. As long as that's true, I don't think it matters what the branch predictor is doing, since its predictions will always be based off of non-secret data. If it makes one set of predictions the first time an algorithm runs and another set the second time, that doesn't matter, since no secret data was used to make those predictions. That, of course, assumes the branch predictor won't look at secret data while making its guesses, something I don't think it would do: if we're branching on RAX, which holds non-secret data, I don't think the predictor would look at RBX, which holds secret data. If a branch predictor did that, it would probably hurt assembly implementations pretty badly too.
Ohai, circling back on this after quite some time...
Perhaps a table in the README describing the implementation state of various ciphers? I'd be happy to help contribute that.
Here's one I've done in the past: https://github.com/cryptosphere/cryptor#ruby-cryptography-library-comparison
One of the big, giant challenges for rust-crypto is proving that it is fixed-time for some subset of its algorithms. If that can't be done, it's never really going to be a viable alternative to something like OpenSSL. So, I think it's kinda necessary to assume that it can be done.
It can be done – in pure assembler. Ordinary compilers are not built to generate constant-time code. This will probably be even harder in rust/LLVM compared to C/GCC.
My current strategy is what I outlined above: feed in random inputs and monitor the instructions that are executed and check to see if they are the same each time. […] For each version that passed, the SHA256 of the dylib would be recorded.
So you're assuming that when compiled for a specific architecture with a specific combination of compiler, rust, libstd, … the binary will be the same. I bet it won't be. Especially with multithreaded compilation ("parallel codegen") introduced in Rust 1.2, any Rust build will probably be non-deterministic. So rustc needs a feature guaranteeing reproducible builds before that can happen.
No mechanism to test the instruction stream like this really exists anywhere that I've found. I have a basic prototype, so I believe it's doable. But this alone is a big task.
Can you please elaborate on that? I don't understand it. Do you mean hashing the instruction stream, or measuring the number of instructions executed?
Re: branch prediction
Doesn't branch prediction happen at runtime (through Intel CPU microcode) too? I thought I remembered this as the main reason not to use branches, for timing reasons. And how do you guarantee your compiler doesn't generate branches in the binary from your branch-free source code?
C code (at least) can be constant-time. Libsodium uses C code almost exclusively – even the AES-GCM implementation is in C with AES-NI intrinsics.
Hello, I like @tarcieri's idea of implementing a table. I would like to suggest that this table:
The documentation strings could cover standard items like those in @genodeftest's list. For example: "unsafe_source_tested", "audited_by [auditor] on [date] regarding [audit scope]"... I am certain you could come up with better strings than these.
The intention is to add a single maintenance task (whoever edits the code updates the documentation in the code itself) instead of two (editing the code and separately updating a table), increase the chances of this table always being up to date and, finally, add documentation to the code itself.
I apologize in advance if I am adding noise to this discussion. I am new to github and to rust, (but not so much to crypto), and I am looking for ways to be helpful within my technical and time constraints.
The implementations in this library are of varying code quality. Some look robust, for example the AES-NI implementation. Others look... less robust, for example the fallback pure Rust AES implementation, or any of the ECC implementations.
Would it be worth calling out implementations that are currently "experimental" and use-at-your-own-risk from the ones that can be considered more trustworthy?
Granted this is a rather arbitrary distinction, but perhaps you can come up with a rubric for evaluating whether an implementation is considered "experimental" or not.