Open briansmith opened 8 years ago
This https://github.com/aperezdc/signify/blob/master/timingsafe_bcmp.c is not a tiny bit better than CRYPTO_memcmp(): an optimizing compiler is free to break that code at any moment.
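For readers who have not seen this kind of routine, the portable versions under discussion generally follow an OR-accumulate pattern. The sketch below is a hypothetical illustration only: the name ct_bcmp_sketch is made up here, and it is not a copy of the linked file or of CRYPTO_memcmp(). It shows why the concern arises: the C standard constrains only the return value, so a compiler could in principle add an early exit once diff becomes nonzero.

```c
#include <stddef.h>

/* Hypothetical sketch of the OR-accumulate pattern (not the linked file).
 * Returns 0 if the first n bytes of a and b are equal, 1 otherwise. */
int ct_bcmp_sketch(const void *a, const void *b, size_t n)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;
    unsigned char diff = 0;

    /* Visit every byte, wherever the first mismatch occurs. */
    for (size_t i = 0; i < n; i++) {
        diff |= (unsigned char)(pa[i] ^ pb[i]);
    }

    /* The language only fixes this result, not how long it takes to compute. */
    return diff != 0;
}
```

Whether the generated machine code actually runs in constant time depends on the compiler, its flags, and the target, which is the point being argued in this thread.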
@Dmitry-Me I know that the "portable" timingsafe_bcmp and related utilities are no good. However, I heard that the "real" timingsafe_bcmp in OpenBSD actually works, though I could be wrong.
The main thing I want to avoid is trying to trick the C compiler into doing something, e.g. by using volatile. In particular, I think the C compiler is allowed to realize that a volatile pointer only points to not-actually-volatile memory, such as memory on the stack, and optimize away volatile.
I googled for a while and the smartest one I found is http://cvsweb.openbsd.org/cgi-bin/cvsweb/~checkout~/src/lib/libc/string/timingsafe_memcmp.c?rev=1.2&content-type=text/plain, which isn't any better than what CRYPTO_memcmp() does. Do you have a specific example of a better implementation?
Thanks for digging that up. That's exactly the kind of thing I'd like to avoid. Probably it should just be written in assembly language for each platform.
No, if you write it in assembly you now have two problems.
First, all those implementations must be maintained. Maintaining assembly code is harder, and with multiple implementations you need many more man-hours of highly qualified developers. Look at this C implementation: it's broken big time, yet it's copied everywhere, and either no one tries to fix it or the fix is just another attempt to fool the compiler. You would have the same problem with assembly, just many times worse.
Second, some platforms won't have an assembly implementation from the beginning. They still will need C code. And what happens? Sure, broken code surfaces again.
C is the way to have good reasonably portable code. This problem must be addressed in C code.
> First, all those implementations must be maintained. Maintaining assembly code is harder and with multiple implementations you need much more man-hours of highly qualified developers. [...]
It is really not that much work. First, you can compile the "broken" C implementation to assembly language, clean it up, and verify that it is constant time (making some assumptions about the target arch, which you also document at the time).
> You would have the same problem with assembly, just many times worse.
I agree in the abstract (see http://blog.erratasec.com/2015/03/x86-is-high-level-language.html). However, I still think it is likely that assembly language implementations will be "good enough" for most targets.
> Second, some platforms won't have an assembly implementation from the beginning. They still will need C code.
I agree that could happen. However, this project only supports x86, x86_64, ARMv6+, and Aarch64. Soon we'll add other platforms (e.g. MIPS32) but I don't foresee us adding any that we can't write assembly language for. (Maybe NaCl/PNaCl.) If we can't write assembly language, we can always do more expensive things like Brad Hill's double HMAC verification (https://www.nccgroup.trust/us/about-us/newsroom-and-events/blog/2011/february/double-hmac-verification/). But we shouldn't pay that cost on platforms where we can avoid it.
> C is the way to have good reasonably portable code. This problem must be addressed in C code.
Regardless of all of the above, if you have an idea for how to implement a portable and efficient C timing-safe buffer comparison function, I would love to see it. My understanding is that it is impossible to do because C's semantics don't offer any help for timing side channels.
The best portable solution so far is using volatile* pointers. The Standard doesn't guarantee a constant-time implementation, but current compilers react to volatile* pointers by diligently generating all the reads and writes, and we should expect this practice to continue.
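A minimal sketch of the volatile* approach described here, assuming the usual OR-accumulate loop; the function name ct_memcmp_volatile_sketch is hypothetical and this is not the actual OpenSSL patch. The only change from a plain byte loop is that both buffers are read through const volatile pointers, which current compilers tend to honor by emitting every load.

```c
#include <stddef.h>

/* Hypothetical sketch: compare two buffers by reading through volatile
 * pointers. Returns 0 if equal, nonzero otherwise. Constant-time behavior
 * is a property of current compilers, not a guarantee of the C standard. */
int ct_memcmp_volatile_sketch(const void *a, const void *b, size_t n)
{
    const volatile unsigned char *pa = a;
    const volatile unsigned char *pb = b;
    unsigned char diff = 0;

    for (size_t i = 0; i < n; i++) {
        /* Each iteration performs one volatile read from each buffer. */
        diff |= (unsigned char)(pa[i] ^ pb[i]);
    }

    return diff;
}
```

A caller would treat any nonzero result as "not equal" and should not branch on which bits are set.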
@Dmitry-Me I didn't realize that you were the author of that OpenSSL patch. I understand the reasoning you used to write that patch. For this project, ring, my goal is to eventually have a formally-verifiable/-verified implementation, and so we need to have some kind of guarantee about the correctness that goes beyond a working understanding of what compilers currently do.
@briansmith You cannot have a bulletproof implementation of this function, and so you cannot have a "verifiable" one either. volatile* pointers work because compiler writers keep them working, and we should expect them to continue doing so. However, the C Standard does not require that behavior. Whatever you craft in assembly or with multiple translation units may work at this moment but will be broken by a compiler that can read assembly, or when build settings are suddenly changed (LTO penetrates translation unit boundaries).
I am hopeful that we'll be able to tune the toolchain such that the toolchains that we support guarantee not to rewrite or omit calls to assembly-language functions from Rust or C. I agree we have work to do with respect to that. I don't think it's fruitful to ask C compiler writers to guarantee the volatile semantics that would be required for volatile to be a solution to this problem, and also I don't want to pay the performance cost of using volatile in ring.
@briansmith What's the ring?
See the readme: https://github.com/briansmith/ring. This is a Rust crypto library that originated as a fork of BoringSSL.
@briansmith Okay, how would reading through const volatile char* pointers worsen the performance?
OpenSSL has assembly language implementations of CRYPTO_memcmp now. We should use them.
> OpenSSL has assembly language implementations of CRYPTO_memcmp now. We should use them.
Back in July 2016 when I wrote that, that seemed like a reasonable course of action. However, between then and now I learned that that's not going to be reasonable for us to maintain. I'm working on some new approaches to this issue that I hope to share...sometime.
> Back in July 2016 when I wrote that, that seemed like a reasonable course of action. However, between then and now I learned that that's not going to be reasonable for us to maintain. I'm working on some new approaches to this issue that I hope to share...sometime.
Now I wrote up the plan in https://github.com/briansmith/ring/issues/626.
See https://github.com/openssl/openssl/pull/102. It should just get rewritten to use some guaranteed-constant-time OS-provided function like timingsafe_bcmp and/or assembly language code and/or the C++ guaranteed-timing-safe (draft/proposed?) API.