Improve constant-time comparison `CRYPTO_memcmp`

briansmith commented 8 years ago

See https://github.com/openssl/openssl/pull/102. It should just get rewritten to use some guaranteed-constant-time OS-provided function like timingsafe_bcmp and/or assembly langauge code and/or the C++ guaranteed-timing-safe (draft/proposed?) API.

Dmitry-Me commented 8 years ago

This https://github.com/aperezdc/signify/blob/master/timingsafe_bcmp.c is not a tiny bit better than CRYPTO_memcmp() - an optimizing compiler is free to break that code at any moment.

briansmith commented 8 years ago

@Dmitry-Me I know that the "portable" timingsafe_bcmp and related utilities are no good. However, I heard that the "real" timingsafe_bcmp in OpenBSD actually works. However, I could be wrong.

The main thing I want to avoid is trying to trick the C compiler into doing something, e.g. by using volatile. In particular, I think the C compiler is allowed to realize that a volatile pointer only points to not-actually-volatile memory, such as memory on the stack, and optimize away volatile.

Dmitry-Me commented 8 years ago

I googled for a while and the smartest one I found is http://cvsweb.openbsd.org/cgi-bin/cvsweb/~checkout~/src/lib/libc/string/timingsafe_memcmp.c?rev=1.2&content-type=text/plain - that isn't any better than what CRYPTO_memcmp() does. Do you have a specific example of a better implementation?

briansmith commented 8 years ago

Thanks for digging that up. That's exactly the kind of thing I'd like to avoid. Probably it should just be written in assembly language for each platform.

Dmitry-Me commented 8 years ago

No, if you write it in assembly you now have two problems.

First, all those implementations must be maintained. Maintaining assembly code is harder and with multiple implementations you need much more man-hours of highly qualified developers. Look at this C implementation - it's broken big time, yet it's copied everywhere and either noone tries to fix it or his fix is just an attempt to further fool the compiler. You would have the same problem with assembly, just many times worse.

Second, some platforms won't have an assembly implementation from the beginning. They still will need C code. And what happens? Sure, broken code surfaces again.

C is the way to have good reasonably portable code. This problem must be addressed in C code.

briansmith commented 8 years ago

First, all those implementations must be maintained. Maintaining assembly code is harder and with multiple implementations you need much more man-hours of highly qualified developers. [...]

It is really not that much work. First, you can compile the "broken" C implementation to assembly language, clean it up, and verify that it is constant time (making some assumptions about the target arch, which you also document at the time).

You would have the same problem with assembly, just many times worse.

I agree in the abstract (see http://blog.erratasec.com/2015/03/x86-is-high-level-language.html). However, I still think it is likely that assembly language implementations will be "good enough" for most targets.

Second, some platforms won't have an assembly implementation from the beginning. They still will need C code.

I agree that could happen. However, this project only supports x86, x86_64, ARMv6+, and Aarch64. Soon we'll add other platforms (e.g. MIPS32) but I don't forsee us adding any that we can't write assembly language for. (Maybe NaCl/PNaCl.) If we can't write assembly language, we can always do more expensive things like Brad Hill's double HMAC verification (https://www.nccgroup.trust/us/about-us/newsroom-and-events/blog/2011/february/double-hmac-verification/). But, we shouldn't pay that cost on platforms we can avoid it.

C is the way to have good reasonably portable code. This problem must be addressed in C code.

Regardless of all of the above, if you have an idea for how to implement a portable and efficient C timing-safe buffer comparison function, I would love to see it. My understanding is that it is impossible to do because C's semantics don't offer any help for timing side channels.

Dmitry-Me commented 8 years ago

The best portable solution so far is using volatile* pointers. It doesn't guarantee constant time implementation per Standard but current compilers react to voilatile* pointers by diligently generating all the reads and writes and we should expect that this practice continues.

briansmith commented 8 years ago

@Dmitry-Me I didn't realize that you were the author of that OpenSSL patch. I understand the reasoning you used to write that patch. For this project, ring, my goal is to eventually have a formally-verifiable/-verified implementation, and so we need to have some kind of guarantee about the correctness that goes beyond a working understanding of what compilers currently do.

Dmitry-Me commented 8 years ago

@briansmith You cannot have a bulletproof implementation of this function and so you cannot have a "verifiable" one either. volatile* pointers are working because compilers writers keep them working and we should expect them to continue doing so. However the C Standard does not require that behavior. Whatever you craft in assembly or with multiple translation units may work at this moment but will be broken by a compiler which can read assembly or when build settings are suddenly changed (LTO penetrates translation unit boundaries).

briansmith commented 8 years ago

I am hopeful that we'll be able to tune the toolchain such that the toolchains that we support guarantee not to rewrite or omit calls to assembly-language functions from Rust or C. I agree we have work to do with respect to that. I don't think it's fruitful to ask C compiler writers to guarantee the volatile semantics that would be required for volatile to be a solution to this problem, and also I don't want to pay the performance cost of using volatile in ring.

Dmitry-Me commented 8 years ago

@briansmith What's the ring?

briansmith commented 8 years ago

See the readme: https://github.com/briansmith/ring. This is a Rust crypto library that originated as a fork of BoringSSL.

Dmitry-Me commented 8 years ago

@briansmith Okay, how would reading through const volatile char* pointers worsen the performance?

briansmith commented 8 years ago

OpenSSL has assembly language implementations of CRYPTO_memcmp now. We should use them.

briansmith commented 7 years ago

OpenSSL has assembly language implementations of CRYPTO_memcmp now. We should use them.

Back in July 2016 when I wrote that, that seemed like a reasonable course of action. However, between then and now I learned that that's not going to be reasonable for us to maintain. I'm working on some new approaches to this issue that I hope to share...sometime.

briansmith commented 6 years ago

Back in July 2016 when I wrote that, that seemed like a reasonable course of action. However, between then and now I learned that that's not going to be reasonable for us to maintain. I'm working on some new approaches to this issue that I hope to share...sometime.

Now I wrote up the plan in https://github.com/briansmith/ring/issues/626.

briansmith / ring

Improve constant-time comparison `CRYPTO_memcmp` #95