The-DevX-Initiative / RCIG_Coordination_Repo

A Coordination repo for all things Rust Cryptography oriented
https://cryptography.rs

Compiler-level changes #55

Open beldmit opened 3 years ago

beldmit commented 3 years ago

During the last meeting we discussed possible compiler enhancements (e.g. explicitly switching optimizations off). Without them we have no way to defend against side-channel attacks, which means we can't safely use the pure Rust crypto implementations.

I wonder if anyone from the Rust compiler team is involved in this group, so we have a chance of meeting these requirements.

tarcieri commented 3 years ago

I've talked at length with various people who work on both rustc and LLVM about this. In fact there was a (now postponed) RFC for it:

https://github.com/rust-lang/rfcs/pull/2859

See also this RustFest Global talk about it.

My understanding is there has been some work on the LLVM side by Google to add support for this to at least a RISC-V backend, although AFAIK that work is still not public.

It seems like there isn't interest in adding support for this to rustc until the corresponding changes have landed on the LLVM side.

richsalz commented 3 years ago

Do we want to use this issue for all compiler issues? I'd like auto-zeroization when dropping tagged variables.

nikomatsakis commented 3 years ago

I'm wondering how much this is a compiler vs. language issue. It seems like at least as much the latter. I'd be interested in hashing out a bit more concretely what's desired; if there's low-hanging fruit it may be worth acting on.

tarcieri commented 3 years ago

Zeroization is another issue that has come up before.

While there are library-level solutions to zeroizing data on drop (zeroize, clear_on_drop), there are much trickier problems which actually need the compiler's help, like ensuring values that are copied on move are also zeroed. Moves that wind up copying data are not immediately visible at the library level, and thus can't easily be solved there. One alternative is using something like Pin to prevent data from being moved, but there are cases where moves are required.
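To make the library-level approach concrete, here is a minimal sketch of zeroize-on-drop, with a hypothetical `SecretKey` type invented for illustration; the real `zeroize` crate is more sophisticated, but the core idea is volatile writes plus a compiler fence so the "dead" stores are not elided:

```rust
use core::ptr;
use core::sync::atomic::{compiler_fence, Ordering};

// Volatile writes so the stores are not optimized away as dead code.
fn wipe(buf: &mut [u8]) {
    for byte in buf.iter_mut() {
        unsafe { ptr::write_volatile(byte, 0) };
    }
    // Prevent the compiler from reordering later code before the wipe.
    compiler_fence(Ordering::SeqCst);
}

// Hypothetical secret-holding type for illustration only.
struct SecretKey([u8; 32]);

impl Drop for SecretKey {
    fn drop(&mut self) {
        wipe(&mut self.0);
    }
}

fn main() {
    let mut buf = [0xAAu8; 4];
    wipe(&mut buf);
    assert_eq!(buf, [0, 0, 0, 0]);
    let _key = SecretKey([0x55; 32]); // wiped when dropped at end of scope
    println!("ok");
}
```

Note that none of this helps with the move-copy problem described above: the drop glue only ever sees the final location of the value, not any stale copies left behind by moves.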

Additionally zeroize-on-drop is suboptimal for things like transient secrets. During the execution of a cryptographic algorithm there can be many transient secrets, and zeroizing them on drop may be a completely pointless performance hit if these secrets live on the stack and are immediately overwritten by new stack frames.

An alternative in this case is waiting until a particular section of code has completely finished executing, and then zeroizing memory which was used for the stack up to some high watermark, an approach I've sometimes heard referred to as "stack bleaching". I've opened a rust-internals issue about this before:

https://internals.rust-lang.org/t/annotations-for-zeroing-the-stack-of-sensitive-functions-which-deal-in-transient-secrets/11588

tarcieri commented 3 years ago

I'm wondering how much this is a compiler- vs- language issue. It seems like as much the latter.

Yeah, particularly in regard to something like a "secret type", it's one of those things that touches every level, from the language itself (adding something like Secret<T>) to rustc to codegen backends like LLVM (and in that regard, LLVM's IR and all of the many passes which operate on it).
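For discussion's sake, here is a hypothetical sketch of what a purely library-level `Secret<T>` newtype might look like (the names are invented for illustration). It shows why the idea "touches every level": a wrapper like this can make access explicit and auditable, but only a language-level secret type could propagate the secrecy annotation through rustc and the codegen backend:

```rust
// Hypothetical library-level secret wrapper, for illustration only.
pub struct Secret<T>(T);

impl<T> Secret<T> {
    pub fn new(value: T) -> Self {
        Secret(value)
    }

    // Access is an explicit, auditable operation rather than a Deref.
    pub fn expose(&self) -> &T {
        &self.0
    }
}

fn main() {
    let key = Secret::new([1u8, 2, 3]);
    // The wrapper documents intent, but LLVM never sees the annotation.
    assert_eq!(key.expose(), &[1, 2, 3]);
    println!("ok");
}
```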

romen commented 3 years ago

I've talked at length with various people who work on both rustc and LLVM about this. In fact there was a (now postponed) RFC for it:

rust-lang/rfcs#2859

See also this RustFest Global talk about it.

My understanding is there has been some work on the LLVM side by Google to add support for this to at least a RISC-V backend, although AFAIK that work is still not public.

It seems like there isn't interest in adding support for this to rustc until the corresponding changes have landed on the LLVM side.

During the last call, on top of zeroization and language support for secret types (to convey intent and let the compiler front-end help developers), we also discussed related items which do not seem to be covered in this issue yet.

One important part of the conversation that is not captured above, and which @beldmit referred to, was disabling compiler optimizations for a function or a code block, in order to prevent the compiler from transforming code the developer has already written in a constant-time fashion into non-constant-time machine code.

This is orthogonal to having language features such as Secret<T> that let the compiler front-end assist the developer in writing constant-time code, or that zeroize such variables automatically when they go out of scope: even if we already had these nice features, and the language ensured a function in its human-readable form is constant time, the corresponding IR produced by the front-end still goes through the rest of the compiler.

I'll take hand-crafted, specialized C code as an example. To achieve constant-timeness in C, we often use bitwise logic to transform comparisons into bit masks, which are then applied to the results of operations that run even when their results will be discarded, so that the instruction flow does not depend on secret inputs. Unfortunately, compiler optimization passes have been catching up, and in some cases bitwise operations are turned back into comparisons and jumps by the compiler. The next step in this arms race between developers and compilers is to sprinkle volatile and __asm__ all around, but this (on top of making things harder to maintain, easier to get wrong, and ugly w.r.t. portability) no longer seems reliable as a workaround, as compilers are once again catching up and might optimize away even these tricks. Concretely, here are a few relevant examples in C of the things that developers who care about constant-time implementations need to do:
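The same bitmask style translates directly to Rust. As a sketch (an illustration of the pattern, not a vetted primitive), here is a branch-free equality check that produces an all-ones/all-zeros mask instead of a boolean, so control flow never depends on the possibly-secret inputs. As discussed above, without compiler support nothing guarantees the optimizer preserves this shape:

```rust
// Branch-free equality: returns u64::MAX if a == b, else 0.
fn ct_eq_mask(a: u64, b: u64) -> u64 {
    let diff = a ^ b;
    // (diff | -diff) has its top bit set iff diff != 0.
    let nonzero = (diff | diff.wrapping_neg()) >> 63;
    // 0 -> all ones (equal), 1 -> all zeros (not equal).
    nonzero.wrapping_sub(1)
}

fn main() {
    assert_eq!(ct_eq_mask(42, 42), u64::MAX);
    assert_eq!(ct_eq_mask(42, 43), 0);
    println!("ok");
}
```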

It would be nice if, instead of racing against each other, the communities of cryptographic developers, compiler developers and language designers could reach an understanding of each other's needs and establish a foundation of basic tools that let the different pieces cooperate instead of fighting. During the call, one compromise that seemed reasonable for all three communities was some way to annotate sections in which all compiler optimizations are off, so that what the developer writes in terms of bitwise operations on primitive types (and possibly __asm__ blocks) is translated 1:1 into the corresponding machine code.

tarcieri commented 3 years ago

I'm not aware of any work in that particular area however...

Another alternative that I think was mentioned in the call, and I am sure was mentioned in previous discussions around this topic, was to have (and expose through language-specific mechanisms) compiler intrinsics for the basic building blocks (e.g., from the previous examples list, having cmovznz and value_barrier—for the various primitive register types—exposed directly to the programmer without having to build them fighting against the compiler optimizer). Are you aware of developments in this direction?

Intrinsics for cmov-like instructions sound really interesting. I know this is a particularly tricky area with LLVM due to things like the x86-cmov-conversion pass. I'm not sure how difficult it would be to introduce a sort of cmov-with-optimization-barrier intrinsic, or how effectively an intrinsic like that could be implemented on non-x86 architectures, but it seems worth exploring.

romen commented 3 years ago

I managed to find some of my previous notes on this topic, from when I started looking into this, before being distracted by other issues.

I find the last quite disheartening, as it means yet another instance of "insecure by default", and also a case-by-case fight over constant-time vs. performance arguments, in compiler code that cannot distinguish between IR blocks requiring constant time and IR blocks that do not and should get the most efficient transformation possible.

@tarcieri

Intrinsics for cmov-like instructions sound really interesting. I know this is a particularly tricky area with LLVM due to things like the x86-cmov-conversion pass. I'm not sure how difficult it would be to introduce a sort of cmov-with-optimization-barrier intrinsic, or how effectively an intrinsic like that could be implemented on non-x86 architectures, but it seems worth exploring.

Mentioning the x86 cmov in particular helped me find once more this blog post by D. Sprenkels, which was the one that started me down the Rust side of this matter!

tarcieri commented 3 years ago

this blog by D. Sprenkels

Yeah, that's a great blog post, and where I learned about that x86-cmov-conversion pass.

Unfortunately it seems to come to the conclusion that the best way around x86-cmov-conversion is to make the LLVM optimizer secret-type aware.

If there are alternatives, they might be worth exploring. But this is very much deep in the realm of the internals of the LLVM optimizer.

romen commented 3 years ago

I thought of updating this issue's thread after today's sync-up call. These resources could serve both as items to jumpstart the survey of patterns/desiderata/properties, and as material for anyone, from different backgrounds, to delve deeper into the problems, guidelines and recommendations around side-channel and constant-time issues.

I apologize that they are not really Rust-centric, but I still think they are a good starting point, rich with examples from different points of view, without forcing anyone to parse through many academic papers. They are not presented in any particular order.

tarcieri commented 3 years ago

During today's call @nikomatsakis suggested it might be possible to disable the LLVM optimizer at the level of individual functions.

If that were possible, it could potentially be used to implement a cmov-like function/intrinsic which bypasses the x86-cmov-conversion pass.

Separate but related: some past discussion on llvm-dev about avoiding optimizations for the purpose of side-channel resistance:

https://groups.google.com/g/llvm-dev/c/qwogIiwHCAc/m/HGsKD8QFBAAJ

tarcieri commented 3 years ago

@chandlerc pointed me at this C++ example of a dataflow optimization barrier using an LLVM instruction which to my knowledge does not currently have a Rust equivalent.

It seems like the sort of (comparatively) simple solution being suggested earlier. Perhaps it wouldn't be too difficult to add as something like a nightly-only intrinsic.

It seems like an improvement over the current approach used by the subtle crate.
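As a sketch of what such an optimization barrier might look like with today's stable `asm!` (an illustration under the assumption that an empty assembly template is an acceptable barrier, not a vetted implementation): the operand is forced through a register and its dataflow is hidden from LLVM, because LLVM must assume the asm block may have changed it arbitrarily.

```rust
use core::arch::asm;

// Dataflow optimization barrier: no real instructions are emitted
// (the operand appears only in an assembly comment), but LLVM cannot
// see through the asm! block, so optimizations based on the value's
// provenance are blocked.
#[inline]
fn value_barrier(mut value: u64) -> u64 {
    unsafe {
        asm!("/* {0} */", inout(reg) value, options(nostack, preserves_flags));
    }
    value
}

fn main() {
    assert_eq!(value_barrier(7), 7);
    println!("ok");
}
```

As noted above, this only works for values that fit in a register; larger values would need a pointer operand with a memory clobber.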

spitters commented 2 years ago

@tarcieri since you mentioned RISC-V. There's some discussion on adding constant time to the Spec. https://github.com/riscv/riscv-v-spec/issues/417

tarcieri commented 2 years ago

I believe this is a working translation of the C++ code (please do not just copy and paste it into anything willy nilly, this is just for discussion): https://godbolt.org/z/ndqxe6qv4

However it's using the now deprecated nightly-only llvm_asm! macro.

chandlerc commented 2 years ago

Also worth noting what aspect of this is potentially unsafe -- it relies on only being used with types that fit into a physical register. For a value that doesn't fit into a register, a different sequence would be needed (you'd need to pass the address of that value into the inline asm, and use a different constraint). I'm not aware of a truly generic way to express this directly in IR, but it should be possible to lower generic MIR-style stuff into some equivalent LLVM-IR that has the desired effect of blocking dataflow optimizations.

tarcieri commented 2 years ago

@chandlerc yeah, I think the final deliverable is probably an unsafe function which could be used to reliably implement something like subtle::Choice::new (which takes a u8 argument)

Although perhaps it could have a safe API that looks like pub fn value_fence<T: Copy>(value: &T) -> T?

tarcieri commented 2 years ago

Here's an attempt at a reference-based value_fence function:

https://godbolt.org/z/371Y7Gx5x

The generated assembly isn't the nice mov/ret anymore, but hopefully it's actually safe?

Edit: here's a version without the volatile flag, per @chandlerc's feedback below:

https://godbolt.org/z/nvMPYa7q7

chandlerc commented 2 years ago

Not sure you want the volatile flag, you can just clobber memory while passing the pointer through. For example in C++: https://compiler-explorer.com/z/bh9WzvTPq

tarcieri commented 2 years ago

Now that inline assembly is in beta, due for stabilization in Rust 1.59, I started playing around with a cmovnz wrapper and ran into something a little strange:

https://godbolt.org/z/vs83n31sa

The following function:

use core::arch::asm;

#[inline]
pub fn cmovnz(condition: u64, src: u64, dst: &mut u64) {
    let mut tmp = *dst;

    unsafe {
        asm!(
            "test {0}, {0}",
            "cmovnz {1}, {2}",
            in(reg) condition,
            inout(reg) tmp,
            in(reg) src
        );
    }

    *dst = tmp;
}

...was lowered to cmovne instead:

test    rdx, rdx
cmovne  rax, rcx

I was really hoping that inline assembly would be treated as a black box free of LLVM optimizations, but it seems that was not the case here.

@Amanieu @chandlerc any idea where this could be happening within LLVM, and should I still be worried about CMOV instructions emitted from inline assembly being rewritten as branches?

Amanieu commented 2 years ago

cmovnz and cmovne are just different names for the same instruction. They assemble to the same bytes. LLVM is just arbitrarily picking one valid disassembly.

tarcieri commented 2 years ago

As an experiment, I made a crypto-intrinsics crate which uses inline assembly to emit CMOV, ADX, and MULX instructions:

https://github.com/RustCrypto/utils/pull/730

Still a bit worried about things like flags LLVM isn't aware of being clobbered, but it seems like it should be possible to start using these instructions from stable Rust programs soon.

Amanieu commented 2 years ago

You should not rely on flags being preserved outside of an asm! block. If you need to use flags then either consume the flags within the same asm! block or pass the flag as an integer using the setc instruction.
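To illustrate the setc suggestion, here is a sketch (x86_64 only, with a portable fallback so it compiles elsewhere; the function name is invented for this example) that materializes the carry flag as an integer inside the same asm! block, so nothing relies on flags surviving across block boundaries:

```rust
#[cfg(target_arch = "x86_64")]
fn add_with_carry(a: u64, b: u64) -> (u64, u8) {
    let mut sum = a;
    let carry: u8;
    unsafe {
        core::arch::asm!(
            "add {sum}, {b}",
            // Capture CF as an integer before leaving the asm! block.
            "setc {carry}",
            sum = inout(reg) sum,
            b = in(reg) b,
            carry = out(reg_byte) carry,
        );
    }
    (sum, carry)
}

// Portable fallback for non-x86_64 targets.
#[cfg(not(target_arch = "x86_64"))]
fn add_with_carry(a: u64, b: u64) -> (u64, u8) {
    let (sum, overflowed) = a.overflowing_add(b);
    (sum, overflowed as u8)
}

fn main() {
    assert_eq!(add_with_carry(u64::MAX, 1), (0, 1));
    assert_eq!(add_with_carry(2, 3), (5, 0));
    println!("ok");
}
```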

tarcieri commented 2 years ago

Seems we'll need a different strategy for constructing assembly for ADX/MULX chains than mere function calls, then, in order to ensure the flags aren't clobbered.

Perhaps macro_rules! that construct the asm! could work /cc @dignifiedquire

dignifiedquire commented 2 years ago

yeah, I was afraid the flags might not work out. I suspect using some simple macros to assemble a single inline assembly call is the way to go here

tarcieri commented 2 years ago

I've released an initial v0.1 version of a cmov crate:

On x86/x86_64 targets, it uses inline assembly to emit cmovz/cmovnz instructions which are guaranteed to execute in constant time.

On other targets, it provides a "best effort" fallback to a bitwise implementation which is not guaranteed to execute in constant time.

Hopefully the latter can be addressed by adding more architecture-specific inline assembly, particularly for popular architectures like AArch64.
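For reference, a "best effort" bitwise fallback of the kind described above might look like the following sketch (names are illustrative, not the crate's actual API): both values are computed and the result is selected with a mask rather than a branch. The `condition != 0` comparison is exactly the kind of construct an optimizer might turn back into a branch, which is why this path is not guaranteed constant time.

```rust
// Best-effort conditional move: if condition != 0, write src into dst.
fn cmovnz_fallback(condition: u8, src: u64, dst: &mut u64) {
    // All ones if condition != 0, else all zeros.
    let mask = ((condition != 0) as u64).wrapping_neg();
    *dst = (src & mask) | (*dst & !mask);
}

fn main() {
    let mut value = 1u64;
    cmovnz_fallback(1, 42, &mut value);
    assert_eq!(value, 42);
    cmovnz_fallback(0, 7, &mut value);
    assert_eq!(value, 42); // condition was zero, so dst is unchanged
    println!("ok");
}
```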

tarcieri commented 2 years ago

Thanks to @codahale, cmov now has aarch64 support.

I've also posted a release announcement here: https://users.rust-lang.org/t/ann-cmov-v0-1-constant-time-conditional-move-intrinsics-for-x86-x86-64-aarch64/72473