fortanix / rust-sgx

The Fortanix Rust Enclave Development Platform
https://edp.fortanix.com
Mozilla Public License 2.0

Improve std::is_x86_feature_detected in SGX #26

Open jethrogb opened 5 years ago

jethrogb commented 5 years ago

Currently, features not enabled at compile-time are considered disabled because CPUID doesn't work. See https://github.com/rust-lang-nursery/stdsimd/pull/598.

gnzlbg commented 5 years ago

If you compile the SGX target with -C target-feature=+avx2 and do is_x86_feature_detected!("avx2") that should return true today even though CPUID is not available.
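For illustration, a build along the lines of RUSTFLAGS="-C target-feature=+avx2" cargo build --target x86_64-fortanix-unknown-sgx (the exact invocation is an assumption based on the flag above) makes both lines of this sketch print true, with no CPUID involved:

fn main() {
    // Compile-time-enabled features are reported as detected on SGX,
    // since the runtime CPUID path is unavailable inside an enclave.
    println!("avx2 (runtime macro): {}", is_x86_feature_detected!("avx2"));
    println!("avx2 (compile time):  {}", cfg!(target_feature = "avx2"));
}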

jethrogb commented 5 years ago

And it does exactly that today.

gnzlbg commented 5 years ago

So not all features are considered disabled ;)

I don't know whether SGX can do better at run-time, do you have links to relevant documentation that I could read? If that's the case, we can improve std_detect support for SGX, either by modifying x86, or by implementing something completely different (it should be easy to do either).

Either way, it would be good to have an SGX build job in stdsimd that can run tests, so that we can easily hack on this without breaking anything.

jethrogb commented 5 years ago

I don't know whether SGX can do better at run-time, do you have links to relevant documentation that I could read?

There's not too much out there. Here are some approaches I'm aware of:

  1. You can call XGETBV to read XCR0, so some of the features recorded there can be detected this way (see the sketch after this list).

  2. There's “ask userspace and trust the result”. If userspace lies about a feature being available, you'll get #UD when trying to execute the relevant instruction, but userspace can always do a DoS, so no big deal...? If userspace lies about a feature not being available, this will result in slower performance. If you need the feature for security (e.g. RDRAND, AES-NI), the enclave should abort.

  3. There's “try executing instructions and see if they fail”. I'm not sure if anyone is seriously considering this. Userspace can't lie about anything this way. Userspace may DoS the enclave when running the detection routine, but that seems better than DoSing later when the enclave already thinks it can do something.
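To make the first approach concrete, here is a minimal sketch (my own illustration, not code from any existing library) that reads XCR0 via XGETBV using inline assembly. It assumes an x86-64 enclave where the OS has set CR4.OSXSAVE (XGETBV itself raises #UD otherwise); the bit positions follow the XCR0 layout in the Intel SDM.

use std::arch::asm;

// Read XCR0 via XGETBV (legal inside an enclave, unlike CPUID).
fn read_xcr0() -> u64 {
    let eax: u32;
    let edx: u32;
    unsafe {
        asm!("xgetbv", in("ecx") 0u32, out("eax") eax, out("edx") edx);
    }
    ((edx as u64) << 32) | (eax as u64)
}

fn main() {
    let xcr0 = read_xcr0();
    // XCR0 layout (Intel SDM): bit 1 = SSE state, bit 2 = AVX state,
    // bits 5..=7 = AVX-512 opmask/ZMM state. An enabled state component
    // shows OS support, which is necessary but not sufficient for the
    // corresponding instructions to exist on this CPU.
    let avx_state = xcr0 & 0x06 == 0x06;
    let avx512_state = xcr0 & 0xe6 == 0xe6;
    println!("AVX state enabled: {avx_state}, AVX-512 state enabled: {avx512_state}");
}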

gnzlbg commented 5 years ago

I think that for std applications we can link against that Intel library (or re-implement it in Rust) for the SGX target. That sounds good enough to me (if userspace lies, then that's the way it is).

The std_detect crate can offer cargo features to select the other types of behavior, such that recompiling libstd with those enabled would change the run-time feature detection method.
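As a rough sketch of what that could look like (the feature name and both backends are invented for illustration; this is not actual std_detect API):

// Hypothetical: recompiling libstd with a cargo feature swaps the SGX
// runtime-detection backend.
#[cfg(feature = "sgx_ask_userspace")]
fn detected_feature_mask() -> u64 {
    // Backend: ask the untrusted runner via a usercall (performance hint
    // only; a lying answer can at worst cause #UD or a slow path).
    0 // placeholder for the usercall
}

#[cfg(not(feature = "sgx_ask_userspace"))]
fn detected_feature_mask() -> u64 {
    // Conservative default: report only compile-time-enabled features.
    let mut mask = 0u64;
    if cfg!(target_feature = "avx2") { mask |= 1 << 0; }
    if cfg!(target_feature = "aes") { mask |= 1 << 1; }
    mask
}

fn main() {
    println!("feature mask: {:#b}", detected_feature_mask());
}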

briansmith commented 5 years ago

3. There's “try executing instructions and see if they fail”. I'm not sure if anyone is seriously considering this. Userspace can't lie about anything this way. Userspace may DoS the enclave when running the detection routine, but that seems better than DoSing later when the enclave already thinks it can do something.

Even if we accept that DoS is acceptable, userspace may plausibly lie about any bits that would affect the enclave's decisions regarding side channels. For example, userspace might lie about which microarchitecture is in use, or about any feature bits that indicate that a previously-safe feature isn't safe (e.g. against side channels like Spectre).

gnzlbg commented 5 years ago

FWIW, one does not need SGX to run into these issues. In std_detect we already deal with CPUs that advertise that they support some feature, but then provide a completely buggy and broken implementation of it.

That’s arguably even worse, because trying to execute an instruction won’t fail, but will instead do something other than what it should (e.g. making all optimizations unsound).

So at the end of the day you have to trust the hardware somewhat, user space somewhat, etc.

jethrogb commented 5 years ago

userspace may plausibly lie about any bits that would affect the enclave's decisions regarding side channels.

Userspace feature detection should only be used for performance improvements. For security, you must only rely on what you know through SGX instructions/remote attestation. If the only way you can learn about side-channel protections is from userspace then you must assume the worst.
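A small sketch of that policy (all names illustrative): userspace's answer only ever picks between code paths that are both correct, while security requirements come from compile-time knowledge.

fn run(userspace_says_avx2: bool) {
    // Security-relevant features must not depend on untrusted input:
    // require them at compile time or abort the enclave.
    if !cfg!(target_feature = "rdrand") {
        panic!("RDRAND required for secure randomness; aborting enclave");
    }

    // Performance only: a lying userspace can at worst cause a #UD DoS
    // or send us down the slower (but still correct) path.
    if userspace_says_avx2 {
        // fast AVX2 path
    } else {
        // portable fallback path
    }
}

fn main() { run(false); }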

So at the end of the day you have to trust the hardware somewhat, user space somewhat, etc.

No. With SGX, you don't have to trust userspace at all for confidentiality and integrity concerns.

gnzlbg commented 5 years ago

No. With SGX, you don't have to trust userspace at all for confidentiality and integrity concerns.

@jethrogb: the behavior of executing an instruction that's not supported by the CPU is undefined (and we do exploit this for optimizations); this is why most hardware intrinsics are unsafe fns. If user-space (or the hardware) lies about which features are supported, then code that would have been safe to execute would have undefined behavior. I don't understand how SGX could protect you from this. Best case the code would be mis-optimized, worst case an attacker could figure out how to exploit the UB to leak data.

jethrogb commented 5 years ago

the behavior of executing an instruction that's not supported by the CPU is undefined

Uh, no. It's extremely clearly defined. Go read the Intel SDM. Most unsupported instructions will generate a #UD exception, except for some that are encoded in the NOP space. (Edit: I suppose for the latter category you will always want to check whether those actually work before you use them, even if userspace tells you they do.)

gnzlbg commented 5 years ago

First, you are assuming that the instruction decoders (one of the buggiest parts of Intel CPUs) of all CPUs that were designed and implemented before some new instruction was even imagined will correctly recognize that new instruction, which did not exist back then and which the hardware was never tested against, as an illegal instruction and properly raise a #UD exception. Due to hardware bugs, this is often not the case.

Second, as you point out, some new instructions are NOPs on older CPUs, which are often used for aligning code. This means that instead of doing what the instruction was supposed to do, program execution will just continue right past it, without raising anything. That's one of the possible outcomes of undefined behavior.

Third, Rust defines reaching an instruction that's not supported by the CPU as undefined behavior, assumes that this does not happen, and performs optimizations like target-feature propagation under this assumption. This means that we will generate code on all paths that unconditionally reach that instruction under the assumption that the CPU supports it. This is worse than you think, because there is no guarantee that the actual intrinsic you called will even be reached at all. E.g. we might insert padding/NOPs to align code that do nothing on CPUs where that feature is available, but trigger some other exception or jump somewhere else on CPUs without it.

jethrogb commented 5 years ago

First, you are assuming that the instruction decoders (one of the buggiest parts of Intel CPUs) of all CPUs that were designed and implemented before some new instruction was even imagined will correctly recognize that new instruction, which did not exist back then and which the hardware was never tested against, as an illegal instruction and properly raise a #UD exception. Due to hardware bugs, this is often not the case.

Well yeah, you need to assume the hardware is functioning correctly to get anything out of SGX. I'll bring up the question of SGX's interaction with buggy decoders with Intel. But the fact that Intel keeps releasing new instruction set extensions that encode in the NOP space (MPX, CET) suggests to me they are pretty confident in their decoders on older processors.

Second, as you point out, some new instructions are NOPs on older CPUs, which are often used for aligning code.

I'm assuming the encodings chosen for the instructions in the NOP space were selected so that popular compilers don't emit these by accident, otherwise this would break binary compatibility.

Third, Rust defines reaching an instruction that's not supported by the CPU as undefined behavior, assumes that this does not happen, and performs optimizations like target-feature propagation under this assumption. This means that we will generate code on all paths that unconditionally reach that instruction under the assumption that the CPU supports it. This is worse than you think, because there is no guarantee that the actual intrinsic you called will even be reached at all.

This point seems moot to me since we're talking about the behavior of compiled programs, not of the compiler. Because of the feature detection mechanisms we know the compiler has generated code that will work on processors both with and without the feature. Because the code will work on processors that have the feature, we are guaranteed "no surprises" when we run it on a processor that doesn't.

N.B. I can't find any evidence in the implementation of is_x86_feature_detected! that we tell the compiler not to make the assumption you mention. So either that's broken, or the compiler doesn't actually work the way you say it does (or I missed some annotation in the code somewhere).

E.g. we might insert padding/NOPs to align code that do nothing on CPUs where that feature is available, but trigger some other exception or jump somewhere else on CPUs without it.

This specific example should never happen because of binary compatibility between different processor models, see also my response to your second point above.

gnzlbg commented 5 years ago

I'll bring up the question of SGX's interaction with buggy decoders with Intel.

Just keep in mind that Intel is not the only vendor of x86 hardware. But yeah, if you reach anybody at Intel, ask them how reliable the instruction decoders of the x86 family are when dealing with unsupported encodings.

I'm assuming the encodings chosen for the instructions in the NOP space were selected so that popular compilers don't emit these by accident, otherwise this would break binary compatibility.

Binaries compiled for older targets are compatible with newer targets, but binaries compiled for newer targets are not necessarily compatible with older targets (e.g. an AVX2 binary won't work on a host without AVX2).

For padding/alignment, the compiler can choose whatever NOP encoding it wants as long as it works for the target; compatibility guarantees that it will also work for newer targets, but LLVM at least does not guarantee that it will work for older targets. Code being reached on an older target is undefined behavior in LLVM, so LLVM can and will assume that this never happens. Also, for padding, compilers are not restricted to NOPs (e.g. using byte traps like int3 is often done).

Because of the feature detection mechanisms we know the compiler has generated code that will work on processors both with and without the feature.

fn foo() {
    if is_x86_feature_detected!("avx2") {
        // Hypothetical unsafe intrinsics gated on target features.
        unsafe { avx2_intrinsic() };
    } else {
        unsafe { sse42_intrinsic() };
    }
    // .. do stuff ..
}
fn bar() { /* portable stuff that does not diverge */ foo() }

If, when calling bar, either avx2_intrinsic or sse42_intrinsic is unconditionally reached, the compiler can assume that bar is only called on hardware that supports SSE4.2, and can use that information to generate code inside bar.

That is, on a CPU without SSE4.2, calling bar is UB, and the user cannot rely on sse42_intrinsic raising an exception, or even being reached at all, because things might fail well before that happens.

I can't find any evidence in the implementation of is_x86_feature_detected! that we tell the compiler not to make the assumption you mention.

This has nothing to do with is_x86_feature_detected!. The intrinsics have the #[target_feature] attribute, which generates LLVM IR telling LLVM to assume that a particular function will only be called on hardware supporting that feature. LLVM then uses that information to generate code, and is allowed to propagate these features as long as doing so is sound.
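For concreteness, the usual wrapper pattern looks like this (sum_avx2 is an illustrative stand-in for a real intrinsic-using function):

#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(xs: &[i32]) -> i32 {
    // Within this body LLVM may freely emit AVX2 instructions, and may
    // propagate that assumption into anything inlined here.
    xs.iter().sum()
}

fn sum(xs: &[i32]) -> i32 {
    if is_x86_feature_detected!("avx2") {
        // Sound only because the runtime check succeeded; calling
        // sum_avx2 on a CPU without AVX2 is undefined behavior.
        unsafe { sum_avx2(xs) }
    } else {
        xs.iter().sum()
    }
}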

jethrogb commented 5 years ago

There's “try executing instructions and see if they fail”. I'm not sure if anyone is seriously considering this. Userspace can't lie about anything this way. Userspace may DoS the enclave when running the detection routine, but that seems better than DoSing later when the enclave already thinks it can do something.

I think this is what https://github.com/intel/sgx-cpu-feature-detection does