apache / incubator-teaclave-sgx-sdk

Apache Teaclave (incubating) SGX SDK helps developers write Intel SGX applications in the Rust programming language; it is also known as Rust SGX SDK.
https://teaclave.apache.org
Apache License 2.0

Clarification on how `ring-sgx` is handling feature detection #150

Closed · akhilles closed this issue 5 years ago

akhilles commented 5 years ago

Could you please provide some more information on how AES-NI detection works in the mesalock ring-sgx fork (https://github.com/mesalock-linux/ring-sgx)? Also, isn't the upstream ring crate compatible with SGX? It's no_std by default now.

dingelish commented 5 years ago

My observation is that CPUs which support SGX always support AES-NI.
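
Concretely, CPUID is an illegal instruction inside an SGX enclave, so runtime feature probing cannot happen there; a port has to bake the answer in. A minimal sketch of that idea (the `sgx` feature gate and function name are hypothetical, not ring-sgx's actual API):

```rust
// Minimal sketch, not ring-sgx's actual code: inside an enclave the
// CPUID instruction is illegal, so runtime probing is replaced by a
// compile-time assumption. The "sgx" feature name is hypothetical.

#[cfg(feature = "sgx")]
pub fn aes_ni_available() -> bool {
    // Assumption from this thread: every SGX-capable CPU ships AES-NI,
    // so the enclave build hard-codes the answer instead of probing.
    true
}

#[cfg(not(feature = "sgx"))]
pub fn aes_ni_available() -> bool {
    // Outside an enclave, ordinary runtime detection still works.
    is_x86_feature_detected!("aes")
}
```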

Ring's author does not guarantee compatibility between ring and SGX, because neither he nor Fortanix runs comprehensive compatibility tests.

However, our ported version does guarantee compatibility across the Ubuntu {18.04, 16.04} x {HW, SIM} x {cargo, xargo} combinations. We run all of those tests every day.

dingelish commented 5 years ago

You can see the discussion on SGX support in ring here.

dingelish commented 5 years ago

[screenshot]

dingelish commented 5 years ago

no_std seems pretty, but it suffers from dependency hell: if any crate in the dependency tree does not support no_std, or does not enable it by default, the entire tree ends up requiring std.
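
For reference, this is the gating pattern a genuinely no_std-capable crate needs (a generic sketch, not any particular crate's code); one crate anywhere in the tree that skips it pulls std back in for everyone:

```rust
// Generic no_std gating sketch: compile without std unless the
// consumer opts into the optional "std" feature.
#![cfg_attr(not(feature = "std"), no_std)]

// Heap-allocated types come from `alloc` rather than `std` here.
extern crate alloc;
use alloc::vec::Vec;

/// Compiles in both modes; a single std-only dependency anywhere in
/// the tree would force std back on for the whole build.
pub fn doubled(xs: &[i32]) -> Vec<i32> {
    xs.iter().map(|x| x * 2).collect()
}
```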

I have ported more than 100 crates from the community. I can tell you that no_std is widely misused, and you can hardly find truly no_std crates.

dingelish commented 5 years ago

Whether or not a crate is compatible with SGX cannot be answered without manually porting it. For example, heapsize supports no_std, but downstream crates depend on its std mode, which provides an API for calculating the size of a heap object from its allocator metadata. That requires certain functions the SGX memory allocator cannot provide. The ported version is at heapsize-sgx.
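
To make the heapsize case concrete, this is the kind of allocator query its std mode is built on (a simplified sketch, not the crate's exact source): the size is read back from allocator bookkeeping, and the SGX enclave allocator exposes no such call.

```rust
// Simplified sketch of the technique behind heapsize's std mode: ask
// the allocator for the usable size of a live allocation.
use std::os::raw::c_void;

extern "C" {
    // Provided by glibc/jemalloc on ordinary Linux; the SGX enclave
    // allocator has no equivalent, so this cannot be linked in-enclave.
    fn malloc_usable_size(ptr: *mut c_void) -> usize;
}

/// Heap size of the block behind `ptr`, per allocator metadata.
pub unsafe fn heap_size_of(ptr: *const c_void) -> usize {
    if ptr.is_null() {
        0
    } else {
        malloc_usable_size(ptr as *mut c_void)
    }
}
```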

Almost all of the high-performance algorithm crates depend on CPU feature detection and choose the "latest" instruction set available. This is really bad, because the "FMA" and "AVX" paths are significantly slower than traditional loops on unaligned data. For better code quality, the ported SGX versions have additional Cargo features for selecting the instruction set explicitly; matrixmultiply-sgx is an example.
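
The design looks roughly like this (an illustrative sketch; the feature and function names are hypothetical, not matrixmultiply-sgx's exact API): the instruction set is fixed at compile time by a Cargo feature instead of being auto-detected.

```rust
// Illustrative sketch: compile-time kernel selection via a Cargo
// feature instead of runtime CPUID dispatch. Names are hypothetical.

#[cfg(feature = "avx")]
fn dot(a: &[f64], b: &[f64]) -> f64 {
    // A real AVX path would use core::arch::x86_64 intrinsics behind
    // #[target_feature(enable = "avx")]; elided here for brevity.
    a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
}

#[cfg(not(feature = "avx"))]
fn dot(a: &[f64], b: &[f64]) -> f64 {
    // Traditional scalar loop; on unaligned data this can beat the
    // vectorized path, per the point above.
    let mut acc = 0.0;
    for (x, y) in a.iter().zip(b.iter()) {
        acc += x * y;
    }
    acc
}

fn main() {
    println!("{}", dot(&[1.0, 2.0, 3.0], &[4.0, 5.0, 6.0]));
}
```

The consumer then opts in explicitly, e.g. `cargo build --features avx`, instead of the crate silently picking the newest instruction set.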

You can try the machine learning code sample with the "avx" feature enabled. Then you can feel the 500% performance slowdown!

Optimization is not trivial work. Intel's 800-page optimization reference manual explains how to use AVX/FMA/SSE instructions correctly. However, we cannot depend on the community for this.