apache / incubator-teaclave-sgx-sdk

Apache Teaclave (incubating) SGX SDK helps developers write Intel SGX applications in the Rust programming language. It is also known as Rust SGX SDK.
https://teaclave.apache.org
Apache License 2.0

getting backtrace from panics in the enclave #105

Open elichai opened 5 years ago

elichai commented 5 years ago

Hi, Do you have any ideas on how I can get a backtrace when I have a panic in the enclave?

Right now all I get is something like this:

thread panicked at 'Box<Any>', /root/.cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.3.0/src/raw/mod.rs:1078:9

elichai commented 5 years ago

@dingelish This might be somehow related to you. https://github.com/rust-lang/hashbrown/issues/79

This stopped happening after I switched to the HashMap from sgx_tstd. Any chance hashbrown is linking to the wrong allocator?

(FYI, this can also be some sort of memory leak on my side)

dingelish commented 5 years ago

Hey @elichai, could you please provide a minimal enclave that reproduces this error? I think I can resolve it :-)

brenzi commented 5 years ago

@dingelish Even if you might find the cause: I see the same error. I have error handling in my enclave with .expect("Error Message") but I don't get the error message printed out. Instead, I see exactly the above.

Btw: In my case the problem was that I was accessing an uninitialized value.

dingelish commented 5 years ago

Oh I understand. I remember this may be a long-lasting problem.

The most convenient way is probably to build with Xargo and add the "backtrace" feature for std. Then import the backtrace EDL and set up the backtrace like this. You can try PrintFormat::Full for the most detailed backtrace.

I'm looking into the wrong message again :-)

elichai commented 5 years ago

@dingelish I'll try to recreate this in a smaller detached enclave

elichai commented 5 years ago

@dingelish Here: https://github.com/elichai/wasmi-enclave. Compile it with make DEBUG=1 (my makefile is a bit different, nothing too serious). If you compile in release mode you just get a core dump.

then run ./app/bin and you'll get the error.

dingelish commented 5 years ago

I think panic.rs and panicking.rs are not handling the panic info correctly, and the downcast_ref failed twice.

rust-stable branch works well.

Need more time on panicking.rs
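
For readers wondering where the 'Box<Any>' text comes from: a minimal sketch in plain Rust std (not sgx_tstd), assuming the SGX panic code follows the same pattern as the standard library. The panic payload is tried as a &str, then as a String, and 'Box<Any>' is printed only when both downcasts fail. The helper name payload_message is hypothetical.

```rust
use std::any::Any;
use std::panic;

/// Hypothetical helper: turn a panic payload into a printable message.
/// Tries `&'static str` first, then `String`, then falls back to "Box<Any>".
fn payload_message(payload: &(dyn Any + Send)) -> String {
    if let Some(s) = payload.downcast_ref::<&'static str>() {
        (*s).to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        // Both downcasts failed, so no message can be recovered.
        "Box<Any>".to_string()
    }
}

fn main() {
    // Payload created from a string literal.
    let literal = panic::catch_unwind(|| {
        panic!("boom");
    })
    .unwrap_err();
    println!("{}", payload_message(&*literal));

    // Payload created from a formatted message.
    let formatted = panic::catch_unwind(|| {
        let n = 7;
        panic!("value {}", n);
    })
    .unwrap_err();
    println!("{}", payload_message(&*formatted));

    // Payload of an arbitrary type: neither downcast succeeds.
    let other = panic::catch_unwind(|| {
        panic::panic_any(42_i32);
    })
    .unwrap_err();
    println!("{}", payload_message(&*other));
}
```

So an .expect("Error Message") that shows up as 'Box<Any>' suggests the payload reached the printing code with a type that neither downcast recognized.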

dingelish commented 5 years ago

And I'm able to reproduce the bug. Let me see.

dingelish commented 5 years ago

@elichai

Confused. I enabled the backtrace feature by editing enclave/Cargo.toml:

-sgx_tstd = { git = "https://github.com/baidu/rust-sgx-sdk.git", rev = "v1.0.7" }
+sgx_tstd = { git = "https://github.com/baidu/rust-sgx-sdk.git", rev = "v1.0.7", features = ["backtrace"] }

Then I added this to enclave/lib.rs:

 extern crate sgx_tstd as std;
 extern crate sgx_types;

+use std::backtrace::{self, PrintFormat};
+
 extern crate parity_wasm;
 extern crate wasmi;

@@ -18,6 +20,9 @@ use wasmi::{ImportsBuilder, ModuleInstance, NopExternals, RuntimeValue};

 #[no_mangle]
 pub extern "C" fn ecall_main() -> sgx_status_t {
+
+    let _ = backtrace::enable_backtrace("enclave.signed.so", PrintFormat::Full);
+
     let path = get_code_path();
     println!("{:?}", path);

Then it no longer panics! I don't even get a chance to inspect the stack unwinding info...

elichai commented 5 years ago

Honestly, this makes me worry even more. Something really weird is going on.

dingelish commented 5 years ago

https://github.com/intel/linux-sgx/blob/1248fc21c64467c2a8e8a126d5ea6c69f05d490b/sdk/tlibc/stdlib/malloc.c#L541

Intel's malloc uses an alignment of 8 instead of the traditional x86_64 default of 16, which breaks the logic of sgx_alloc.

Fixed in f99c9bc98c0fb700e14bed1b7fb771bff9a201c4. I can run your code now!
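
To illustrate why the assumed alignment matters: a rough sketch, assuming sgx_alloc uses the same fast-path check as Rust's std allocator on Unix (plain malloc when the requested alignment is at most MIN_ALIGN, an aligned allocation otherwise). The function name can_use_plain_malloc and the two constants are hypothetical and only model the decision. Nothing is actually allocated.

```rust
use std::alloc::Layout;

/// Alignment the allocator wrapper assumes malloc guarantees (hypothetical name).
const MIN_ALIGN_ASSUMED: usize = 16;
/// Alignment the SGX malloc is said to guarantee in this thread (hypothetical name).
const MIN_ALIGN_SGX: usize = 8;

/// Models the fast-path decision: true means "call plain malloc",
/// false means "use an aligned allocation instead".
fn can_use_plain_malloc(layout: Layout, min_align: usize) -> bool {
    layout.align() <= min_align && layout.align() <= layout.size()
}

fn main() {
    // A 16-byte-aligned request, such as one for SSE-friendly data.
    let layout = Layout::from_size_align(64, 16).unwrap();

    // With the wrong assumption the request takes the plain malloc path,
    // which may hand back memory that is only 8-byte aligned.
    println!(
        "MIN_ALIGN = {}: plain malloc? {}",
        MIN_ALIGN_ASSUMED,
        can_use_plain_malloc(layout, MIN_ALIGN_ASSUMED)
    );

    // With the corrected value the same request goes to the aligned path.
    println!(
        "MIN_ALIGN = {}: plain malloc? {}",
        MIN_ALIGN_SGX,
        can_use_plain_malloc(layout, MIN_ALIGN_SGX)
    );
}
```

Under that assumption, code that relies on 16-byte-aligned pointers could receive misaligned memory whenever MIN_ALIGN overstates what malloc really guarantees.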

dingelish commented 5 years ago

Looking into the wrong panic info (but correct line number) problem now.

elichai commented 5 years ago

So hashbrown crashes because it expects a 16 byte alignment and intel made it 8 in the sgx libs?

This sounds like a big thing, but looking at your fix I'm not sure 😅
(and I don't come from C, so I'm not 100% sure about pointer alignment)

dingelish commented 5 years ago

Yeah, I think this is the problem. And I can confirm that sgx_alloc should be the only global allocator in this environment as long as it compiles. So I think everything should be fine!

dingelish commented 5 years ago

And I have to add more tests for alignment-critical crates such as hashbrown, ring, and anything else that relies on SSE or AVX instructions, plus more tests against low-level APIs.

dingelish commented 5 years ago

Fixed in 2940370a34db11e2c9b7956ffd0e3d14ee9f1cf2

elichai commented 5 years ago

@dingelish If you could elaborate on what was changed in the backtrace (https://github.com/baidu/rust-sgx-sdk/commit/2940370a34db11e2c9b7956ffd0e3d14ee9f1cf2) and on how removing pub from MIN_ALIGN affects this, I would be more than grateful :)

elichai commented 5 years ago

And if this is critical, maybe it's better to update the 1.0.7 tag? Although I'm not sure about that.

dingelish commented 5 years ago

Let me create a new release today with tag = v1.0.8. :-) Sorry for the wait.

elichai commented 5 years ago

No need to apologize :) I'm just trying to understand the issue and the solution. I've learned a lot about Rust internals from looking at your code and talking with you :)

Thanks!

dingelish commented 5 years ago

@elichai

rls seems to have recovered from its failed state on 05-19, so it should be available in toolchain nightly-2019-05-20.

nightly-2019-05-20 is not available for download yet, but it may be ready within hours :-)

rls test pass test pass e7591c1ae on 2019-05-19 14:51:17