apache / incubator-teaclave-trustzone-sdk

Teaclave TrustZone SDK enables safe, functional, and ergonomic development of trustlets.
https://teaclave.apache.org
Apache License 2.0
203 stars 58 forks source link

Fix double-free bug in optee-utee #127

Closed a21152 closed 1 month ago

a21152 commented 2 months ago

This PR fixes a double-free bug in optee-utee at https://github.com/apache/incubator-teaclave-trustzone-sdk/blob/master/optee-utee/macros/src/lib.rs#L393, where the forget call is forgotten for the error path.

Each time a TA returns an error from a command handler, the bug causes drop to be called on the shared session object. A double-free-induced panic happens when a second error invocation happens to the same session or when the session closes after a single invocation error.

This PR adds an example -- error_handling-rs that reproduces the error and also serve as a test in CI.

DemesneGH commented 2 months ago

When I run the example it fails:

# ./error_handling-rs
thread 'main' panicked at src/main.rs:32:46:        // panic at ctx.open_session()
called `Result::unwrap()` on an `Err` value: System ran out of resources. (error code 0xffff000c)

It's also the "panic" but seems not the double-free panic, and grep -q -v "panicked" screenlog.0 returns true so the test_error_handling.sh passed.

Could you check it out? thanks!

a21152 commented 1 month ago

hmm... that is odd.

0xffff000c indicates TEE_ERROR_OUT_OF_MEMORY upon open_session(), which I don't expect to fail in the test.

I rerun error-handling-rs manually in my local make run-only qemu and also tests/test_error_handling.sh, both passed with no panics.

Also grep -q -v "panicked" screenlog.0 should return non-zero for any panics, causing the test to fail (which is what I expect), because of the -v switch.

I suspect the vec! call in the TA when creating the session context was causing the ffff000c error for you but I can't figure out why.

Are you running against main branch by any chance (which may not work)?

If you can't figure out it either, I would need your exact test procedure so I can try to replicate your env. here to further triage.

DemesneGH commented 1 month ago

@a21152 I've reproduced the bug in newer OP-TEE image (4.2.0), seems the prebuilt OP-TEE 3.20.0 (the tests/ used) is not compatible with current no-std examples. I'll update the test scripts and pre-build images further.

The fix https://github.com/apache/incubator-teaclave-trustzone-sdk/pull/127/commits/a2fb3391d134aac8c0013fbb692523c8d8ca2dbc looks good to me.

After fixing the bug would you prefer to keep the error_handling-rs example in our examples?

a21152 commented 1 month ago

Sounds good. Although my run of tests/test_error_handling.sh (presumably with the same 3.20 prebuilts) also worked fine.

Yes -- I'd like to keep the error_handling-rs example so we make sure this issue does not regress. It also allows us to add more tests against similar error paths.

DemesneGH commented 1 month ago

Merged, thanks!

a21152 commented 1 month ago

Thank you for your quick turn around!