apache / incubator-teaclave-trustzone-sdk

Teaclave TrustZone SDK enables safe, functional, and ergonomic development of trustlets.
https://teaclave.apache.org
Apache License 2.0

Performance issues #89

Open syedelec opened 1 year ago

syedelec commented 1 year ago

Hello

I noticed quite a performance difference between a TA written in Rust using the SDK and a TA written in C.

This is easily reproducible with the simple random example, in which a 16-byte array is generated.

root@stm32mp1-board:~# time random-rs
Invoking TA to generate random UUID...
Invoking done!
Generate random UUID: 7db2031f-a7d1-6294-5ebb33c08f88101f
Success
real    0m 1.01s
user    0m 0.00s
sys     0m 0.99s
root@stm32mp1-board:~# 
root@stm32mp1-board:~# time optee_example_random 
Invoking TA to generate random UUID... 
TA generated UUID value = 0x76ed50d34af98d4b0b089e1921cad
real    0m 0.71s
user    0m 0.00s
sys     0m 0.70s

I tried with a normal world app that does the following:

The C TA performed the above in ~1.5 s and the Rust TA in ~6 s. I also tried writing the same normal world app in both Rust and C, but the results were the same.

This was tested on an stm32mp157c-dk2 board using OP-TEE OS 3.16.0.

Let me know if you have any idea what the root cause might be. Thanks

DemesneGH commented 1 year ago

Hi @syedelec, I ran a performance comparison between C TAs and Rust TAs on the random and aes examples:

# time ./random-rs
Invoking TA to generate random UUID...
Invoking done!
Generate random UUID: 918585d0-6be4-4d7-e09dcb5e4387c79b
Success
real    0m 0.46s
user    0m 0.07s
sys     0m 0.24s

# time optee_example_random
Invoking TA to generate random UUID...
TA generated UUID value = 0xad36b1d0f8134f8ae2cb4a14bf11813d
real    0m 0.34s
user    0m 0.06s
sys     0m 0.19s

# time ./aes-rs
Prepare encode operation
Load key in TA
Reset ciphering operation in TA (provides the initial vector)
Encode buffer from TA
Prepare decode operation
Load key in TA
Reset ciphering operation in TA (provides the initial vector)
Decode buffer from TA
Clear text and decoded text match
real    0m 0.56s
user    0m 0.11s
sys     0m 0.28s

# time optee_example_aes
Prepare session with the TA
Prepare encode operation
Load key in TA
Reset ciphering operation in TA (provides the initial vector)
Encode buffer from TA
Prepare decode operation
Load key in TA
Reset ciphering operation in TA (provides the initial vector)
Decode buffer from TA
Clear text and decoded text match
real    0m 0.39s
user    0m 0.07s
sys     0m 0.21s

Yes, Rust TAs are slower than C TAs in my environment (QEMUv8 & OP-TEE 3.17.0): about 35% longer wall-clock time for the random example (0.46 s vs 0.34 s).
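
The percentage can be checked against the `real` times posted in this comment. Below is a minimal sketch of that arithmetic; the function name `slowdown_percent` is hypothetical and only the timing values come from the runs above.

```rust
/// Relative slowdown of the Rust TA versus the C TA, in percent.
/// (Hypothetical helper, not part of the SDK.)
fn slowdown_percent(rust_secs: f64, c_secs: f64) -> f64 {
    (rust_secs - c_secs) / c_secs * 100.0
}

fn main() {
    // (example, Rust TA `real` seconds, C TA `real` seconds), copied from the logs above.
    let runs = [("random", 0.46, 0.34), ("aes", 0.56, 0.39)];
    for (name, rust_secs, c_secs) in runs.iter() {
        println!("{}: {:.1}% slower", name, slowdown_percent(*rust_secs, *c_secs));
    }
}
```

With these inputs the random pair comes out at roughly 35% and the aes pair at roughly 44%. Note that `time` measures the whole process (loading the TA, opening the session, the invocation itself), so these are end-to-end figures rather than per-invocation costs.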

For performance optimization you can find some guidance in the Cargo documentation. Possible workarounds are using a higher opt-level, disabling some runtime checks, etc.
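
As a rough sketch of what those workarounds could look like, the fragment below sets standard Cargo profile keys in the TA crate's Cargo.toml. It assumes the TA is built with the release profile; the values are starting points to experiment with, not measured recommendations, and whether LTO works with a given TA build setup is untested here.

```toml
# Assumption: this goes in the TA crate's Cargo.toml and the TA is built with --release.
[profile.release]
opt-level = 3             # highest speed-oriented optimization level
lto = true                # link-time optimization across crates (may lengthen builds)
codegen-units = 1         # fewer codegen units can allow better optimization
panic = "abort"           # no unwinding machinery
overflow-checks = false   # already the default for release; shown for clarity
debug-assertions = false  # already the default for release; shown for clarity
```

It is worth comparing these against whatever values the TA's Cargo.toml already sets, since an existing lower opt-level would override Cargo's release defaults.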