OP-TEE / optee_os

Trusted side of the TEE

Warm Restart Crash w/ Dynamic Shared Memory Enabled #7091

Open nathan-menhorn opened 3 weeks ago

nathan-menhorn commented 3 weeks ago

Summary

OP-TEE OS is crashing when performing a warm reset of the processor.

Details

Build

The build is branched off of https://github.com/OP-TEE/optee_os/commit/31bb491f8e7c78794b6f9615cc4f2deec58b4ed9 and is currently under review in https://github.com/OP-TEE/optee_os/pull/6738. However, none of the OP-TEE OS core components have been updated or customized for this port; only hardware drivers to support the Versal Net device have been added. The customer using this port is disabling static shared memory and using dynamic shared memory via:

CFG_CORE_DYN_SHM=y CFG_CORE_RESERVED_SHM=n

Issue Specifics

Issue Log

E/TC:00 assertion '!core_mmu_user_va_range_is_defined()' failed at /usr/src/debug/optee-os-versal/3.21.0+gitAUTOINC+10f1029e8c-r0/core/mm/core_mmu.c:1124
E/TC:00 Panic at /usr/src/debug/optee-os-versal/3.21.0+gitAUTOINC+10f1029e8c-r0/core/kernel/assert.c:28 <_assert_break>
E/TC:00 TEE load address @ 0x22200000
E/TC:00 Call stack:
aarch64-linux-gnu-addr2line: DWARF error: can't find .debug_ranges section.
E/TC:00 0x2220b3b8 print_kernel_stack at ??:?
E/TC:00 0x22235408 __do_panic at ??:?
E/TC:00 0x22231874 _assert_break at ??:?
E/TC:00 0x22247094 _assert_trap at core_mmu.c:?
E/TC:00 0x22249cdc assign_mem_va at core_mmu.c:?
E/TC:00 0x22249e9c init_mem_map at core_mmu.c:?
E/TC:00 0x2224a28c core_init_mmu_map at ??:?

Notes/Comments

It appears that OP-TEE OS is trying to reassign memory to an already allocated translation table here: https://github.com/OP-TEE/optee_os/blob/8f645256efc0dc66bd5c118778b0b50c44469ae1/core/mm/core_mmu.c#L1331-L1339

After debugging this issue for quite some time, I cannot find anything specific in the Versal Net port that would cause it. It might be a bug triggered by using dynamic shared memory in combination with a warm restart, where OP-TEE tries to reassign memory that has already been assigned. Any help resolving this issue would be greatly appreciated.

jenswi-linaro commented 3 weeks ago

To avoid misunderstandings, what do you expect from a warm boot in OP-TEE: initialize or resume? I can't get the line numbers in the report to add up, so I'm missing some changes. Anyway, the assert is in assign_mem_va(). You're using CFG_WITH_LPAE=y, and OP-TEE is initializing translation tables. user_va_idx should be -1 at this point; the assert triggers because it isn't.
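
(Illustration, not OP-TEE source.) A minimal C sketch of why a stale user_va_idx would trip the check, assuming the index lives in writable data that is only reset when the image is loaded. The function names and the slot value 1 are hypothetical, and the model returns false where the real code would assert:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for state kept in writable .data: -1 means unset. */
static int model_user_va_idx = -1;

static bool model_user_va_range_is_defined(void)
{
	return model_user_va_idx != -1;
}

/*
 * Simplified model of the cold boot mapping step. It expects a pristine
 * image and reports failure at the point where the real code asserts.
 */
static bool model_cold_boot_init_mem_map(void)
{
	if (model_user_va_range_is_defined())
		return false;	/* stale state left over from an earlier boot */

	model_user_va_idx = 1;	/* arbitrary slot, chosen for illustration */
	return true;
}
```

Calling model_cold_boot_init_mem_map() once succeeds; calling it a second time without resetting model_user_va_idx fails, which is the situation a warm reset creates if the image is not reloaded.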

nathan-menhorn commented 2 weeks ago

Hi @jenswi-linaro, for a warm reboot either is acceptable. Is there a configuration for this? What is the default for OP-TEE? Thanks. Let me get my line numbers aligned, as I used master instead of the branched version.

nathan-menhorn commented 2 weeks ago

https://github.com/ProvenRun/optee_os/blob/versal_net_port/core/mm/core_mmu.c#L1120-L1152

jenswi-linaro commented 2 weeks ago

If it's resume, I expect that BL31 will restore everything before calling OP-TEE via vector_cpu_resume_entry in the previously returned vector. Entering OP-TEE via the cold boot entry point requires OP-TEE to be loaded again so all the writeable data sections are reinitialized.
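
(Illustration, not OP-TEE or BL31 source.) A rough C model of the reload requirement described above, assuming the loader keeps a pristine copy of the writable data and copies it back before jumping to the cold boot entry. The struct, the function names and the slot value are all hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical slice of the writable data section. */
struct model_tee_data {
	int user_va_idx;	/* -1 means "not yet assigned" */
};

/* Pristine contents as shipped in the image, and the live copy in RAM. */
static const struct model_tee_data pristine_data = { .user_va_idx = -1 };
static struct model_tee_data live_data = { .user_va_idx = -1 };

/* Models the loader copying the image into place again. */
static void model_reload_image(void)
{
	memcpy(&live_data, &pristine_data, sizeof(live_data));
}

/* Models the cold boot entry: it only succeeds on pristine data. */
static bool model_cold_boot_entry(void)
{
	if (live_data.user_va_idx != -1)
		return false;	/* the real code would assert here */

	live_data.user_va_idx = 1;	/* arbitrary slot for illustration */
	return true;
}
```

In this model, a second model_cold_boot_entry() only succeeds if model_reload_image() runs first. The resume path is different: it would skip initialization entirely and rely on the restored state.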