System stalls during boot if CFG_PAGED_USER_TA=y and CFG_TEE_CORE_LOG_LEVEL=0

j-schacht commented 3 weeks ago

Hello!

I am currently working on setting up OP-TEE on the Xilinx ZCU102 Evaluation Kit (Zynq MPSoC platform). I am using version 3.18.0 of OP-TEE and I know that OP-TEE provides (limited) support for this platform.

In my scenario, DDR memory is not considered secure because it is assumed that an attacker has physical access to the hardware and can manipulate the DDR memory. Therefore, I want to execute OP-TEE from on-chip memory. As the ZCU102 provides only 256 KiB of directly accessible on-chip SRAM, it is necessary to activate the pager. As stated here, it should be possible to run OP-TEE on systems with only 128 to 256 KiB of memory available.

The ATF takes up 88 KiB of the on-chip memory, which means that there are only 168 KiB left for OP-TEE. Luckily, I managed to squeeze OP-TEE into the remaining space. (I have seen #5915 and I am aware that this solution is very slow. But performance does not matter for my use case and my TAs are lightweight.)

This is my configuration, which results in a working solution:

CFG_TEE_CORE_DEBUG=n
CFG_DEBUG_INFO=n
CFG_TEE_CORE_LOG_LEVEL=0
CFG_UART_BASE=UART1_BASE
CFG_UART_CLK_HZ=UART1_CLK_IN_HZ
CFG_UART_IT=IT_UART1
CFG_CORE_LARGE_PHYS_ADDR=y
CFG_WITH_LPAE=y
CFG_LPAE_ADDR_SPACE_BITS=36
CFG_WITH_PAGER=y
CFG_PAGED_USER_TA=n
CFG_TEE_LOAD_ADDR=0xFFFC0000
CFG_TZSRAM_START=0xFFFC0000
CFG_TZSRAM_SIZE=0x0002A000
CFG_PAGEABLE_ADDR=0x60000000
CFG_TZDRAM_START=0x60000000
CFG_TZDRAM_SIZE=0x01000000
CFG_SHMEM_START=0x70000000
CFG_SHMEM_SIZE=0x01000000
CFG_CORE_RWDATA_NOEXEC=n
CFG_UNWIND=n
CFG_CORE_HEAP_SIZE=32768
CFG_CORE_BGET_BESTFIT=y
CFG_CC_OPT_LEVEL=s
CFG_TA_FLOAT_SUPPORT=n
CFG_EARLY_TA_COMPRESS=n
CFG_CORE_UNMAP_CORE_AT_EL0=n
CFG_NUM_THREADS=1

With this, I can successfully run the example TAs. Now, I want to set CFG_PAGED_USER_TA=y to enable paging of the TAs. However, with CFG_TEE_CORE_LOG_LEVEL=0, the Linux kernel stalls during the boot process:

...
[    1.524063] zynqmp_firmware_probe Platform Management API v1.1
[    1.529871] zynqmp_firmware_probe Trustzone version v1.0
[    1.565265] securefw securefw: securefw probed
[    1.565632] zynqmp-aes zynqmp-aes.0: will run requests pump with realtime priority
[    1.572240] usbcore: registered new interface driver usbhid
[    1.577310] usbhid: USB HID core driver
[    1.583975] ARM CCI_400_r1 PMU driver probed
[    1.584685] fpga_manager fpga0: Xilinx ZynqMP FPGA Manager registered
[    1.592172] optee: probing for conduit method.
<---- SYSTEM STALLS HERE ---->

With any higher log level, the system works just fine. (Because there would not be enough SRAM space for compiling with log level > 0, I moved the ATF to DDR RAM to be able to test with different log levels.)

When working correctly, the Linux kernel outputs the following:

 ...
[    1.577310] usbhid: USB HID core driver
[    1.583975] ARM CCI_400_r1 PMU driver probed
[    1.584685] fpga_manager fpga0: Xilinx ZynqMP FPGA Manager registered
[    1.592172] optee: probing for conduit method.
[    1.596424] optee: dynamic shared memory is enabled
[    1.605654] optee: initialized driver
[    1.609207] usbcore: registered new interface driver snd-usb-audio
[    1.616019] pktgen: Packet Generator for packet performance testing. Version: 2.75
[    1.623524] Initializing XFRM netlink socket
...

Unfortunately, I cannot provide a log from the OP-TEE side, because the error does not occur when logs are enabled.

Does anyone have an idea why CFG_PAGED_USER_TA=y only works with log levels higher than zero?

Any helpful ideas would be greatly appreciated. I have spent many hours on this problem and am currently stuck. Thanks in advance!

jenswi-linaro commented 3 weeks ago

The kernel buffers output so the may be a few more calls into OP-TEE than what's seen in the log before the stalling. Anyway, this is quite hard to debug without any secure world prints. Can you connect a JTAG debugger?

Another option is to try to reproduce the problem on QEMU (don't forget CFG_CORE_ASLR=n) and debug it with GDB.

j-schacht commented 3 weeks ago

Hi! Thanks for helping :)

I managed to find out where the system is getting stuck on the Linux kernel side. My kernel version is 6.1 and the source code of the corresponding OP-TEE driver can be found here: https://github.com/torvalds/linux/blob/v6.1/drivers/tee/optee/

In smc_abi.c, the following functions are called during boot:

optee_probe(...)

  --> optee_msg_api_uid_is_optee_api(...)

        --> optee_smccc_smc(OPTEE_SMC_CALLS_UID, 0, 0, 0, 0, 0, 0, 0, &res)

              --> arm_smccc_smc(OPTEE_SMC_CALLS_UID, 0, 0, 0, 0, 0, 0, 0, res)

However, arm_smccc_smc() never returns.

Can you tell me, where I can find the entry point for SMCs in OP-TEE? Maybe I can find out where it is getting stuck on the OP-TEE side.

I will try to find out more via the JTAG debugger. But I have to rebuild the boot image first to turn off secure boot and enable JTAG.

UPDATE: I also checked what happens in the ATF. It receives the SMC (with ID 0xbf00ff01), which is invoked as described above. Then, it dispatches it to OP-TEE, but OP-TEE never returns.

j-schacht commented 3 weeks ago

I finally found out what causes the problem:

The entry points for SMCs can be found in thread_optee_smc_a64.S (vector_fast_smc_entry in this case). From there, thread_handle_fast_smc() in thread_optee_smc.c is called. OP-TEE receives the SMC correctly.

However, the thread_check_canaries() function detects a dead canary and calls panic():

Dead canary at start of 'stack_abt[0]' (0xfffd5100)

But with log level set to zero, this message is not visible, of course.

How is this stack overflow related to the log level?

Stack sizes are defined in thread_private_arch.h. For ARM64, the size of stack_abt is defined depending on the log level:

#if TRACE_LEVEL > 0
#define STACK_ABT_SIZE      3072
#else
#define STACK_ABT_SIZE      1024
#endif

If the log level is set to zero, this stack is much smaller. I assume that with CFG_PAGED_USER_TA=y the stack usage is higher. This eventually leads to a stack overflow and a crash when combining these settings.

How can this be fixed?

I created a patch for _thread_privatearch.h, allowing me to define the size of stack_abt through a configuration parameter. With CFG_STACK_ABT_SIZE=3072 it works well.

--- a/core/arch/arm/include/kernel/thread_private_arch.h
+++ b/core/arch/arm/include/kernel/thread_private_arch.h
@@ -44,11 +44,15 @@
 #endif
 #define STACK_THREAD_SIZE  8192

+#ifdef CFG_STACK_ABT_SIZE
+#define STACK_ABT_SIZE     (CFG_STACK_ABT_SIZE)
+#else /* CFG_STACK_ABT_SIZE */
 #if TRACE_LEVEL > 0
 #define STACK_ABT_SIZE     3072
 #else
 #define STACK_ABT_SIZE     1024
 #endif
+#endif /* CFG_STACK_ABT_SIZE */
 #endif /*ARM64*/

 #ifdef CFG_CORE_DEBUG_CHECK_STACKS

Final questions:

Before I close this issue, I still have some questions:

What is the purpose of the ABT stack? This does not seem to be documented.
Why is the size of the ABT stack defined depending on the log level for ARM64? (This is not the case for ARM32).
Is it expected behaviour that TA paging requires a larger ABT stack?
Is it a good solution to simply increase stack size or could the actual problem lie elsewhere?
Is this problem specific to my system configuration or can it be reproduced on other platforms? In the latter case, a permanent fix would make sense.

jenswi-linaro commented 2 weeks ago

What is the purpose of the ABT stack? This does not seem to be documented.

The Abort stack is used to handle page faults when paging is enabled

Why is the size of the ABT stack defined depending on the log level for ARM64? (This is not the case for ARM32).

We're trying to keep this as small as possible, printing with ARM64 consumes a lot of stack.

Is it expected behaviour that TA paging requires a larger ABT stack?

No, it should be more or less the same as for Core paging.

Is it a good solution to simply increase stack size or could the actual problem lie elsewhere?

Yes. Would you mind creating a PR to update the needed size upstream?

Is this problem specific to my system configuration or can it be reproduced on other platforms? In the latter case, a permanent fix would make sense.

Yes, this is a generic problem.

j-schacht commented 2 weeks ago

Hi! Thanks for taking time to answer my questions.

Is it expected behaviour that TA paging requires a larger ABT stack?

No, it should be more or less the same as for Core paging.

Interesting... If that's the case I don't understand why the error occurs in the first place. Maybe it just needs slightly more stack but the stack size has been 'optimized' too much? Anyway, since this is a working solution, I will experiment with different stack sizes and then create a PR.

Best regards!

j-schacht commented 2 weeks ago

Indeed, this issue has already been fixed three months ago (see commit 23f867d, which also includes a good explanation). Because no issue has been created on this, I wasn't able to find this fix via GitHub or Google search. Since this bug is fixed I will not create a PR.

However, I found out that OP-TEE would work with an ABT stack smaller than 3072 bytes. In my case, 1536 bytes are sufficient. This could be relevant if one needs to reduce the memory footprint.

OP-TEE / optee_os

System stalls during boot if CFG_PAGED_USER_TA=y and CFG_TEE_CORE_LOG_LEVEL=0 #6878