OP-TEE / optee_os

Trusted side of the TEE
Other
1.57k stars 1.06k forks source link

Weird secure monitor issue with mixed fast smc and std smc calls #5254

Closed vesajaaskelainen closed 1 year ago

vesajaaskelainen commented 2 years ago

I have been working on refurbishing Xilinx Zynq-7000 SoC (Cortex-A9 x 2) support in OP-TEE to make it possible to boot u-boot and Linux kernel. OP-TEE OS is from tag 3.16.0 + changes.

With the work I have started using SCMI support that @etienne-lms has been introducing in different components.

I have implemented clock driver for zynq7k and several fast smc calls for quick path to query certain hardware details and reworked zynq7k support altogether.

System works so that U-Boot's SPL loads OP-TEE and u-boot proper and then SPL boots into OP-TEE and then when it is done then starts executing u-boot proper and then u-boot's TEE queries with fast smc calls features supported and then switches to SCMI support to fetch information about what clocks are there with std smc calls and to SCMI server and then u-boot gets relocated and then it goes to ask again what features are supported but that fails with:

E/TC:0   assertion 'thread_get_exceptions() == THREAD_EXCP_ALL' failed at core/arch/arm/kernel/thread_optee_smc.c:47 <thread_handle_fast_smc>
E/TC:0   Panic at core/kernel/assert.c:28 <_assert_break>

I have been looking with the Trace32 what could be the issue and currently it looks like that effect starts from sm_from_nsec():

https://github.com/OP-TEE/optee_os/blob/master/core/arch/arm/sm/sm.c#L93

and then later in:

https://github.com/OP-TEE/optee_os/blob/master/core/arch/arm/sm/sm_a32.S#L196

actually causes FIQ bit to be unmasked.

and then actual operation is performed but thread exception check happens and causes panic.

Seems that contents of struct sm_ctx *ctx is wrong.

I am a bit puzzled what would be the right course of action as there seems to be quite a bit in play in here.

While stepping in secure monitor I can see that in some cases the FIQ bit is not set and my understanding is that if there would be IRQ handler for secure peripheral then FIQ would be used to signal it.

Any ideas what to try or look?

I have added some debug prints to figure out what kind of SMC calls we have.

U-Boot SPL 2022.01 (Feb 14 2022 - 16:26:27 +0000)
Trying to boot from RAM
## Checking hash(es) for config conf ... OK
## Checking hash(es) for Image optee ... sha256+ OK
## Checking hash(es) for Image uboot ... sha256+ OK
## Checking hash(es) for Image fdt ... sha256+ OK
D/TC:0   get_aslr_seed:1419 Cannot find /secure-chosen
D/TC:0   plat_get_aslr_seed:110 Warning: no ASLR seed
D/TC:0   add_phys_mem:571 TEE_SHMEM_START type NSEC_SHM 0x3bf00000 size 0x00100000
D/TC:0   add_phys_mem:571 TA_RAM_START type TA_RAM 0x3c100000 size 0x03f00000
D/TC:0   add_phys_mem:571 VCORE_UNPG_RW_PA type TEE_RAM_RW 0x3c056000 size 0x000aa000
D/TC:0   add_phys_mem:571 VCORE_UNPG_RX_PA type TEE_RAM_RX 0x3c000000 size 0x00056000
D/TC:0   add_phys_mem:571 ROUNDDOWN(0xF8007000, CORE_MMU_PGDIR_SIZE) type IO_SEC 0xf8000000 size 0x00200000
D/TC:0   add_phys_mem:571 ROUNDDOWN(0xF8006000, CORE_MMU_PGDIR_SIZE) type IO_SEC 0xf8000000 size 0x00200000
D/TC:0   add_phys_mem:585 Physical mem map overlaps 0xf8000000
D/TC:0   add_phys_mem:571 ROUNDDOWN(0xF8000000, CORE_MMU_PGDIR_SIZE) type IO_SEC 0xf8000000 size 0x00100000
D/TC:0   add_phys_mem:585 Physical mem map overlaps 0xf8000000
D/TC:0   add_phys_mem:571 ROUNDDOWN(0xF8F02000, CORE_MMU_PGDIR_SIZE) type IO_SEC 0xf8f00000 size 0x00200000
D/TC:0   add_phys_mem:571 ROUNDDOWN(0xF8F00000, CORE_MMU_PGDIR_SIZE) type IO_SEC 0xf8f00000 size 0x00100000
D/TC:0   add_phys_mem:585 Physical mem map overlaps 0xf8f00000
D/TC:0   add_phys_mem:571 ROUNDDOWN(0xE0001000, CORE_MMU_PGDIR_SIZE) type IO_NSEC 0xe0000000 size 0x00200000
D/TC:0   add_phys_mem:571 CFG_SCMI_SHM_BASE type IO_NSEC 0xfffff000 size 0x00001000
D/TC:0   add_va_space:611 type RES_VASPACE size 0x00a00000
D/TC:0   add_va_space:611 type SHM_VASPACE size 0x02000000
D/TC:0   dump_mmap_table:724 type TEE_RAM_RX   va 0x3c000000..0x3c055fff pa 0x3c000000..0x3c055fff size 0x00056000 (smallpg)
D/TC:0   dump_mmap_table:724 type TEE_RAM_RW   va 0x3c056000..0x3c0fffff pa 0x3c056000..0x3c0fffff size 0x000aa000 (smallpg)
D/TC:0   dump_mmap_table:724 type IO_NSEC      va 0x3c100000..0x3c100fff pa 0xfffff000..0xffffffff size 0x00001000 (smallpg)
D/TC:0   dump_mmap_table:724 type RES_VASPACE  va 0x3c200000..0x3cbfffff pa 0x00000000..0x009fffff size 0x00a00000 (pgdir)
D/TC:0   dump_mmap_table:724 type SHM_VASPACE  va 0x3cc00000..0x3ebfffff pa 0x00000000..0x01ffffff size 0x02000000 (pgdir)
D/TC:0   dump_mmap_table:724 type NSEC_SHM     va 0x3ec00000..0x3ecfffff pa 0x3bf00000..0x3bffffff size 0x00100000 (pgdir)
D/TC:0   dump_mmap_table:724 type TA_RAM       va 0x3ed00000..0x42bfffff pa 0x3c100000..0x3fffffff size 0x03f00000 (pgdir)
D/TC:0   dump_mmap_table:724 type IO_NSEC      va 0x42c00000..0x42dfffff pa 0xe0000000..0xe01fffff size 0x00200000 (pgdir)
D/TC:0   dump_mmap_table:724 type IO_SEC       va 0x42e00000..0x42ffffff pa 0xf8000000..0xf81fffff size 0x00200000 (pgdir)
D/TC:0   dump_mmap_table:724 type IO_SEC       va 0x43000000..0x431fffff pa 0xf8f00000..0xf90fffff size 0x00200000 (pgdir)
D/TC:0   core_mmu_alloc_l2:272 L2 table used: 1/6
D/TC:0   core_mmu_alloc_l2:272 L2 table used: 2/6
I/TC:
I/TC: Non-secure external DT found
D/TC:0 0 add_optee_dt_node:710 OP-TEE Device Tree node already exists!
I/TC: OP-TEE version: 3.16.0-54-g56a63854b1-dev (gcc version 9.3.0 (GCC)) #1 Thu 07 Apr 2022 05:08:38 AM UTC arm
I/TC: WARNING: This OP-TEE configuration might be insecure!
I/TC: WARNING: Please check https://optee.readthedocs.io/en/latest/architecture/porting_guidelines.html
I/TC: Primary CPU initializing
D/TC:0 0 boot_init_primary_late:1293 Executing at offset 0 with virtual load address 0x3c000000
D/TC:0 0 gic_dump_state:410 GICC_CTLR: 0xb
D/TC:0 0 gic_dump_state:412 GICD_CTLR: 0x3
D/TC:0 0 gic_dump_state:416 irq0: enabled, group:1, target:1
D/TC:0 0 gic_dump_state:416 irq1: enabled, group:1, target:1
D/TC:0 0 gic_dump_state:416 irq2: enabled, group:1, target:1
D/TC:0 0 gic_dump_state:416 irq3: enabled, group:1, target:1
D/TC:0 0 gic_dump_state:416 irq4: enabled, group:1, target:1
D/TC:0 0 gic_dump_state:416 irq5: enabled, group:1, target:1
D/TC:0 0 gic_dump_state:416 irq6: enabled, group:1, target:1
D/TC:0 0 gic_dump_state:416 irq7: enabled, group:1, target:1
D/TC:0 0 gic_dump_state:416 irq8: enabled, group:0, target:1
D/TC:0 0 gic_dump_state:416 irq9: enabled, group:0, target:1
D/TC:0 0 gic_dump_state:416 irq10: enabled, group:0, target:1
D/TC:0 0 gic_dump_state:416 irq11: enabled, group:0, target:1
D/TC:0 0 gic_dump_state:416 irq12: enabled, group:0, target:1
D/TC:0 0 gic_dump_state:416 irq13: enabled, group:0, target:1
D/TC:0 0 gic_dump_state:416 irq14: enabled, group:0, target:1
D/TC:0 0 gic_dump_state:416 irq15: enabled, group:0, target:1
D/TC:0 0 call_preinitcalls:21 level 2 mobj_mapped_shm_init()
D/TC:0 0 mobj_mapped_shm_init:463 Shared memory address range: 3cc00000, 3ec00000
D/TC:0 0 call_initcalls:40 level 1 register_time_source()
D/TC:0 0 call_initcalls:40 level 1 teecore_init_pub_ram()
D/TC:0 0 call_initcalls:40 level 1 clk_dt_probe()
D/TC:0 0 clk_dt_probe:205 Probing clocks from devicetree
D/TC:0 0 call_initcalls:40 level 2 probe_dt_drivers_early()
D/TC:0 0 add_node_to_probe:540 element: axi on node simple-bus
D/TC:0 0 print_probe_list:299 Probe list: 1 elements
D/TC:0 0 print_probe_list:301 |- Driver simple-bus probes on node axi
D/TC:0 0 print_probe_list:305 `- Probe list end
D/TC:0 0 print_probe_list:311 Failed list: 0 elements
D/TC:0 0 print_probe_list:316 `- Failed list end
F/TC:0 0 process_probe_list:449 Probe loop 1 after 0 for deferral(s)
D/TC:0 0 print_probe_list:299 Probe list: 1 elements
D/TC:0 0 print_probe_list:301 |- Driver simple-bus probes on node axi
D/TC:0 0 print_probe_list:305 `- Probe list end
D/TC:0 0 print_probe_list:311 Failed list: 0 elements
D/TC:0 0 print_probe_list:316 `- Failed list end
F/TC:0 0 probe_driver_node:338 Probing simple-bus on node axi
D/TC:0 0 probe_driver_node:345 element: simple-bus on node axi initialized
F/TC:0 0 process_probe_list:449 Probe loop 2 after 0 for deferral(s)
D/TC:0 0 print_probe_list:299 Probe list: 0 elements
D/TC:0 0 print_probe_list:305 `- Probe list end
D/TC:0 0 print_probe_list:311 Failed list: 0 elements
D/TC:0 0 print_probe_list:316 `- Failed list end
D/TC:0 0 call_initcalls:40 level 3 zynq7k_clk_early_init()
E/TC:0 0 clk_op_get_rate:582 get clock rate: 0 (retval: 1333333320)
D/TC:0 0 clk_register:104 Registered clock armpll, freq 1333333320
E/TC:0 0 clk_op_get_rate:582 get clock rate: 1 (retval: 1066666656)
D/TC:0 0 clk_register:104 Registered clock ddrpll, freq 1066666656
E/TC:0 0 clk_op_get_rate:582 get clock rate: 2 (retval: 999999990)
D/TC:0 0 clk_register:104 Registered clock iopll, freq 999999990
E/TC:0 0 clk_op_get_rate:582 get clock rate: 3 (retval: 666666660)
D/TC:0 0 clk_register:104 Registered clock cpu_6or4x, freq 666666660
E/TC:0 0 clk_op_get_rate:582 get clock rate: 4 (retval: 333333330)
D/TC:0 0 clk_register:104 Registered clock cpu_3or2x, freq 333333330
E/TC:0 0 clk_op_get_rate:582 get clock rate: 5 (retval: 222222220)
D/TC:0 0 clk_register:104 Registered clock cpu_2x, freq 222222220
E/TC:0 0 clk_op_get_rate:582 get clock rate: 6 (retval: 111111110)
D/TC:0 0 clk_register:104 Registered clock cpu_1x, freq 111111110
E/TC:0 0 clk_op_get_rate:582 get clock rate: 7 (retval: 355555552)
D/TC:0 0 clk_register:104 Registered clock ddr2x, freq 355555552
E/TC:0 0 clk_op_get_rate:582 get clock rate: 8 (retval: 533333328)
D/TC:0 0 clk_register:104 Registered clock ddr3x, freq 533333328
E/TC:0 0 clk_op_get_rate:582 get clock rate: 9 (retval: 10158730)
D/TC:0 0 clk_register:104 Registered clock dci, freq 10158730
E/TC:0 0 clk_op_get_rate:582 get clock rate: 10 (retval: 199999998)
D/TC:0 0 clk_register:104 Registered clock lqspi, freq 199999998
E/TC:0 0 clk_op_get_rate:582 get clock rate: 11 (retval: 22222222)
D/TC:0 0 clk_register:104 Registered clock smc, freq 22222222
E/TC:0 0 clk_op_get_rate:582 get clock rate: 12 (retval: 199999998)
D/TC:0 0 clk_register:104 Registered clock pcap, freq 199999998
E/TC:0 0 clk_op_get_rate:582 get clock rate: 13 (retval: 0)
D/TC:0 0 clk_register:104 Registered clock gem0, freq 0
E/TC:0 0 clk_op_get_rate:582 get clock rate: 14 (retval: 0)
D/TC:0 0 clk_register:104 Registered clock gem1, freq 0
E/TC:0 0 clk_op_get_rate:582 get clock rate: 15 (retval: 99999999)
D/TC:0 0 clk_register:104 Registered clock fclk0, freq 99999999
E/TC:0 0 clk_op_get_rate:582 get clock rate: 16 (retval: 249999997)
D/TC:0 0 clk_register:104 Registered clock fclk1, freq 249999997
E/TC:0 0 clk_op_get_rate:582 get clock rate: 17 (retval: 124999998)
D/TC:0 0 clk_register:104 Registered clock fclk2, freq 124999998
E/TC:0 0 clk_op_get_rate:582 get clock rate: 18 (retval: 41666666)
D/TC:0 0 clk_register:104 Registered clock fclk3, freq 41666666
E/TC:0 0 clk_op_get_rate:582 get clock rate: 19 (retval: 7999999)
D/TC:0 0 clk_register:104 Registered clock can0, freq 7999999
E/TC:0 0 clk_op_get_rate:582 get clock rate: 20 (retval: 7999999)
D/TC:0 0 clk_register:104 Registered clock can1, freq 7999999
E/TC:0 0 clk_op_get_rate:582 get clock rate: 21 (retval: 24999999)
D/TC:0 0 clk_register:104 Registered clock sdio0, freq 24999999
E/TC:0 0 clk_op_get_rate:582 get clock rate: 22 (retval: 24999999)
D/TC:0 0 clk_register:104 Registered clock sdio1, freq 24999999
E/TC:0 0 clk_op_get_rate:582 get clock rate: 23 (retval: 99999999)
D/TC:0 0 clk_register:104 Registered clock uart0, freq 99999999
E/TC:0 0 clk_op_get_rate:582 get clock rate: 24 (retval: 99999999)
D/TC:0 0 clk_register:104 Registered clock uart1, freq 99999999
E/TC:0 0 clk_op_get_rate:582 get clock rate: 25 (retval: 166666665)
D/TC:0 0 clk_register:104 Registered clock spi0, freq 166666665
E/TC:0 0 clk_op_get_rate:582 get clock rate: 26 (retval: 166666665)
D/TC:0 0 clk_register:104 Registered clock spi1, freq 166666665
E/TC:0 0 clk_op_get_rate:582 get clock rate: 27 (retval: 222222220)
D/TC:0 0 clk_register:104 Registered clock dma, freq 222222220
E/TC:0 0 clk_op_get_rate:582 get clock rate: 28 (retval: 111111110)
D/TC:0 0 clk_register:104 Registered clock usb0_aper, freq 111111110
E/TC:0 0 clk_op_get_rate:582 get clock rate: 29 (retval: 111111110)
D/TC:0 0 clk_register:104 Registered clock usb1_aper, freq 111111110
E/TC:0 0 clk_op_get_rate:582 get clock rate: 30 (retval: 111111110)
D/TC:0 0 clk_register:104 Registered clock gem0_aper, freq 111111110
E/TC:0 0 clk_op_get_rate:582 get clock rate: 31 (retval: 0)
D/TC:0 0 clk_register:104 Registered clock gem1_aper, freq 0
E/TC:0 0 clk_op_get_rate:582 get clock rate: 32 (retval: 111111110)
D/TC:0 0 clk_register:104 Registered clock sdio0_aper, freq 111111110
E/TC:0 0 clk_op_get_rate:582 get clock rate: 33 (retval: 111111110)
D/TC:0 0 clk_register:104 Registered clock sdio1_aper, freq 111111110
E/TC:0 0 clk_op_get_rate:582 get clock rate: 34 (retval: 111111110)
D/TC:0 0 clk_register:104 Registered clock spi0_aper, freq 111111110
E/TC:0 0 clk_op_get_rate:582 get clock rate: 35 (retval: 111111110)
D/TC:0 0 clk_register:104 Registered clock spi1_aper, freq 111111110
E/TC:0 0 clk_op_get_rate:582 get clock rate: 36 (retval: 0)
D/TC:0 0 clk_register:104 Registered clock can0_aper, freq 0
E/TC:0 0 clk_op_get_rate:582 get clock rate: 37 (retval: 0)
D/TC:0 0 clk_register:104 Registered clock can1_aper, freq 0
E/TC:0 0 clk_op_get_rate:582 get clock rate: 38 (retval: 111111110)
D/TC:0 0 clk_register:104 Registered clock i2c0_aper, freq 111111110
E/TC:0 0 clk_op_get_rate:582 get clock rate: 39 (retval: 111111110)
D/TC:0 0 clk_register:104 Registered clock i2c1_aper, freq 111111110
E/TC:0 0 clk_op_get_rate:582 get clock rate: 40 (retval: 111111110)
D/TC:0 0 clk_register:104 Registered clock uart0_aper, freq 111111110
E/TC:0 0 clk_op_get_rate:582 get clock rate: 41 (retval: 111111110)
D/TC:0 0 clk_register:104 Registered clock uart1_aper, freq 111111110
E/TC:0 0 clk_op_get_rate:582 get clock rate: 42 (retval: 111111110)
D/TC:0 0 clk_register:104 Registered clock gpio_aper, freq 111111110
E/TC:0 0 clk_op_get_rate:582 get clock rate: 43 (retval: 111111110)
D/TC:0 0 clk_register:104 Registered clock lqspi_aper, freq 111111110
E/TC:0 0 clk_op_get_rate:582 get clock rate: 44 (retval: 111111110)
D/TC:0 0 clk_register:104 Registered clock smc_aper, freq 111111110
D/TC:0 0 clk_register:104 Registered clock swdt, freq 111111110
E/TC:0 0 clk_op_get_rate:582 get clock rate: 46 (retval: 0)
D/TC:0 0 clk_register:104 Registered clock dbg_trc, freq 0
E/TC:0 0 clk_op_get_rate:582 get clock rate: 47 (retval: 0)
D/TC:0 0 clk_register:104 Registered clock dbg_apb, freq 0
D/TC:0 0 call_initcalls:40 level 3 check_ta_store()
D/TC:0 0 check_ta_store:408 TA store: "Secure Storage TA"
D/TC:0 0 check_ta_store:408 TA store: "REE"
D/TC:0 0 call_initcalls:40 level 3 verify_pseudo_tas_conformance()
D/TC:0 0 call_initcalls:40 level 3 tee_cryp_init()
E/TC:0 0 plat_rng_init:191 Warning: seeding RNG with zeroes
D/TC:0 0 call_initcalls:40 level 4 tee_fs_init_key_manager()
D/TC:0 0 call_initcalls:40 level 5 probe_dt_drivers()
F/TC:0 0 process_probe_list:449 Probe loop 3 after 0 for deferral(s)
D/TC:0 0 print_probe_list:299 Probe list: 0 elements
D/TC:0 0 print_probe_list:305 `- Probe list end
D/TC:0 0 print_probe_list:311 Failed list: 0 elements
D/TC:0 0 print_probe_list:316 `- Failed list end
D/TC:0 0 call_initcalls:40 level 6 zynq7k_init_scmi_server()
D/TC:0 0 call_initcalls:40 level 6 mobj_init()
D/TC:0 0 call_initcalls:40 level 6 default_mobj_init()
D/TC:0 0 call_initcalls:40 level 7 release_probe_lists()
D/TC:0 0 call_finalcalls:59 level 1 release_external_dt()
I/TC: Primary CPU switching to normal world boot

# Note: Here we have couple fast SMC calls
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0xbf00ff01
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0xb2000001
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0xbf00ff03
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0xb2000009
D/TC:0   tee_entry_exchange_capabilities:100 Asynchronous notifications are disabled
D/TC:0   tee_entry_exchange_capabilities:109 Dynamic shared memory is enabled

# Note: From here we have std SMC calls
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000004
D/TC:0 0 core_mmu_alloc_l2:272 L2 table used: 3/6
D/TC:? 0 tee_ta_init_pseudo_ta_session:296 Lookup pseudo TA a8cfe406-d4f5-4a2e-9f8d-a25dc754c099
D/TC:? 0 tee_ta_init_pseudo_ta_session:309 Open PTA-SCMI
D/TC:? 0 tee_ta_init_pseudo_ta_session:326 PTA-SCMI : a8cfe406-d4f5-4a2e-9f8d-a25dc754c099
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000003
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000003
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000003
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000004
F/TC:? 0 pta_scmi_invoke_command:155 SCMI command 0x3 ptypes 0x3
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000004
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000003
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000003
D/TC:? 0 tee_ta_close_session:512 csess 0x3c075db0 id 1
D/TC:? 0 tee_ta_close_session:531 Destroy session
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000003
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000004
D/TC:? 0 tee_ta_init_session_with_context:607 Re-open TA a8cfe406-d4f5-4a2e-9f8d-a25dc754c099
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000003
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000003
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000003
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000004
F/TC:? 0 pta_scmi_invoke_command:155 SCMI command 0x3 ptypes 0x3
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000004
F/TC:? 0 pta_scmi_invoke_command:155 SCMI command 0x1 ptypes 0x1
D/TC:? 0 plat_scmi_clock_get_rate:333 Channel 0 scmi ID 3: clock(cpu_6or4x)->get_rate()
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000004
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000003
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000003
D/TC:? 0 tee_ta_close_session:512 csess 0x3c075ca8 id 1
D/TC:? 0 tee_ta_close_session:531 Destroy session
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000003
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000004
D/TC:? 0 tee_ta_init_session_with_context:607 Re-open TA a8cfe406-d4f5-4a2e-9f8d-a25dc754c099
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000003
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000003
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000003
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000004
F/TC:? 0 pta_scmi_invoke_command:155 SCMI command 0x3 ptypes 0x3
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000004
F/TC:? 0 pta_scmi_invoke_command:155 SCMI command 0x1 ptypes 0x1
D/TC:? 0 plat_scmi_clock_get_rate:333 Channel 0 scmi ID 24: clock(uart1)->get_rate()
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000004
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000003
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000003
D/TC:? 0 tee_ta_close_session:512 csess 0x3c075ca8 id 1
D/TC:? 0 tee_ta_close_session:531 Destroy session
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000003
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000004
D/TC:? 0 tee_ta_init_session_with_context:607 Re-open TA a8cfe406-d4f5-4a2e-9f8d-a25dc754c099
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000003
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000003
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000003
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000004
F/TC:? 0 pta_scmi_invoke_command:155 SCMI command 0x3 ptypes 0x3
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000004
F/TC:? 0 pta_scmi_invoke_command:155 SCMI command 0x1 ptypes 0x1
D/TC:? 0 plat_scmi_clock_set_state:364 SCMI clock 24 enable
E/TC:0 0 clk_op_enable:591 enable clock: 24
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000004
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000003
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000003
D/TC:? 0 tee_ta_close_session:512 csess 0x3c075ca8 id 1
D/TC:? 0 tee_ta_close_session:531 Destroy session
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000003
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000004
D/TC:? 0 tee_ta_init_session_with_context:607 Re-open TA a8cfe406-d4f5-4a2e-9f8d-a25dc754c099
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000003
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000003
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000003
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000004
F/TC:? 0 pta_scmi_invoke_command:155 SCMI command 0x3 ptypes 0x3
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000004
F/TC:? 0 pta_scmi_invoke_command:155 SCMI command 0x1 ptypes 0x1
D/TC:? 0 plat_scmi_clock_get_rate:333 Channel 0 scmi ID 24: clock(uart1)->get_rate()
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000004
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000003
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000003
D/TC:? 0 tee_ta_close_session:512 csess 0x3c075ca8 id 1
D/TC:? 0 tee_ta_close_session:531 Destroy session
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000003
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000004
D/TC:? 0 tee_ta_init_session_with_context:607 Re-open TA a8cfe406-d4f5-4a2e-9f8d-a25dc754c099
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000003
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000003
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x32000003

U-Boot 2022.01 (Feb 14 2022 - 16:26:27 +0000)

# Note: In here we have fast SMC calls for OEM extensions
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x83000102
D/TC:0   read_pss_idcode:321 Reading PSS_IDCODE
DRAM:  D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x83000106
D/TC:0   read_ddrc_ctrl:369 Reading DDRC_CTRL
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x83000107
D/TC:0   read_ecc_scrub:379 Reading ECC_SCRUB
ECC disabled 1 GiB

# Note: In here we have fast SMC calls for standard API (same as first fast SMC call)
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0xbf00ff01
E/TC:0   assertion 'thread_get_exceptions() == THREAD_EXCP_ALL' failed at core/arch/arm/kernel/thread_optee_smc.c:47 <thread_handle_fast_smc>
E/TC:0   Panic at core/kernel/assert.c:28 <_assert_break>
E/TC:0   TEE load address @ 0x3c000000
E/TC:0   Call stack:
E/TC:0    0x3c0048dd
E/TC:0    0x3c00dc71
E/TC:0    0x3c00c513
E/TC:0    0x3c003141
E/TC:0    0x3c0004c0
jenswi-linaro commented 2 years ago

FIQ should never be unmasked while in monitor mode. If that happens the stack can become corrupted.

vesajaaskelainen commented 2 years ago

FIQ should never be unmasked while in monitor mode. If that happens the stack can become corrupted.

Isn't this code executed in monitor mode and it is clearing FIQ: https://github.com/OP-TEE/optee_os/blob/master/core/arch/arm/sm/sm_a32.S#L177

Here is also code that sets FIQ: https://github.com/OP-TEE/optee_os/blob/master/core/arch/arm/sm/sm_a32.S#L159

My understanding is that this line does the actual mode change: https://github.com/OP-TEE/optee_os/blob/master/core/arch/arm/sm/sm_a32.S#L196

And this then tweaks where to continue: https://github.com/OP-TEE/optee_os/blob/master/core/arch/arm/sm/sm.c#L96

vesajaaskelainen commented 2 years ago

To rule out OEM extensions I faked return values to remove these:

# Note: In here we have fast SMC calls for OEM extensions
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x83000102
D/TC:0   read_pss_idcode:321 Reading PSS_IDCODE
DRAM:  D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x83000106
D/TC:0   read_ddrc_ctrl:369 Reading DDRC_CTRL
D/TC:0   sm_platform_handler:447 sm_platform_handler, a0=0x83000107
D/TC:0   read_ecc_scrub:379 Reading ECC_SCRUB
ECC disabled 1 GiB

So OEM extensions as such may not be an issue.

jenswi-linaro commented 2 years ago

SCR.FIQ doesn't mask or unmask FIQs, instead it controls where it's handled. While in secure world we'd like to take it via OP-TEE's normal exception vector. But in normal world we'd instead like to take it via the monitor vector. Masking is done with CPSR.

vesajaaskelainen commented 2 years ago

I'll did some more debugging with trace32.

My break points:

break.set sm_from_nsec\1 /Onchip /Program
break.set sm_from_nsec+0x78 /Onchip /Program
Break.Set thread_handle_fast_smc+0x4 /Onchip /Program
Break.Set thread_handle_fast_smc\out+0x8 /Onchip /Program

Those are for:

Here is the first entries to break points to get an idea what we expect to see on working case.

--- first entry to \\tee\sm\sm_from_nsec+0x4
-- registers --
N _  R0   3C085A40  R8   04033270  ^S+ ^Stack_+
Z _  R1         20  R9   39FE1E00
C C  R2          0  R10  AAAAAAAA
V _  R3          0  R11  AAAAAAAA
Q _  R4   3C085A40  R12  39FE1D30
     R5          0  R13  3C085A30
0 _  R6          0  R14  3C008620
1 _  R7          0  PC   3C0086C8
2 _  SPSR 20000093  CPSR 200001F6
3 _
4 _  USR:           FIQ:
     R8   04033270  R8          0
     R9   39FE1E00  R9          0
I I  R10  AAAAAAAA  R10         0
F F  R11  AAAAAAAA  R11         0
     R12  39FE1D30  R12         0
T T  R13  AAAAAAAA  R13  AAAAAAAA
J _  R14  AAAAAAAA  R14  AAAAAAAA
mon                 SPSR AA0AAAAA
sec
     SVC:           IRQ:
A A  R13  39FE1D20  R13  AAAAAAAA
E _  R14  040332A8  R14  AAAAAAAA
     SPSR AA0AAAAA  SPSR AA0AAAAA
0 _
1 _  UND:           ABT:
2 _  R13  AAAAAAAA  R13  AAAAAAAA
3 _  R14  AAAAAAAA  R14  AAAAAAAA
     SPSR AA0AAAAA  SPSR AA0AAAAA

     MON:
     R13  3C085A30
     R14  3C008620
     SPSR 20000093
-- ctx --
  ctx = 0x3C085A40 -> (
    sec = (
      ub_regs = (
        usr_sp = 0xE5801000,
        usr_lr = 0xFFFFFF20,
        irq_spsr = 0x80,
        irq_sp = 0x3C085A40,
        irq_lr = 0x0,
        fiq_spsr = 0x0,
        fiq_sp = 0x3C085A40,
        fiq_lr = 0x0,
        svc_spsr = 0x0,
        svc_sp = 0x3C085A40,
        svc_lr = 0x3C001E0D,
        abt_spsr = 0x0,
        abt_sp = 0x3C057968,
        abt_lr = 0x0,
        und_spsr = 0x0,
        und_sp = 0x3C057968,
        und_lr = 0x0,
        pmcr = 0x41093021),
      r0 = 0xBE000000,
      r1 = 0x0,
      r2 = 0x0,
      r3 = 0x04079C9C,
      r4 = 0x0,
      r5 = 0x04000000,
      r6 = 0x04079C9C,
      r7 = 0x0,
      mon_lr = 0x3C0002A0,
      mon_spsr = 0x600001D3),
    pad = 0xAAAAAAAA,
    nsec = (
      ub_regs = (
        usr_sp = 0xAAAAAAAA,
        usr_lr = 0xAAAAAAAA,
        irq_spsr = 0xAAAAAAAA,
        irq_sp = 0xAAAAAAAA,
        irq_lr = 0xAAAAAAAA,
        fiq_spsr = 0xAAAAAAAA,
        fiq_sp = 0xAAAAAAAA,
        fiq_lr = 0xAAAAAAAA,
        svc_spsr = 0xAAAAAAAA,
        svc_sp = 0xAAAAAAAA,
        svc_lr = 0xAAAAAAAA,
        abt_spsr = 0xAAAAAAAA,
        abt_sp = 0xAAAAAAAA,
        abt_lr = 0xAAAAAAAA,
        und_spsr = 0xAAAAAAAA,
        und_sp = 0xAAAAAAAA,
        und_lr = 0xAAAAAAAA,
        pmcr = 0xAAAAAAAA),
      r8 = 0x04033270,
      r9 = 0x39FE1E00,
      r10 = 0xAAAAAAAA,
      r11 = 0xAAAAAAAA,
      r12 = 0x39FE1D30,
      r0 = 0xBF00FF01,
      r1 = 0x0,
      r2 = 0x0,
      r3 = 0x0,
      r4 = 0x0,
      r5 = 0x0,
      r6 = 0x0,
      r7 = 0x0,
      mon_lr = 0x040006C4,
      mon_spsr = 0x20000093))

--- \\tee\sm\sm_from_nsec+0x78
-- registers --
N _  R0   3C085A88  R8   04033270  ^S+ ^Stack_+
Z Z  R1   3C085B30  R9   39FE1E00
C C  R2          0  R10  AAAAAAAA
V _  R3   3C085AA8  R11  AAAAAAAA
Q _  R4   3C085A40  R12  FFFCD2C0
     R5   3C052840  R13  3C085A30
0 _  R6          0  R14  3C00873D
1 _  R7          0  PC   3C00873C
2 _  SPSR 20000093  CPSR 600001F6
3 _
4 _  USR:           FIQ:
     R8   04033270  R8          0
     R9   39FE1E00  R9          0
I I  R10  AAAAAAAA  R10         0
F F  R11  AAAAAAAA  R11         0
     R12  FFFCD2C0  R12         0
T T  R13  E5801000  R13  3C085A40
J _  R14  FFFFFF20  R14         0
mon                 SPSR        0
sec
     SVC:           IRQ:
A A  R13  3C085A40  R13  3C085A40
E _  R14  3C001E0D  R14         0
     SPSR        0  SPSR       80
0 _
1 _  UND:           ABT:
2 _  R13  3C057968  R13  3C057968
3 _  R14         0  R14         0
     SPSR        0  SPSR        0

     MON:
     R13  3C085A30
     R14  3C00873D
     SPSR 20000093
-- ctx --
  ctx = 0x3C085A40 -> (
    sec = (
      ub_regs = (
        usr_sp = 0xE5801000,
        usr_lr = 0xFFFFFF20,
        irq_spsr = 0x80,
        irq_sp = 0x3C085A40,
        irq_lr = 0x0,
        fiq_spsr = 0x0,
        fiq_sp = 0x3C085A40,
        fiq_lr = 0x0,
        svc_spsr = 0x0,
        svc_sp = 0x3C085A40,
        svc_lr = 0x3C001E0D,
        abt_spsr = 0x0,
        abt_sp = 0x3C057968,
        abt_lr = 0x0,
        und_spsr = 0x0,
        und_sp = 0x3C057968,
        und_lr = 0x0,
        pmcr = 0x41093021),
      r0 = 0xBF00FF01,
      r1 = 0x0,
      r2 = 0x0,
      r3 = 0x0,
      r4 = 0x0,
      r5 = 0x0,
      r6 = 0x0,
      r7 = 0x0,
      mon_lr = 0x3C0002A0,
      mon_spsr = 0x600001D3),
    pad = 0xAAAAAAAA,
    nsec = (
      ub_regs = (
        usr_sp = 0xAAAAAAAA,
        usr_lr = 0xAAAAAAAA,
        irq_spsr = 0xAA0AAAAA,
        irq_sp = 0xAAAAAAAA,
        irq_lr = 0xAAAAAAAA,
        fiq_spsr = 0xAA0AAAAA,
        fiq_sp = 0xAAAAAAAA,
        fiq_lr = 0xAAAAAAAA,
        svc_spsr = 0xAA0AAAAA,
        svc_sp = 0x39FE1D20,
        svc_lr = 0x040332A8,
        abt_spsr = 0xAA0AAAAA,
        abt_sp = 0xAAAAAAAA,
        abt_lr = 0xAAAAAAAA,
        und_spsr = 0xAA0AAAAA,
        und_sp = 0xAAAAAAAA,
        und_lr = 0xAAAAAAAA,
        pmcr = 0x41093028),
      r8 = 0x04033270,
      r9 = 0x39FE1E00,
      r10 = 0xAAAAAAAA,
      r11 = 0xAAAAAAAA,
      r12 = 0x39FE1D30,
      r0 = 0xBF00FF01,
      r1 = 0x0,
      r2 = 0x0,
      r3 = 0x0,
      r4 = 0x0,
      r5 = 0x0,
      r6 = 0x0,
      r7 = 0x0,
      mon_lr = 0x040006C4,
      mon_spsr = 0x20000093))

--- first thread_handle_fast_smc+0x4
-- registers
N _  R0   3C085A20  R8   04033270  ^S+ ^Stack_+
Z Z  R1          0  R9   39FE1E00
C C  R2          0  R10  AAAAAAAA
V _  R3          0  R11  AAAAAAAA
Q _  R4   3C085A20  R12  3C000474
     R5          0  R13  3C085A18
0 _  R6          0  R14  3C000480
1 _  R7          0  PC   3C002F18
2 _  SPSR        0  CPSR 600001F3
3 _
4 _  USR:           FIQ:
     R8   04033270  R8          0
     R9   39FE1E00  R9          0
I I  R10  AAAAAAAA  R10         0
F F  R11  AAAAAAAA  R11         0
     R12  3C000474  R12         0
T T  R13  E5801000  R13  3C085A40
J _  R14  FFFFFF20  R14         0
svc                 SPSR        0
sec
     SVC:           IRQ:
A A  R13  3C085A18  R13  3C085A40
E _  R14  3C000480  R14         0
     SPSR        0  SPSR       80
0 _
1 _  UND:           ABT:
2 _  R13  3C057968  R13  3C057968
3 _  R14         0  R14         0
     SPSR        0  SPSR        0

     MON:
     R13  3C085AB0
     R14  3C00873D
     SPSR 20000093
-- args
  args = 0x3C085A20 -> (
    a0 = 0xBF00FF01,
    a1 = 0x0,
    a2 = 0x0,
    a3 = 0x0,
    a4 = 0x0,
    a5 = 0x0,
    a6 = 0x0,
    a7 = 0x0)

Now we are at the call causing the panic.

--- entry to \\tee\sm\sm_from_nsec+0x4
-- registers --
N _  R0   3C085A40  R8   3FFA5270  ^S+ ^Stack_+
Z _  R1         20  R9   3EB61EA0
C C  R2          0  R10         1
V _  R3          0  R11  04000020
Q _  R4   3C085A40  R12  3EB5E440
     R5          0  R13  3C085A30
0 _  R6          0  R14  3C008620
1 _  R7          0  PC   3C0086C8
2 _  SPSR 20000093  CPSR 200001F6
3 _
4 _  USR:           FIQ:
     R8   3FFA5270  R8          0
     R9   3EB61EA0  R9          0
I I  R10         1  R10         0
F F  R11  04000020  R11         0
     R12  3EB5E440  R12         0
T T  R13  AAAAAAAA  R13  AAAAAAAA
J _  R14  AAAAAAAA  R14  AAAAAAAA
mon                 SPSR AA0AAAAA
sec
     SVC:           IRQ:
A A  R13  3EB5E430  R13  AAAAAAAA
E _  R14  3FFA52A8  R14  AAAAAAAA
     SPSR AA0AAAAA  SPSR AA0AAAAA
0 _
1 _  UND:           ABT:
2 _  R13  AAAAAAAA  R13  AAAAAAAA
3 _  R14  AAAAAAAA  R14  AAAAAAAA
     SPSR AA0AAAAA  SPSR AA0AAAAA

     MON:
     R13  3C085A30
     R14  3C008620
     SPSR 20000093
-- ctx
  ctx = 0x3C085A40 -> (
    sec = (
      ub_regs = (
        usr_sp = 0x0,
        usr_lr = 0x0,
        irq_spsr = 0x80,
        irq_sp = 0x3C085A40,
        irq_lr = 0x0,
        fiq_spsr = 0x0,
        fiq_sp = 0x3C085A40,
        fiq_lr = 0x0,
        svc_spsr = 0x0,
        svc_sp = 0x3C085A40,
        svc_lr = 0x3C001E0D,
        abt_spsr = 0x60000113,
        abt_sp = 0x3C057968,
        abt_lr = 0x3C003314,
        und_spsr = 0x0,
        und_sp = 0x3C057968,
        und_lr = 0x0,
        pmcr = 0x41093021),
      r0 = 0xBE000005,
      r1 = 0x0,
      r2 = 0x0,
      r3 = 0x0,
      r4 = 0x0,
      r5 = 0x0,
      r6 = 0x0,
      r7 = 0x0,
      mon_lr = 0x3C0032CC,
      mon_spsr = 0x60000193),
    pad = 0xAAAAAAAA,
    nsec = (
      ub_regs = (
        usr_sp = 0xAAAAAAAA,
        usr_lr = 0xAAAAAAAA,
        irq_spsr = 0xAA0AAAAA,
        irq_sp = 0xAAAAAAAA,
        irq_lr = 0xAAAAAAAA,
        fiq_spsr = 0xAA0AAAAA,
        fiq_sp = 0xAAAAAAAA,
        fiq_lr = 0xAAAAAAAA,
        svc_spsr = 0xAA0AAAAA,
        svc_sp = 0x39FE1B68,
        svc_lr = 0x040332A8,
        abt_spsr = 0xAA0AAAAA,
        abt_sp = 0xAAAAAAAA,
        abt_lr = 0xAAAAAAAA,
        und_spsr = 0xAA0AAAAA,
        und_sp = 0xAAAAAAAA,
        und_lr = 0xAAAAAAAA,
        pmcr = 0x41093028),
      r8 = 0x3FFA5270,
      r9 = 0x3EB61EA0,
      r10 = 0x1,
      r11 = 0x04000020,
      r12 = 0x3EB5E440,
      r0 = 0xBF00FF01,
      r1 = 0x0,
      r2 = 0x0,
      r3 = 0x0,
      r4 = 0x0,
      r5 = 0x0,
      r6 = 0x0,
      r7 = 0x0,
      mon_lr = 0x3FF726C4,
      mon_spsr = 0x20000093))

--- \\tee\sm\sm_from_nsec+0x78
-- registers --
N _  R0   3C085A88  R8   3FFA5270  ^S+ ^Stack_+
Z Z  R1   3C085B30  R9   3EB61EA0
C C  R2          0  R10         1
V _  R3   3C085AA8  R11  04000020
Q _  R4   3C085A40  R12  FFFCD2C0
     R5   3C052840  R13  3C085A30
0 _  R6          0  R14  3C00873D
1 _  R7          0  PC   3C00873C
2 _  SPSR 20000093  CPSR 600001F6
3 _
4 _  USR:           FIQ:
     R8   3FFA5270  R8          0
     R9   3EB61EA0  R9          0
I I  R10         1  R10         0
F F  R11  04000020  R11         0
     R12  FFFCD2C0  R12         0
T T  R13         0  R13  3C085A40
J _  R14         0  R14         0
mon                 SPSR        0
sec
     SVC:           IRQ:
A A  R13  3C085A40  R13  3C085A40
E _  R14  3C001E0D  R14         0
     SPSR        0  SPSR       80
0 _
1 _  UND:           ABT:
2 _  R13  3C057968  R13  3C057968
3 _  R14         0  R14  3C003314
     SPSR        0  SPSR 60000113

     MON:
     R13  3C085A30
     R14  3C00873D
     SPSR 20000093
-- ctx
  ctx = 0x3C085A40 -> (
    sec = (
      ub_regs = (
        usr_sp = 0x0,
        usr_lr = 0x0,
        irq_spsr = 0x80,
        irq_sp = 0x3C085A40,
        irq_lr = 0x0,
        fiq_spsr = 0x0,
        fiq_sp = 0x3C085A40,
        fiq_lr = 0x0,
        svc_spsr = 0x0,
        svc_sp = 0x3C085A40,
        svc_lr = 0x3C001E0D,
        abt_spsr = 0x60000113,
        abt_sp = 0x3C057968,
        abt_lr = 0x3C003314,
        und_spsr = 0x0,
        und_sp = 0x3C057968,
        und_lr = 0x0,
        pmcr = 0x41093021),
      r0 = 0xBF00FF01,
      r1 = 0x0,
      r2 = 0x0,
      r3 = 0x0,
      r4 = 0x0,
      r5 = 0x0,
      r6 = 0x0,
      r7 = 0x0,
      mon_lr = 0x3C0032CC,
      mon_spsr = 0x60000193),
    pad = 0xAAAAAAAA,
    nsec = (
      ub_regs = (
        usr_sp = 0xAAAAAAAA,
        usr_lr = 0xAAAAAAAA,
        irq_spsr = 0xAA0AAAAA,
        irq_sp = 0xAAAAAAAA,
        irq_lr = 0xAAAAAAAA,
        fiq_spsr = 0xAA0AAAAA,
        fiq_sp = 0xAAAAAAAA,
        fiq_lr = 0xAAAAAAAA,
        svc_spsr = 0xAA0AAAAA,
        svc_sp = 0x3EB5E430,
        svc_lr = 0x3FFA52A8,
        abt_spsr = 0xAA0AAAAA,
        abt_sp = 0xAAAAAAAA,
        abt_lr = 0xAAAAAAAA,
        und_spsr = 0xAA0AAAAA,
        und_sp = 0xAAAAAAAA,
        und_lr = 0xAAAAAAAA,
        pmcr = 0x41093028),
      r8 = 0x3FFA5270,
      r9 = 0x3EB61EA0,
      r10 = 0x1,
      r11 = 0x04000020,
      r12 = 0x3EB5E440,
      r0 = 0xBF00FF01,
      r1 = 0x0,
      r2 = 0x0,
      r3 = 0x0,
      r4 = 0x0,
      r5 = 0x0,
      r6 = 0x0,
      r7 = 0x0,
      mon_lr = 0x3FF726C4,
      mon_spsr = 0x20000093))

--- thread_handle_fast_smc+0x4
-- registers
N _  R0   3C085A20  R8   3FFA5270  ^S+ ^Stack_+
Z Z  R1          0  R9   3EB61EA0
C C  R2          0  R10         1
V _  R3          0  R11  04000020
Q _  R4   3C085A20  R12  3C000474
     R5          0  R13  3C085A18
0 _  R6          0  R14  3C000480
1 _  R7          0  PC   3C002F18
2 _  SPSR        0  CPSR 600001B3
3 _
4 _  USR:           FIQ:
     R8   3FFA5270  R8          0
     R9   3EB61EA0  R9          0
I I  R10         1  R10         0
F _  R11  04000020  R11         0
     R12  3C000474  R12         0
T T  R13         0  R13  3C085A40
J _  R14         0  R14         0
svc                 SPSR        0
sec
     SVC:           IRQ:
A A  R13  3C085A18  R13  3C085A40
E _  R14  3C000480  R14         0
     SPSR        0  SPSR       80
0 _
1 _  UND:           ABT:
2 _  R13  3C057968  R13  3C057968
3 _  R14         0  R14  3C003314
     SPSR        0  SPSR 60000113

     MON:
     R13  3C085AB0
     R14  3C00873D
     SPSR 20000093
-- args
  args = 0x3C085A20 -> (
    a0 = 0xBF00FF01,
    a1 = 0x0,
    a2 = 0x0,
    a3 = 0x0,
    a4 = 0x0,
    a5 = 0x0,
    a6 = 0x0,
    a7 = 0x0)

We can see in last thread_handle_fast_smc+0x4 that 'F'IQ flag mask has been lost.

Can you spot something that is not normal in here?

jenswi-linaro commented 2 years ago

When exiting monitor mode (as a return from exception) into secure world mon_spsr is restored into CPSR. If I recall correctly we should always enter in SVC mode with all exceptions masked.

The monitor doesn't care about this though, instead it just restores the state (with the exception of PC) we had when the last SMC was done to exit secure world. If you break just below bne .smc_from_nsec and check for FIQ in SPSR you should be able to find out when the troublesome exit occurs.

vesajaaskelainen commented 2 years ago

If I understood correctly what you asked.

I have breakpoint now in:

First time there:

N _  R0   BE000005  R8          0  ^S+ ^Stack_+
Z Z  R1         20  R9   39FE1E00
C C  R2   E7F811E3  R10  AAAAAAAA
V _  R3   AF630002  R11  AAAAAAAA
Q _  R4   A5D5C51B  R12  3C000474
     R5          0  R13  3C085A88
0 _  R6          0  R14  3C00048C
1 _  R7          0  PC   3C0085B4
2 _  SPSR 600001D3  CPSR 600001D6
3 _
4 _  USR:           FIQ:
     R8          0  R8          0
     R9   39FE1E00  R9          0
I I  R10  AAAAAAAA  R10         0
F F  R11  AAAAAAAA  R11         0
     R12  3C000474  R12         0
T _  R13  E5801000  R13  3C085A40
J _  R14  FFFFFF20  R14         0
mon                 SPSR        0
sec
     SVC:           IRQ:
A A  R13  3C085A40  R13  3C085A40
E _  R14  3C002F27  R14         0
     SPSR        0  SPSR       80
0 _
1 _  UND:           ABT:
2 _  R13  3C057968  R13  3C057968
3 _  R14         0  R14         0
     SPSR        0  SPSR        0

     MON:
     R13  3C085A88
     R14  3C00048C
     SPSR 600001D3

Then on panic case we do not get on that break point as the call is from nsec so it jump forward there.

And the exception is caused within that flow.

If I set break in here:

The first hit:

N _  R0   BF00FF01  R8   04033270  ^S+ ^Stack_+
Z _  R1         25  R9   39FE1E00
C C  R2          0  R10  AAAAAAAA
V _  R3          0  R11  AAAAAAAA
Q _  R4          0  R12  39FE1D30
     R5          0  R13  3C085B10
0 _  R6          0  R14  040006C4
1 _  R7          0  PC   3C008600
2 _  SPSR 20000093  CPSR 200001D6
3 _
4 _  USR:           FIQ:
     R8   04033270  R8          0
     R9   39FE1E00  R9          0
I I  R10  AAAAAAAA  R10         0
F F  R11  AAAAAAAA  R11         0
     R12  39FE1D30  R12         0
T _  R13  AAAAAAAA  R13  AAAAAAAA
J _  R14  AAAAAAAA  R14  AAAAAAAA
mon                 SPSR AA0AAAAA
nsec
     SVC:           IRQ:
A A  R13  39FE1D20  R13  AAAAAAAA
E _  R14  040332A8  R14  AAAAAAAA
     SPSR AA0AAAAA  SPSR AA0AAAAA
0 _
1 _  UND:           ABT:
2 _  R13  AAAAAAAA  R13  AAAAAAAA
3 _  R14  AAAAAAAA  R14  AAAAAAAA
     SPSR AA0AAAAA  SPSR AA0AAAAA

     MON:
     R13  3C085B10
     R14  040006C4
     SPSR 20000093

And then on panic flow:

N _  R0   BF00FF01  R8   3FFA5270  ^S+ ^Stack_+
Z _  R1         25  R9   3EB61EA0
C C  R2          0  R10         1
V _  R3          0  R11  04000020
Q _  R4          0  R12  3EB5E440
     R5          0  R13  3C085B10
0 _  R6          0  R14  3FF726C4
1 _  R7          0  PC   3C008600
2 _  SPSR 20000093  CPSR 200001D6
3 _
4 _  USR:           FIQ:
     R8   3FFA5270  R8          0
     R9   3EB61EA0  R9          0
I I  R10         1  R10         0
F F  R11  04000020  R11         0
     R12  3EB5E440  R12         0
T _  R13  AAAAAAAA  R13  AAAAAAAA
J _  R14  AAAAAAAA  R14  AAAAAAAA
mon                 SPSR AA0AAAAA
nsec
     SVC:           IRQ:
A A  R13  3EB5E430  R13  AAAAAAAA
E _  R14  3FFA52A8  R14  AAAAAAAA
     SPSR AA0AAAAA  SPSR AA0AAAAA
0 _
1 _  UND:           ABT:
2 _  R13  AAAAAAAA  R13  AAAAAAAA
3 _  R14  AAAAAAAA  R14  AAAAAAAA
     SPSR AA0AAAAA  SPSR AA0AAAAA

     MON:
     R13  3C085B10
     R14  3FF726C4
     SPSR 20000093

In both cases if we look at MON SPSR 0x20000093 is in both cases and in there FIQ mask is not set which is a bit weird because if I set new breakpoint:

I can see that MON SPSR is still the same and now actually when I step over rfefd sp! I can see that FIQ mask is now lost.

Here is register dump on panic flow at entry of vector_fast_smc_entry:

N _  R0   BF00FF01  R8   3FFA5270  ^S+ ^Stack_+
Z Z  R1          0  R9   3EB61EA0
C C  R2          0  R10         1
V _  R3          0  R11  04000020
Q _  R4          0  R12  FFFCD2C0
     R5          0  R13  3C085A40
0 _  R6          0  R14  3C001E0D
1 _  R7          0  PC   3C00046C
2 _  SPSR        0  CPSR 60000193
3 _
4 _  USR:           FIQ:
     R8   3FFA5270  R8          0
     R9   3EB61EA0  R9          0
I I  R10         1  R10         0
F _  R11  04000020  R11         0
     R12  FFFCD2C0  R12         0
T _  R13         0  R13  3C085A40
J _  R14         0  R14         0
svc                 SPSR        0
sec
     SVC:           IRQ:
A A  R13  3C085A40  R13  3C085A40
E _  R14  3C001E0D  R14         0
     SPSR        0  SPSR       80
0 _
1 _  UND:           ABT:
2 _  R13  3C057968  R13  3C057968
3 _  R14         0  R14  3C003314
     SPSR        0  SPSR 60000113

     MON:
     R13  3C085AB0
     R14  3C00873D
     SPSR 20000093

And for comparison the first occurence at https://github.com/OP-TEE/optee_os/blob/master/core/arch/arm/sm/sm_a32.S#L192:

N _  R0          1  R8   04033270  ^S+ ^Stack_+
Z _  R1   3C085B30  R9   39FE1E00
C C  R2          0  R10  AAAAAAAA
V _  R3          0  R11  AAAAAAAA
Q _  R4          0  R12  FFFCD2C0
     R5          0  R13  3C085A40
0 _  R6          0  R14  3C00873D
1 _  R7          0  PC   3C008628
2 _  SPSR 20000093  CPSR 200001D6
3 _
4 _  USR:           FIQ:
     R8   04033270  R8          0
     R9   39FE1E00  R9          0
I I  R10  AAAAAAAA  R10         0
F F  R11  AAAAAAAA  R11         0
     R12  FFFCD2C0  R12         0
T _  R13  E5801000  R13  3C085A40
J _  R14  FFFFFF20  R14         0
mon                 SPSR        0
sec
     SVC:           IRQ:
A A  R13  3C085A40  R13  3C085A40
E _  R14  3C001E0D  R14         0
     SPSR        0  SPSR       80
0 _
1 _  UND:           ABT:
2 _  R13  3C057968  R13  3C057968
3 _  R14         0  R14         0
     SPSR        0  SPSR        0

     MON:
     R13  3C085A40
     R14  3C00873D
     SPSR 20000093

and then after step over rfefd sp! I can see that FIQ mask is not lost.

Register dump entry of vector_fast_smc_entry:

N _  R0   BF00FF01  R8   04033270  ^S+ ^Stack_+
Z Z  R1          0  R9   39FE1E00
C C  R2          0  R10  AAAAAAAA
V _  R3          0  R11  AAAAAAAA
Q _  R4          0  R12  FFFCD2C0
     R5          0  R13  3C085A40
0 _  R6          0  R14  3C001E0D
1 _  R7          0  PC   3C00046C
2 _  SPSR        0  CPSR 600001D3
3 _
4 _  USR:           FIQ:
     R8   04033270  R8          0
     R9   39FE1E00  R9          0
I I  R10  AAAAAAAA  R10         0
F F  R11  AAAAAAAA  R11         0
     R12  FFFCD2C0  R12         0
T _  R13  E5801000  R13  3C085A40
J _  R14  FFFFFF20  R14         0
svc                 SPSR        0
sec
     SVC:           IRQ:
A A  R13  3C085A40  R13  3C085A40
E _  R14  3C001E0D  R14         0
     SPSR        0  SPSR       80
0 _
1 _  UND:           ABT:
2 _  R13  3C057968  R13  3C057968
3 _  R14         0  R14         0
     SPSR        0  SPSR        0

     MON:
     R13  3C085AB0
     R14  3C00873D
     SPSR 20000093

One difference I noticed... ABT's R14 and SPSR got changed from zero to values.

After lunch I observe more ABT register value change flows.

vesajaaskelainen commented 2 years ago

Ok, now I believe I have understanding of the problem to some degree -- what is the right solution is another thing then.

ABT was false path -- this is related to OP-TEE's thread support.

@jenswi-linaro you were close on your idea where to observe but I needed a bit of additional reading of ARM reference manuals and such to understand it a bit better.

I believe the problem is related on side-effects of "stacking exceptions" with ctx->sec.mon_lr this is saved in stack in struct sm_ctx *ctx. And going back-and-forth in different call paths in secure monitor we do get ctx->sec.mon_cpsr to a state that FIQ mask is not set.

Initially what happens during the boot is that u-boot before relocation (in my case) goes to fetch information about OP-TEE environment with fast calls. All is good and CPSR gets retained so that FIQ mask is set. During fast call FIQ must be disabled and everything is good.

Next phase in u-boot before relocation is that it goes to use SCMI support to get information about clocks. Actual SCMI calls are based on std calls which then goes for OP-TEE's thread path. But what seems to be happening here is when PTA-SCMI gets opened ctx->sec.mon_cpsr gets new value of 0x60000193 which does not have FIQ mask set. Now this CPSR value starts to get persisted and used when ever we make fast call or std call.

Now on second invocation to SCMI (or any other std call) we kinda get special treat when switching from monitor -> secure we get FIQ unmasked right away as a of new value of ctx->sec.mon_cpsr includes that. Don't know if this is actually already a problem to enable FIQ's too early. There are several SCMI calls in here continuing to use std call.

Then what happens next in u-boot is querying what Zynq-7000 CPU type it is with OEM extension as fast call. But that takes a shortcut and returns right away (SM_EXIT_TO_NON_SECURE) before going too far in execution flow and we do not get any sanity checks with that.

Then u-boot gets relocated and in driver model gets a bit confused -- this is a problem I need to figure out. But as a side effect u-boot goes to detect OP-TEE again with standard fast call API's. Now as in ctx->sec.mon_cpsr we had previous left over path with FIQ enabled it gets inadvertedly restored also for fast call. And then we get sanity checks at exit and get panic.

I would assume even with fixed u-boot when I get far enough eventually booting to Linux kernel it would then latest make the same calls to query OP-TEE's details and we would get the same panic later in there.

So after following trace output FIQ mask is removed in ctx:

I/TC: Primary CPU switching to normal world boot
D/TC:0   sm_platform_handler:453 sm_platform_handler, a0=0xbf00ff01
D/TC:0   sm_platform_handler:453 sm_platform_handler, a0=0xb2000001
D/TC:0   sm_platform_handler:453 sm_platform_handler, a0=0xbf00ff03
D/TC:0   sm_platform_handler:453 sm_platform_handler, a0=0xb2000009
D/TC:0   tee_entry_exchange_capabilities:100 Asynchronous notifications are disabled
D/TC:0   tee_entry_exchange_capabilities:109 Dynamic shared memory is enabled
D/TC:0   sm_platform_handler:453 sm_platform_handler, a0=0x32000004
D/TC:0 0 core_mmu_alloc_l2:272 L2 table used: 3/6
D/TC:? 0 tee_ta_init_pseudo_ta_session:296 Lookup pseudo TA a8cfe406-d4f5-4a2e-9f8d-a25dc754c099
D/TC:? 0 tee_ta_init_pseudo_ta_session:309 Open PTA-SCMI
D/TC:? 0 tee_ta_init_pseudo_ta_session:326 PTA-SCMI : a8cfe406-d4f5-4a2e-9f8d-a25dc754c099

Actual stack value change happens on line:

Just before that happens:

Before stepping over srsdb sp!, #CPSR_MODE_MON in registers we have:

N _  R0   BE000005  R8   3C0579A8  ^S+ ^Stack_+
Z Z  R1   FFFF0000  R9          3
C C  R2         A0  R10        A0
V _  R3          0  R11  FFFF0000
Q _  R4          0  R12  FFFCD2C0
     R5   3C082FF8  R13  3C085AB0
0 _  R6   800001D3  R14  3C003310
1 _  R7          0  PC   3C00859C
2 _  SPSR 60000193  CPSR 600001D6
3 _
4 _  USR:           FIQ:
     R8   3C0579A8  R8          0
     R9          3  R9          0
I I  R10        A0  R10         0
F F  R11  FFFF0000  R11         0
     R12  FFFCD2C0  R12         0
T _  R13         0  R13  3C085A40
J _  R14         0  R14         0
mon                 SPSR        0
sec
     SVC:           IRQ:
A A  R13  3C085A40  R13  3C085A40
E _  R14  3C001E0D  R14         0
     SPSR        0  SPSR       80
0 _
1 _  UND:           ABT:
2 _  R13  3C057968  R13  3C057968
3 _  R14         0  R14  3C003294
     SPSR        0  SPSR     0193

     MON:
     R13  3C085AB0
     R14  3C003310
     SPSR 60000193

Apparently here we do get value of MON's LR and SPSR to the stack and it just happened to point to right spot in there.

  (sm_ctx *)0x3C085A40 = 0x3C085A40 -> (
    sec = (
      ub_regs = (usr_sp = 0xE5801000, usr_lr = 0xFFFFFF
      r0 = 0x32000004,
      r1 = 0x0,
      r2 = 0x39FE3000,
      r3 = 0x0,
      r4 = 0x0,
      r5 = 0x0,
      r6 = 0x0,
      r7 = 0x0,
      mon_lr = 0x3C003310,
      mon_spsr = 0x60000193),
    pad = 0xAAAAAAAA,
    nsec = (ub_regs = (usr_sp = 0xAAAAAAAA, usr_lr = 0x

Now we have mon_spsr = 0x60000193 that poisons rest of the flows.

Tested that panic goes away with following workaround:

diff --git a/core/arch/arm/sm/sm.c b/core/arch/arm/sm/sm.c
index b4b7be43b..10c2813f4 100644
--- a/core/arch/arm/sm/sm.c
+++ b/core/arch/arm/sm/sm.c
@@ -89,6 +89,7 @@ uint32_t sm_from_nsec(struct sm_ctx *ctx)
        sm_restore_unbanked_regs(&ctx->sec.ub_regs);

        memcpy(&ctx->sec.r0, args, sizeof(*args));
+       ctx->sec.mon_spsr |= CPSR_F | CPSR_I;
        if (OPTEE_SMC_IS_FAST_CALL(ctx->sec.r0))
                ctx->sec.mon_lr = (uint32_t)vector_fast_smc_entry;
        else

Even with the patch I can seen that value of ctx->sec.mon_spsr gets into FIQ not masked state quite often but that workaround keeps problem at bay.

To me this looks like generic issue in OP-TEE and not related to Zynq-7000 as such. I just happened to have code flows that triggered it.

Now any better ideas what should be fixed and where 🤔 ?

jenswi-linaro commented 2 years ago

Initially what happens during the boot is that u-boot before relocation (in my case) goes to fetch information about OP-TEE environment with fast calls. All is good and CPSR gets retained so that FIQ mask is set. During fast call FIQ must be disabled and everything is good.

So far so good.

Next phase in u-boot before relocation is that it goes to use SCMI support to get information about clocks. Actual SCMI calls are based on std calls which then goes for OP-TEE's thread path. But what seems to be happening here is when PTA-SCMI gets opened ctx->sec.mon_cpsr gets new value of 0x60000193 which does not have FIQ mask set. Now this CPSR value starts to get persisted and used when ever we make fast call or std call.

This is the root of the problem where it needs to be fixed. OP-TEE should never use the temporary stack while not having exceptions masked. The temporary stack is used for exception processing (except abort) and to enter/exit OP-TEE.

github-actions[bot] commented 2 years ago

This issue has been marked as a stale issue because it has been open (more than) 30 days with no activity. Remove the stale label or add a comment, otherwise this issue will automatically be closed in 5 days. Note, that you can always re-open a closed issue at any time.

vesajaaskelainen commented 2 years ago

keep-alive

github-actions[bot] commented 2 years ago

This issue has been marked as a stale issue because it has been open (more than) 30 days with no activity. Remove the stale label or add a comment, otherwise this issue will automatically be closed in 5 days. Note, that you can always re-open a closed issue at any time.

vesajaaskelainen commented 2 years ago

keep-alive

jforissier commented 2 years ago

@vesajaaskelainen what is the status of this issue? My understanding is that your are investigating a potential bug in u-boot?

vesajaaskelainen commented 2 years ago

@vesajaaskelainen what is the status of this issue? My understanding is that your are investigating a potential bug in u-boot?

There is a bug in OP-TEE's secure monitor and I seem to be first one that actually went deeper to figure out what is happening (there were similar issues reported earlier that might have same problem origin).

Problem goes deep in "context" stacking when changing different security modes so it is not so clear picture what is the best fix for it.

But basically one would need to have clear return paths in stack but currently it re-uses memory in the stack when switching security states which ultimately causes the wrong state being restored and then further used in successive calls.

If you have energy to debug the issue I am happy to assist -- I am currently using the "ugly" workaround commented above so I could get forward.

I have been lately working with mpsoc and now getting back to zynq7000.

jenswi-linaro commented 2 years ago

@vesajaaskelainen what is the status of this issue? My understanding is that your are investigating a potential bug in u-boot?

There is a bug in OP-TEE's secure monitor and I seem to be first one that actually went deeper to figure out what is happening (there were similar issues reported earlier that might have same problem origin).

I'm not so sure about that. From what I've seen it seems that there's a fast call that unmasks FIQs.

github-actions[bot] commented 2 years ago

This issue has been marked as a stale issue because it has been open (more than) 30 days with no activity. Remove the stale label or add a comment, otherwise this issue will automatically be closed in 5 days. Note, that you can always re-open a closed issue at any time.

vesajaaskelainen commented 2 years ago

keep-alive

vesajaaskelainen commented 2 years ago

Now I rebased my code on top of master and finally I have Xilinx Zynq-7000 booting into the Linux and also xtest is running fine in there. Some OP-TEE drivers need to be developed a bit more but starting to look good.

Before getting too far in cleaning up debugs and other stuff from the system I decided to revert the hack I had in place.

But still get:

E/TC:0   assertion 'thread_get_exceptions() == THREAD_EXCP_ALL' failed at core/arch/arm/kernel/thread_optee_smc.c:50 <thread_handle_fast_smc>

So need to dig a bit deeper in the logic behind the scenes in here.

Apparently I need to present the findings somehow and it probably is related to presenting how stack is evolving and so on going to deeper details. Would private presentation be preferable for this before feeding the tracker with more details :) ?

jforissier commented 2 years ago

Hi @vesajaaskelainen,

Apparently I need to present the findings somehow and it probably is related to presenting how stack is evolving and so on going to deeper details. Would private presentation be preferable for this before feeding the tracker with more details :) ?

Perhaps it would make a good topic for the next Linaro OP-TEE Contributions (LOC) meeting on Aug 25th (see https://www.trustedfirmware.org/meetings/ and https://bit.ly/loc-notes), what do you think?

vesajaaskelainen commented 2 years ago

Hi @vesajaaskelainen,

Apparently I need to present the findings somehow and it probably is related to presenting how stack is evolving and so on going to deeper details. Would private presentation be preferable for this before feeding the tracker with more details :) ?

Perhaps it would make a good topic for the next Linaro OP-TEE Contributions (LOC) meeting on Aug 25th (see https://www.trustedfirmware.org/meetings/ and https://bit.ly/loc-notes), what do you think?

I'll try to be ready with the presentation for that meeting. So lets book it.

I could also give a status update on improved zynq7k support and then perhaps discuss some topics on that too.

jforissier commented 2 years ago

@vesajaaskelainen great, thanks. Agenda updated in https://bit.ly/loc-notes.

vesajaaskelainen commented 2 years ago

@jforissier umh meeting was closed before it was started?

In the time table it says 16:00 and all times in UK so that would be 18:00 in Finnish time? (eg. starting in 15 minutes)

jforissier commented 2 years ago

@jforissier umh meeting was closed before it was started?

In the time table it says 16:00 and all times in UK so that would be 18:00 in Finnish time? (eg. starting in 15 minutes)

My bad. I was 1h early it seems! Sorry about that.

vesajaaskelainen commented 2 years ago

@jforissier umh meeting was closed before it was started? In the time table it says 16:00 and all times in UK so that would be 18:00 in Finnish time? (eg. starting in 15 minutes)

My bad. I was 1h early it seems! Sorry about that.

Ok. I am in the call :)

github-actions[bot] commented 2 years ago

This issue has been marked as a stale issue because it has been open (more than) 30 days with no activity. Remove the stale label or add a comment, otherwise this issue will automatically be closed in 5 days. Note, that you can always re-open a closed issue at any time.

vesajaaskelainen commented 2 years ago

keep-alive

github-actions[bot] commented 1 year ago

This issue has been marked as a stale issue because it has been open (more than) 30 days with no activity. Remove the stale label or add a comment, otherwise this issue will automatically be closed in 5 days. Note, that you can always re-open a closed issue at any time.