OP-TEE / optee_os

Trusted side of the TEE

What happens in optee_os when Linux disables a CPU core? #6170

Closed jiameixie closed 1 year ago

jiameixie commented 1 year ago

Hi all,

In Linux, running the command below to disable a CPU core makes the kernel panic.

echo 0 > /sys/devices/system/cpu/cpu0/online
kernel BUG at arch/arm64/kernel/smp.c:383!
Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
Modules linked in: rpmsg_net(O) virtio_rpmsg_bus rpmsg_ns rpmsg_core arm_si_rproc(O) arm_mhuv2 sch_fq_codel openvswitch nf_conncount nsh nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 arm_ffa_tee(O)
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G          O      6.1.32-yocto-standard #1
Hardware name: Unknown Unknown Product/Unknown Product, BIOS 2023.01 01/01/2023
pstate: a24003c9 (NzCv DAIF +PAN -UAO +TCO -DIT -SSBS BTYPE=--)
pc : cpu_die+0x40/0x4c
lr : cpu_die+0x40/0x4c
sp : ffff8000092a3d60
x29: ffff8000092a3d60 x28: 00000000dbae6bb4 x27: ffff800008db6ad0
x26: ffff800008db6ad0 x25: 00000000000000a0 x24: 0000000000000060
x23: ffff8000092a8938 x22: ffff80000953d3d0 x21: ffff8000092a8a58
x20: ffff800008c01930 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffffc834de90
x14: 0000000000000004 x13: 00000ad6daaf0aac x12: ffff800008c0bd58
x11: 0000000000000040 x10: 0000000000000000 x9 : ffff800008021004
x8 : ffff8000092a3d18 x7 : 0000000000000000 x6 : 0000000000000000
x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
x2 : 0000000000000000 x1 : ffff8000092b0bc0 x0 : 00000000ffffffea
Call trace:
 cpu_die+0x40/0x4c
 arch_cpu_idle_dead+0x18/0x2c
 do_idle+0x114/0x120
 cpu_startup_entry+0x2c/0x34
 rest_init+0xe8/0xf0
 arch_post_acpi_subsys_init+0x0/0x28
 start_kernel+0x6d8/0x718
 __primary_switched+0xb4/0xbc
Code: 94012a17 f9401e81 2a1303e0 d63f0020 (d4210000)
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Attempted to kill the idle task!
SMP: stopping secondary CPUs
Kernel Offset: disabled
CPU features: 0x00000,00050cf7,e29e7727
Memory Limit: 2048 MB
---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---

I added some logging to debug this and found the following:

    if ((psci_spd_pm != NULL) && (psci_spd_pm->svc_off != NULL)) {
        rc = psci_spd_pm->svc_off(0);
        if (rc != 0)
            goto exit;
    }

The code above is in link; it calls spmd_cpu_off_handler through the svc_off hook.
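The control flow around that hook can be sketched as below. This is a simplified model, not the TF-A source: the `sketch_` names, the trimmed ops structure, and the error value are assumptions made for illustration. The point it shows is that a non-zero return from `svc_off` aborts the power-down, so the CPU_OFF call returns to its caller. On the Linux side, `cpu_die()` in arch/arm64/kernel/smp.c treats any return from the CPU_OFF operation as a bug, which would be consistent with the `kernel BUG at arch/arm64/kernel/smp.c:383!` line in the log.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for the secure-side PM hooks. */
typedef struct sketch_spd_pm_ops {
    int (*svc_off)(unsigned long long unused);
} sketch_spd_pm_ops_t;

#define SKETCH_E_SUCCESS 0
#define SKETCH_E_FAIL    (-22) /* illustrative error value only */

/* Registered hooks; NULL means no secure-side service is hooked in. */
static const sketch_spd_pm_ops_t *psci_spd_pm;

/* A hook that reports failure, e.g. because the SPMC rejected the message. */
static int failing_svc_off(unsigned long long unused)
{
    (void)unused;
    return SKETCH_E_FAIL;
}

/* A hook that reports success. */
static int ok_svc_off(unsigned long long unused)
{
    (void)unused;
    return SKETCH_E_SUCCESS;
}

static const sketch_spd_pm_ops_t failing_ops = { .svc_off = failing_svc_off };
static const sketch_spd_pm_ops_t ok_ops = { .svc_off = ok_svc_off };

/*
 * Model of the CPU_OFF path: returns 0 where real firmware would power the
 * core down (and never return), or the hook's error if the hook failed.
 */
static int sketch_cpu_off(void)
{
    int rc = SKETCH_E_SUCCESS;

    if ((psci_spd_pm != NULL) && (psci_spd_pm->svc_off != NULL)) {
        rc = psci_spd_pm->svc_off(0);
        if (rc != 0)
            return rc; /* power-down aborted, error goes back to the caller */
    }

    /* Real firmware would power the core down here. */
    return SKETCH_E_SUCCESS;
}
```

With `psci_spd_pm` pointing at `failing_ops`, `sketch_cpu_off()` returns the error instead of "powering down", which is the situation the normal world is not prepared to handle.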

/TC:04   thread_spmc_msg_recv:949 xjm ----0000 thread_spmc_msg_recv, args->a0:0x8400006f
VERBOSE: SPM(1): 0x84000060 0xffff8000 0xfffffffe 0x84000002 0x0 0x0 0x0 0x0
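As a reading aid for the two log lines above, the small helper below maps the function IDs that appear in them to names. The ID values are the standard FF-A and PSCI SMC32 function IDs; the helper names are made up for this sketch, and the assumption that the VERBOSE line prints the function ID followed by x1 to x7 is an interpretation of the log, not something stated in the thread.

```c
#include <stdint.h>

#define FFA_ERROR                   UINT32_C(0x84000060)
#define FFA_MSG_SEND_DIRECT_REQ_32  UINT32_C(0x8400006F)
#define FFA_MSG_SEND_DIRECT_RESP_32 UINT32_C(0x84000070)
#define PSCI_CPU_OFF                UINT32_C(0x84000002)

#define FFA_INVALID_PARAMETERS      (-2)

/* Hypothetical helper: name a function ID seen in the log. */
static const char *sketch_fid_name(uint32_t fid)
{
    switch (fid) {
    case FFA_ERROR:
        return "FFA_ERROR";
    case FFA_MSG_SEND_DIRECT_REQ_32:
        return "FFA_MSG_SEND_DIRECT_REQ_32";
    case FFA_MSG_SEND_DIRECT_RESP_32:
        return "FFA_MSG_SEND_DIRECT_RESP_32";
    case PSCI_CPU_OFF:
        return "PSCI_CPU_OFF";
    default:
        return "unknown";
    }
}

/* Hypothetical helper: read a 32-bit register value as a signed status. */
static int32_t sketch_status(uint32_t w)
{
    return (int32_t)w;
}
```

Under that reading, OP-TEE received an FFA_MSG_SEND_DIRECT_REQ_32 (0x8400006f) and the SPMD then saw FFA_ERROR (0x84000060) with 0xfffffffe, that is FFA_INVALID_PARAMETERS, in the third value and the PSCI CPU_OFF function ID (0x84000002) in the fourth.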

I noticed that some platforms implement psci_cpu_off.

Is it mandatory to add this function when adding SMP support for a new platform? What happens in optee_os when a CPU core is disabled in Linux? Could someone please give me a clue? Thanks.

jiameixie commented 1 year ago

By adding logging in optee_os, I found that it enters the if block below in ffa_handle_sp_direct_req.

    if (args->a2 != FFA_PARAM_MBZ) {
        ffa_set_error(args, FFA_INVALID_PARAMETERS);
        return NULL;
    }

What does args->a2 represent in this context? Why must it be zero?

jenswi-linaro commented 1 year ago

I think you should ask the one providing the configuration you're using these questions. It seems to have something to do with a Trusted Service configuration so perhaps someone from that team.

jiameixie commented 1 year ago

CPU_OFF and CPU_SUSPEND are described on page 52 of https://developer.arm.com/documentation/den0022/latest/ . The relevant text is quoted below.

5.4.7 Implementation responsibilities: Interaction with a Trusted OS or SP

When a caller requests a power state, the PSCI implementation in the privileged platform firmware might need to communicate with a Trusted OS or SP. A method for interfacing between PPF and a Trusted OS or SP is specified by Firmware Framework for Arm v8-A [10]. However, the specification requires the Trusted OS or SP to always comply with CPU_SUSPEND requests. The PPF can inform the Trusted OS or SP that a power state needs to be entered, and the Trusted OS or SP can use this information to take any preparatory actions. For example, it might have to save its context. However, this communication must not allow the Trusted OS or SP to modify or prevent the power state requested.

The Trusted OS or SP might not be able to tolerate a particular state, for latency or other reasons. In this case, Arm recommends that the Trusted OS or SP uses an IMPLEMENTATION DEFINED mechanism to communicate with the Normal world to ensure its constraints are considered in the power requests originating from the Normal world.

jiameixie commented 1 year ago

> I think you should ask the one providing the configuration you're using these questions. It seems to have something to do with a Trusted Service configuration so perhaps someone from that team.

@jenswi-linaro Do you mean the Trusted Services team?

jenswi-linaro commented 1 year ago

Yes, https://github.com/OP-TEE/optee_os/blame/a8719249e3e260f172ab11c35314b068c5f25257/core/arch/arm/kernel/spmc_sp_handler.c#L824 should give a few names if you don't know where to start.

jiameixie commented 1 year ago

@jenswi-linaro Thanks.

odeprez commented 1 year ago

Hi,

Is this OP-TEE SPMC without secure virtualization?

In which case it might not support FF-A v1.1 power management messages. https://developer.arm.com/documentation/den0077/e/?lang=en section 18.3.4

The SPMD sends a PSCI framework message to the SPMC on a CPU off event: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/services/std_svc/spmd/spmd_pm.c#n124

It is received by OP-TEE but it doesn't support the framework message bit 31 passed in w2.

We could filter this out in the SPMD if the SPMC is declared as FF-A v1.0 in the SPMC manifest attributes.
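The mismatch can be illustrated with a short sketch. The macro and function names here are hypothetical and are not taken from OP-TEE or TF-A; the layout assumed is the one described above, with bit 31 of w2 marking a framework message and, as an assumption based on FF-A v1.1, the low byte carrying the framework message type. A receiver that requires w2 to be zero therefore rejects every framework message.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PARAM_MBZ         UINT32_C(0)          /* "must be zero" */
#define SKETCH_FWK_MSG_BIT       (UINT32_C(1) << 31)  /* framework message flag */
#define SKETCH_FWK_MSG_TYPE_MASK UINT32_C(0xFF)       /* assumed type field */

/* True if w2 marks the direct request as a framework message. */
static bool sketch_is_framework_msg(uint32_t w2)
{
    return (w2 & SKETCH_FWK_MSG_BIT) != 0;
}

/* Framework message type carried in the low byte (assumed layout). */
static uint32_t sketch_fwk_msg_type(uint32_t w2)
{
    return w2 & SKETCH_FWK_MSG_TYPE_MASK;
}

/*
 * Model of a receiver that treats w2 as reserved: anything other than
 * zero is rejected, which includes every framework message.
 */
static bool sketch_mbz_receiver_accepts(uint32_t w2)
{
    return w2 == SKETCH_PARAM_MBZ;
}
```

For example, `sketch_mbz_receiver_accepts(SKETCH_FWK_MSG_BIT)` is false, which matches the observed path into the `args->a2 != FFA_PARAM_MBZ` branch.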

jiameixie commented 1 year ago

@odeprez Thanks for your answer.

> Is this OP-TEE SPMC without secure virtualization?

This OP-TEE SPMC is without secure virtualization.

> In which case it might not support FF-A v1.1 power management messages. https://developer.arm.com/documentation/den0077/e/?lang=en section 18.3.4
>
> The SPMD sends a PSCI framework message to the SPMC on a CPU off event: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/services/std_svc/spmd/spmd_pm.c#n124
>
> It is received by OP-TEE but it doesn't support the framework message bit 31 passed in w2.
>
> We could filter this out in the SPMD if the SPMC is declared as FF-A v1.0 in the SPMC manifest attributes.

The SPMC is declared as FF-A v1.0 in the SPMC manifest attributes.

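I have the following questions:

What does "We could filter this out in the SPMD" mean?

Does it mean not handling the power management messages and ignoring them, so that OP-TEE will not enter ffa_handle_sp_direct_req?

And does it also mean the CPU_OFF operation is not supported, so that the command should fail?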

odeprez commented 1 year ago

Hi,

> What does "We could filter this out in the SPMD" mean?

I suggested a change similar to this: https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/22025 It is yet to be fully tested, but you could try it on your platform.

> Does it mean not handling the power management messages and ignore it so that OP-TEE will not enter ffa_handle_sp_direct_req

Yes.

> And does it also mean CPU_OFF operation is not supported? The below command should fail

After a fix similar to the above, CPU_OFF emitted from the normal world should be functional again.
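The filtering idea can be sketched roughly as follows. This is not the content of the linked change: the function names, the callback, and the way the SPMC version is passed in are all assumptions made for illustration. The sketch only shows the decision described in the thread, namely that the SPMD skips the power management message when the SPMC declares FF-A v1.0 and reports success so that the CPU_OFF can proceed.

```c
#include <stdbool.h>
#include <stdint.h>

/* FF-A version encoding: major in bits [30:16], minor in bits [15:0]. */
#define SKETCH_FFA_VERSION(major, minor) \
    ((((uint32_t)(major) & 0x7FFFu) << 16) | ((uint32_t)(minor) & 0xFFFFu))

/* Hypothetical callback that forwards the PM message to the SPMC. */
typedef int (*sketch_send_pm_msg_fn)(void);

/* Power management framework messages are assumed to need FF-A v1.1+. */
static bool sketch_spmc_understands_pm_msgs(uint32_t major, uint32_t minor)
{
    return SKETCH_FFA_VERSION(major, minor) >= SKETCH_FFA_VERSION(1, 1);
}

/* Sample sender modelling an SPMC that rejects the message. */
static int sketch_send_rejected(void)
{
    return -22; /* illustrative error value only */
}

/*
 * Model of a CPU_OFF handler with the filter: for a v1.0 SPMC the message
 * is not sent at all and 0 is returned, so the power-down is not aborted.
 */
static int sketch_spmd_cpu_off(uint32_t spmc_major, uint32_t spmc_minor,
                               sketch_send_pm_msg_fn send)
{
    if (!sketch_spmc_understands_pm_msgs(spmc_major, spmc_minor))
        return 0;

    return send();
}
```

With a v1.0 SPMC, `sketch_spmd_cpu_off(1, 0, sketch_send_rejected)` returns 0 without calling the sender, while a v1.1 SPMC would still receive the message.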