OP-TEE / optee_os

Trusted side of the TEE
Other
1.54k stars 1.04k forks source link

Enabel OPTEE on Linux5.4 cause rcu_preempt detected stalls #4074

Closed fangyuanseu closed 3 years ago

fangyuanseu commented 3 years ago

Hi related guys,

I met rcu_preempt detected stalls when i enable OPTEE(3.0) on Linux5.4 and run on Qemu,
The test ENV are: coreboot-->bl31-->bl32(OPTEE OS)-->depthcharge-->kernel.
ps, CONFIG_PREEMPT and CONFIG_PREEMPT_RCU is enabled and CONFIG_PREEMPT_VOLUNTRAY is not set. 
please find the kernel log as blow, any suggestion will be appreciated and thanks in advance.

jumping to kernel [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x412fd050] [ 0.000000] Linux version 5.4.37+ (cn1257@coding1257) (gcc version 9.2.1 20191025 (GNU Toolchain for the A-profile Architecture 9.2-2019.12 (arm-9.10))) #392 SMP PREEMPT Wed Sep 9 10:09:47 CST 2020 [ 0.000000] random: fast init done [ 0.000000] Machine model: FalconLite [ 0.000000] earlycon: uart0 at MMIO 0x000000000a100000 (options '115200') [ 0.000000] printk: bootconsole [uart0] enabled [ 0.000000] psci: probing for conduit method from DT. [ 0.000000] psci: PSCIv0.2 detected in firmware. [ 0.000000] psci: Using standard PSCI v0.2 function IDs [ 0.000000] psci: Trusted OS migration not required [ 0.000000] percpu: Embedded 31 pages/cpu s88040 r8192 d30744 u126976 [ 0.000000] Detected VIPT I-cache on CPU0 [ 0.000000] CPU features: detected: GIC system register CPU interface [ 0.000000] CPU features: kernel page table isolation forced ON by KASLR [ 0.000000] CPU features: detected: Kernel page table isolation (KPTI) [ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 515340 [ 0.000000] Kernel command line: cros_secure earlycon console=ttyS0,115200 rdinit=/init root=/dev/ram0 [ 0.000000] Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes, linear) [ 0.000000] Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes, linear) [ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off [ 0.000000] Memory: 2009532K/2094080K available (5630K kernel code, 620K rwdata, 1512K rodata, 34816K init, 423K bss, 84548K reserved, 0K cma-reserved) [ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1 [ 0.000000] ftrace: allocating 22443 entries in 88 pages [ 0.000000] rcu: Preemptible hierarchical RCU implementation. [ 0.000000] Tasks RCU enabled. [ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 100 jiffies. [ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0 [ 0.000000] GICv3: GIC: Using split EOI/Deactivate mode [ 0.000000] GICv3: 224 SPIs implemented [ 0.000000] GICv3: 0 Extended SPIs implemented [ 0.000000] GICv3: Distributor has no Range Selector support [ 0.000000] GICv3: 16 PPIs implemented [ 0.000000] GICv3: no VLPI support, no direct LPI support [ 0.000000] GICv3: CPU0: found redistributor 0 region 0:0x000000000c060000 [ 0.000000] random: get_random_bytes called from start_kernel+0x31c/0x494 with crng_init=1 [ 0.000000] arch_timer: cp15 timer(s) running at 24.00MHz (phys). [ 0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x588fe9dc0, max_idle_ns: 440795202592 ns [ 0.000334] sched_clock: 56 bits at 24MHz, resolution 41ns, wraps every 4398046511097ns [ 0.031660] Console: colour dummy device 80x25 [ 0.037694] Calibrating delay loop (skipped), value calculated using timer frequency.. 48.00 BogoMIPS (lpj=24000) [ 0.039326] pid_max: default: 32768 minimum: 301 [ 0.045514] LSM: Security Framework initializing [ 0.048942] Yama: becoming mindful. [ 0.053364] LoadPin: ready to pin (currently not enforcing) [ 0.054888] Chromium OS LSM: enabled [ 0.056857] Mount-cache hash table entries: 4096 (order: 3, 32768 bytes, linear) [ 0.058299] Mountpoint-cache hash table entries: 4096 (order: 3, 32768 bytes, linear) [ 0.265049] ASID allocator initialised with 32768 entries [ 0.271907] rcu: Hierarchical SRCU implementation. [ 0.307451] smp: Bringing up secondary CPUs ... [ 0.339060] Detected VIPT I-cache on CPU1 [ 0.339881] CPU features: SANITY CHECK: Unexpected variation in SYS_CNTFRQ_EL0. Boot CPU: 0x000000016e3600, CPU1: 0x00000003b9aca0 [ 0.340890] CPU features: Unsupported CPU feature variation detected. [ 0.341655] GICv3: CPU1: found redistributor 100 region 0:0x000000000c080000 [ 0.342666] CPU1: Booted secondary processor 0x0000000100 [0x412fd050] [ 0.380449] Detected VIPT I-cache on CPU2 [ 0.380612] CPU features: SANITY CHECK: Unexpected variation in SYS_CNTFRQ_EL0. Boot CPU: 0x000000016e3600, CPU2: 0x00000003b9aca0 [ 0.381014] GICv3: CPU2: found redistributor 200 region 0:0x000000000c0a0000 [ 0.381490] CPU2: Booted secondary processor 0x0000000200 [0x412fd050] [ 0.406135] Detected VIPT I-cache on CPU3 [ 0.406281] CPU features: SANITY CHECK: Unexpected variation in SYS_CNTFRQ_EL0. Boot CPU: 0x000000016e3600, CPU3: 0x00000003b9aca0 [ 0.406650] GICv3: CPU3: found redistributor 300 region 0:0x000000000c0c0000 [ 0.407012] CPU3: Booted secondary processor 0x0000000300 [0x412fd050] [ 0.431233] Detected VIPT I-cache on CPU4 [ 0.431375] CPU features: SANITY CHECK: Unexpected variation in SYS_CNTFRQ_EL0. Boot CPU: 0x000000016e3600, CPU4: 0x00000003b9aca0 [ 0.431582] CPU features: SANITY CHECK: Unexpected variation in SYS_ID_AA64PFR0_EL1. Boot CPU: 0x00000011112222, CPU4: 0x1100000011111112 [ 0.431827] CPU features: SANITY CHECK: Unexpected variation in SYS_ID_PFR1_EL1. Boot CPU: 0x00000010011011, CPU4: 0x00000010010000 [ 0.432029] GICv3: CPU4: found redistributor 400 region 0:0x000000000c0e0000 [ 0.432492] CPU4: Booted secondary processor 0x0000000400 [0x410fd0d1] [ 0.456463] Detected VIPT I-cache on CPU5 [ 0.456602] CPU features: SANITY CHECK: Unexpected variation in SYS_CNTFRQ_EL0. Boot CPU: 0x000000016e3600, CPU5: 0x00000003b9aca0 [ 0.456749] CPU features: SANITY CHECK: Unexpected variation in SYS_ID_AA64PFR0_EL1. Boot CPU: 0x00000011112222, CPU5: 0x1100000011111112 [ 0.456937] CPU features: SANITY CHECK: Unexpected variation in SYS_ID_PFR1_EL1. Boot CPU: 0x00000010011011, CPU5: 0x00000010010000 [ 0.457190] GICv3: CPU5: found redistributor 500 region 0:0x000000000c100000 [ 0.457646] CPU5: Booted secondary processor 0x0000000500 [0x410fd0d1] [ 0.480799] Detected VIPT I-cache on CPU6 [ 0.480940] CPU features: SANITY CHECK: Unexpected variation in SYS_CNTFRQ_EL0. Boot CPU: 0x000000016e3600, CPU6: 0x00000003b9aca0 [ 0.481086] CPU features: SANITY CHECK: Unexpected variation in SYS_ID_AA64PFR0_EL1. Boot CPU: 0x00000011112222, CPU6: 0x1100000011111112 [ 0.481265] CPU features: SANITY CHECK: Unexpected variation in SYS_ID_PFR1_EL1. Boot CPU: 0x00000010011011, CPU6: 0x00000010010000 [ 0.481459] GICv3: CPU6: found redistributor 600 region 0:0x000000000c120000 [ 0.481837] CPU6: Booted secondary processor 0x0000000600 [0x410fd0d1] [ 0.506674] Detected VIPT I-cache on CPU7 [ 0.506887] CPU features: SANITY CHECK: Unexpected variation in SYS_CNTFRQ_EL0. Boot CPU: 0x000000016e3600, CPU7: 0x00000003b9aca0 [ 0.507067] CPU features: SANITY CHECK: Unexpected variation in SYS_ID_AA64PFR0_EL1. Boot CPU: 0x00000011112222, CPU7: 0x1100000011111112 [ 0.507280] CPU features: SANITY CHECK: Unexpected variation in SYS_ID_PFR1_EL1. Boot CPU: 0x00000010011011, CPU7: 0x00000010010000 [ 0.507530] GICv3: CPU7: found redistributor 700 region 0:0x000000000c140000 [ 0.508123] CPU7: Booted secondary processor 0x0000000700 [0x410fd0d1] [ 0.511450] smp: Brought up 1 node, 8 CPUs [ 0.565336] SMP: Total of 8 processors activated. [ 0.566682] CPU features: detected: 32-bit EL0 Support [ 0.568358] CPU features: detected: RAS Extension Support [ 0.569189] CPU features: detected: Data cache clean to the PoU not required for I/D coherence [ 0.572653] CPU features: detected: CRC32 instructions [ 0.623986] CPU: All CPU(s) started at EL2 [ 0.632376] alternatives: patching kernel code [ 0.721393] devtmpfs: initialized [ 0.849549] Registered cp15_barrier emulation handler [ 0.853646] Registered setend emulation handler [ 0.888236] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275000 ns [ 0.890991] futex hash table entries: 2048 (order: 5, 131072 bytes, linear) [ 0.939456] NET: Registered protocol family 16 [ 1.003533] DMA: preallocated 256 KiB pool for atomic allocations [ 1.047333] cpuidle: using governor ladder [ 1.052218] cpuidle: using governor menu [ 1.059308] hw-breakpoint: found 6 breakpoint and 4 watchpoint registers. [ 1.076808] Serial: AMBA PL011 UART driver [ 1.114582] printk: console [pstore-1] enabled [ 1.116333] pstore: Registered ramoops as persistent store backend [ 1.118620] ramoops: using 0x100000@0xffec8000, ecc: 0 [ 1.579624] cryptd: max_cpu_qlen set to 1000 [ 1.835839] iommu: Default domain type: Translated [ 1.845852] videodev: Linux video capture interface: v2.00 [ 1.967088] clocksource: Switched to clocksource arch_sys_counter [ 2.993279] VFS: Disk quotas dquot_6.6.0 [ 2.996223] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes) [ 3.352674] NET: Registered protocol family 2 [ 3.386797] tcp_listen_portaddr_hash hash table entries: 1024 (order: 3, 40960 bytes, linear) [ 3.390723] TCP established hash table entries: 16384 (order: 5, 131072 bytes, linear) [ 3.393688] TCP bind hash table entries: 16384 (order: 7, 524288 bytes, linear) [ 3.404659] TCP: Hash tables configured (established 16384 bind 16384) [ 3.421566] UDP hash table entries: 1024 (order: 4, 98304 bytes, linear) [ 3.425435] UDP-Lite hash table entries: 1024 (order: 4, 98304 bytes, linear) [ 3.440581] NET: Registered protocol family 1 [ 24.278802] hw perfevents: enabled with armv8_pmuv3 PMU driver, 5 counters available [ 24.649398] workingset: timestamp_bits=62 max_order=19 bucket_order=0 [ 24.859528] squashfs: version 4.0 (2009/01/31) Phillip Lougher [ 24.888513] alt_syscall: table 'read_write_test' available. [ 24.889734] alt_syscall: table 'android' available. [ 24.891152] alt_syscall: table 'android_permissive' available. [ 24.904629] alt_syscall: table 'third_party' available. [ 24.905717] alt_syscall: table 'third_party_permissive' available. [ 24.907158] alt_syscall: table 'complete' available. [ 24.908224] alt_syscall: table 'complete_permissive' available. [ 24.981639] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253) [ 24.984590] io scheduler mq-deadline registered [ 24.986383] io scheduler kyber registered [ 24.994761] io scheduler bfq registered [ 25.022705] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled �[ 25.112327] printk: console [ttyS0] disabled [ 25.166315] a100000.serial: ttyS0 at MMIO 0xa100000 (irq = 7, base_baud = 1500000) is a 16550A [ 25.178139] printk: console [ttyS0] enabled [ 25.178139] printk: console [ttyS0] enabled [ 25.180358] printk: bootconsole [uart0] disabled [ 25.180358] printk: bootconsole [uart0] disabled [ 25.198592] arm-smmu-v3 b400000.csi_smmu: option verisilicon,force-sid-bits [ 25.204705] arm-smmu-v3 b400000.csi_smmu: ssid_bits = 0 sid_bits = 10 [ 25.207446] arm-smmu-v3 b400000.csi_smmu: ias 44-bit, oas 44-bit (features 0x00000304) [ 25.280203] arm-smmu-v3 b400000.csi_smmu: allocated 262144 entries for cmdq [ 25.440509] arm-smmu-v3 b400000.csi_smmu: allocated 131072 entries for evtq [ 25.444233] arm-smmu-v3 b400000.csi_smmu: allocated sid_group size = 4096 [ 25.479207] arm-smmu-v3 b000000.sys_smmu: option verisilicon,force-sid-bits [ 25.484474] arm-smmu-v3 b000000.sys_smmu: ssid_bits = 0 sid_bits = 10 [ 25.487809] arm-smmu-v3 b000000.sys_smmu: ias 44-bit, oas 44-bit (features 0x00000304) [ 25.585363] arm-smmu-v3 b000000.sys_smmu: allocated 262144 entries for cmdq [ 25.623556] arm-smmu-v3 b000000.sys_smmu: allocated 131072 entries for evtq [ 25.627693] arm-smmu-v3 b000000.sys_smmu: allocated sid_group size = 4096 [ 25.914649] loop: module loaded [ 25.927560] lkdtm: No crash points registered, enable through debugfs [ 25.941789] libphy: Fixed MDIO Bus: probed [ 25.992588] energy_model: pd0: hertz/watts ratio non-monotonically decreasing: em_cap_state 1 >= em_cap_state0 [ 26.010305] cpufreq: cpufreq_online: CPU0: Running at unlisted freq: 24000 KHz [ 26.022680] cpufreq: cpufreq_online: CPU0: Unlisted initial frequency changed to: 408000 KHz [ 26.104110] cpufreq: cpufreq_online: CPU4: Running at unlisted freq: 24000 KHz [ 26.106579] cpufreq: cpufreq_online: CPU4: Unlisted initial frequency changed to: 408000 KHz [ 26.134473] sdhci: Secure Digital Host Controller Interface driver [ 26.141460] sdhci: Copyright(c) Pierre Ossman [ 26.146575] sdhci-pltfm: SDHCI platform and OF driver helper [ 26.263395] mmc0: SDHCI controller on a320000.sdhci [a320000.sdhci] using ADMA 64-bit [ 26.322632] mmc0: Problem switching card into high-speed mode! [ 26.328705] mmc0: new SD card at address 4567 [ 26.353827] optee: probing for conduit method from DT. [ 26.368568] mmcblk0: mmc0:4567 QEMU! 64.0 MiB (ro) [ 26.559688] mmcblk0: p1 [ 47.564602] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: [ 47.567170] rcu: 5-...0: (30 ticks this GP) idle=47e/1/0x4000000000000000 softirq=939/941 fqs=5170 [ 47.569457] (detected by 4, t=21005 jiffies, g=2249, q=7) [ 47.571632] Task dump for CPU 5: [ 47.572949] swapper/0 R running task 0 1 0 0x0000002a [ 47.575331] Call trace: [ 47.576144] switch_to+0xcc/0x134 [ 110.569063] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: [ 110.570959] rcu: 5-...0: (30 ticks this GP) idle=47e/1/0x4000000000000000 softirq=939/941 fqs=20767 [ 110.573640] (detected by 4, t=84010 jiffies, g=2249, q=7) [ 110.576033] Task dump for CPU 5: [ 110.577033] swapper/0 R running task 0 1 0 0x0000002a [ 110.578799] Call trace: [ 110.579614] switch_to+0xcc/0x134 [ 173.574257] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: [ 173.576009] rcu: 5-...0: (30 ticks this GP) idle=47e/1/0x4000000000000000 softirq=939/941 fqs=36155 [ 173.578555] (detected by 4, t=147015 jiffies, g=2249, q=7) [ 173.580448] Task dump for CPU 5: [ 173.581462] swapper/0 R running task 0 1 0 0x0000002a [ 173.583133] Call trace: [ 173.583947] switch_to+0xcc/0x134 [ 236.579035] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: [ 236.580432] rcu: 5-...0: (30 ticks this GP) idle=47e/1/0x4000000000000000 softirq=939/941 fqs=51494 [ 236.582255] (detected by 4, t=210018 jiffies, g=2249, q=7) [ 236.583591] Task dump for CPU 5: [ 236.584562] swapper/0 R running task 0 1 0 0x0000002a [ 236.586078] Call trace: [ 236.586785] switch_to+0xcc/0x134 [ 299.584103] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: [ 299.585893] rcu: 5-...0: (30 ticks this GP) idle=47e/1/0x4000000000000000 softirq=939/941 fqs=66824 [ 299.588128] (detected by 4, t=273025 jiffies, g=2249, q=7) [ 299.589727] Task dump for CPU 5: [ 299.590541] swapper/0 R running task 0 1 0 0x0000002a [ 299.592535] Call trace: [ 299.593272] switch_to+0xcc/0x134 [ 362.589071] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: [ 362.590629] rcu: 5-...0: (30 ticks this GP) idle=47e/1/0x4000000000000000 softirq=939/941 fqs=82152 [ 362.593514] (detected by 4, t=336029 jiffies, g=2249, q=9) [ 362.595350] Task dump for CPU 5: [ 362.596299] swapper/0 R running task 0 1 0 0x0000002a [ 362.597702] Call trace: [ 362.598430] switch_to+0xcc/0x134

fangyuanseu commented 3 years ago

More info. If enable CONFIG_PREEMPT_VOLUNTRAY and disable CONFIG_PREEMPT, another RCU error jumps out as below, which seems this path: https://lore.kernel.org/patchwork/patch/1096873/ did not works.

[ 105.202057] rcu: INFO: rcu_sched detected stalls on CPUs/tasks: [ 105.203729] rcu: 6-...!: (81 ticks this GP) idle=546/1/0x4000000000000000 softirq=601/602 fqs=88 [ 105.205013] (detected by 5, t=89245 jiffies, g=689, q=7) [ 105.205875] Task dump for CPU 6: [ 105.206767] swapper/0 R running task 0 1 0 0x00000028 [ 105.208063] Call trace: [ 105.208615] switch_to+0xcc/0x134 [ 105.209767] free_vmap_area_rb_augment_cb_rotate+0x0/0x68 [ 105.210998] rcu: rcu_sched kthread starved for 88892 jiffies! g689 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=5 [ 105.211981] rcu: RCU grace-period kthread stack dump: [ 105.212630] rcu_sched I 0 10 2 0x00000028 [ 105.213267] Call trace: [ 105.213671] switch_to+0xcc/0x134 [ 105.214176] schedule+0x434/0x578 [ 105.214729] schedule+0x7c/0xa4 [ 105.215204] schedule_timeout+0xd8/0xfc [ 105.215901] rcu_gp_kthread+0x3d4/0x6f4 [ 105.216406] kthread+0xfc/0x10c [ 105.216873] ret_from_fork+0x10/0x18 [ 194.681010] rcu: INFO: rcu_sched detected stalls on CPUs/tasks: [ 194.682222] rcu: 6-...0: (81 ticks this GP) idle=546/1/0x4000000000000000 softirq=601/602 fqs=92 [ 194.683113] (detected by 5, t=178725 jiffies, g=689, q=7) [ 194.683782] Task dump for CPU 6: [ 194.684230] swapper/0 R running task 0 1 0 0x0000002a [ 194.685091] Call trace: [ 194.685506] switch_to+0xcc/0x134 [ 194.685991] free_vmap_area_rb_augment_cb_rotate+0x0/0x68

jenswi-linaro commented 3 years ago

This may relate to https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/5561

fangyuanseu commented 3 years ago

This may relate to https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/5561

Thank you for your quick reply, i picked up the patch you mentioned and test result are all the same, IMO, my case looks different, because i run the test on Qemu.

jenswi-linaro commented 3 years ago

Aha, so you're running on ARMv7?

jenswi-linaro commented 3 years ago

Does this happen during boot? It looks like you're running something slightly different from what we're normally testing with. Have you tried our normal build for QEMU?

fangyuanseu commented 3 years ago

Aha, so you're running on ARMv7?

ARMv8

fangyuanseu commented 3 years ago

Does this happen during boot? It looks like you're running something slightly different from what we're normally testing with. Have you tried our normal build for QEMU?

During linux bootup, at tee driver probe as i think, i add some log and find the first smc call invoked by tee driver hang up without return, i'm not sure if tee stay too long at secure word or atf works not as expect.

jenswi-linaro commented 3 years ago

In that case I think something is hanging.

fangyuanseu commented 3 years ago

In that case I think something is hanging.

Yes, i can not point out root cause, looking for help here

MrVan commented 3 years ago

" [ 47.567170] rcu: 5-...0: (30 ticks this GP) idle=47e/1/0x4000000000000000 softirq=939/941 fqs=5170 [ 47.569457] (detected by 4, t=21005 jiffies, g=2249, q=7) [ 47.571632] Task dump for CPU 5: "

The log shows CPU 5 hang.

Ever tried "maxcpus=1" in bootargs, to only boot one cpu?

Per hang in CPU 5, I suppose your tee smc issued on CPU 5, then not return back, might use qemu gdb to see what CPU 5 is doing.

If only boot one cpu, with the upper bootargs, it might be easy to debug using gdb.

fangyuanseu commented 3 years ago

" [ 47.567170] rcu: 5-...0: (30 ticks this GP) idle=47e/1/0x4000000000000000 softirq=939/941 fqs=5170 [ 47.569457] (detected by 4, t=21005 jiffies, g=2249, q=7) [ 47.571632] Task dump for CPU 5: "

The log shows CPU 5 hang.

Ever tried "maxcpus=1" in bootargs, to only boot one cpu?

Per hang in CPU 5, I suppose your tee smc issued on CPU 5, then not return back, might use qemu gdb to see what CPU 5 is doing.

If only boot one cpu, with the upper bootargs, it might be easy to debug using gdb.

I remove "-smp 8" from Qemu boot args, then this issue gone away, i can't figure out why this issue releated to smp. The optee version i used is OP-TEE version: 3.10.0-13-g86565d34-dev

jenswi-linaro commented 3 years ago

It looks like this could be related to https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/4860

fangyuanseu commented 3 years ago

It looks like this could be related to https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/4860

Thanks a lot for your info, i will have a try and return back asap.

fangyuanseu commented 3 years ago

It looks like this could be related to https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/4860

After check this commit, i think it not related with my issue, beacuase i porting a "Maching" to Qemu, and run Qemu with this maching, as pre-silicon simulation, in this maching platform, the core number is define as 8. I try some test: Enable smp 4 in qemu, this issue gone away, but enable smp 8 will cause this issue,which seems like this issue related with core number. Additionaly, from the kernel log, cpu stalls on cpu 5, so, guess a scenario like: tee driver issue smc call on cpu 5, but BL31 works on cpu 0 (i have a trace log, cpu id is 0), smc call was not received well, and can not pass smc call to BL32(TEE OS), after rcu_timeout, an cpu stalls was detecet on cpu 5, is it corrent? as i'm a new guy to OPTEE, can you please confirm this? thanks in advance.

ldts commented 3 years ago

This may relate to https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/5561

@clementfaure @jenswi-linaro thanks, incidentally this fixes rcu_preempt detected stall issues when running RSA key creation tests (xtest 4007) using CAAM on imx8mm-evk (optee-3.10.0, TF-A imx_5.4.24_2.1.0, Linux 5.4-2.1.x-imx). This saved me a lot of debugging...so thanks again (btw this is fine when using the NXP SE050 instead...seems CAAM uses more processor time)

Emantor commented 3 years ago

I just send a patch to fix RCU stalls with CONFIG_PREEMPT_NONE=y, this might fix your stalls as well. I am not certain of your current kernel configuration. Patch can be found here.

MrVan commented 3 years ago

@Emantor you still need the ATF fix to avoid NS-timer interrupt being delayed.

fangyuanseu commented 3 years ago

I just send a patch to fix RCU stalls with CONFIG_PREEMPT_NONE=y, this might fix your stalls as well. I am not certain of your current kernel configuration. Patch can be found here.

Thank you for you info@Emantor ,in my kernel configuration, CONFIG_PREEMPT=y and i'm afried disable preempt is a workaround rather then a rootcause, in my case, the root cause maybe, when enable TEE, TEE driver will send smc call, and tee os did not received, so cpu stalls was detected.

Emantor commented 3 years ago

@Emantor you still need the ATF fix to avoid NS-timer interrupt being delayed.

My fix should be unrelated to the ATF timer restore removal, since it primarily fixes a problem on 32-bit STM32MP1 with CONFIG_PREEMPT_NONE. Unless there is a pending patch for ARM 32-bit as well.

Thank you for you info@Emantor ,in my kernel configuration, CONFIG_PREEMPT=y and i'm afried disable preempt is a workaround rather then a rootcause, in my case, the root cause maybe, when enable TEE, TEE driver will send smc call, and tee os did not received, so cpu stalls was detected.

My patch only fixes the PREEMPT_NONE case, there should not be any effect on `CONFIG_PREEMPT(_VOLUNTARY) kernels. I thought my fix could be related, but I wasn't sure which kernel configuration was applied in your case. Hope your bug gets found and squashed soon.

github-actions[bot] commented 3 years ago

This issue has been marked as a stale issue because it has been open (more than) 30 days with no activity. Remove the stale label or add a comment, otherwise this issue will automatically be closed in 5 days. Note, that you can always re-open a closed issue at any time.