Closed vesajaaskelainen closed 1 year ago
FIQ should never be unmasked while in monitor mode. If that happens the stack can become corrupted.
FIQ should never be unmasked while in monitor mode. If that happens the stack can become corrupted.
Isn't this code executed in monitor mode and it is clearing FIQ: https://github.com/OP-TEE/optee_os/blob/master/core/arch/arm/sm/sm_a32.S#L177
Here is also code that sets FIQ: https://github.com/OP-TEE/optee_os/blob/master/core/arch/arm/sm/sm_a32.S#L159
My understanding is that this line does the actual mode change: https://github.com/OP-TEE/optee_os/blob/master/core/arch/arm/sm/sm_a32.S#L196
And this then tweaks where to continue: https://github.com/OP-TEE/optee_os/blob/master/core/arch/arm/sm/sm.c#L96
To rule out OEM extensions I faked return values to remove these:
# Note: In here we have fast SMC calls for OEM extensions
D/TC:0 sm_platform_handler:447 sm_platform_handler, a0=0x83000102
D/TC:0 read_pss_idcode:321 Reading PSS_IDCODE
DRAM: D/TC:0 sm_platform_handler:447 sm_platform_handler, a0=0x83000106
D/TC:0 read_ddrc_ctrl:369 Reading DDRC_CTRL
D/TC:0 sm_platform_handler:447 sm_platform_handler, a0=0x83000107
D/TC:0 read_ecc_scrub:379 Reading ECC_SCRUB
ECC disabled 1 GiB
So OEM extensions as such may not be an issue.
SCR.FIQ
doesn't mask or unmask FIQs, instead it controls where it's handled. While in secure world we'd like to take it via OP-TEE's normal exception vector. But in normal world we'd instead like to take it via the monitor vector. Masking is done with CPSR.
I'll did some more debugging with trace32.
My break points:
break.set sm_from_nsec\1 /Onchip /Program
break.set sm_from_nsec+0x78 /Onchip /Program
Break.Set thread_handle_fast_smc+0x4 /Onchip /Program
Break.Set thread_handle_fast_smc\out+0x8 /Onchip /Program
Those are for:
Here is the first entries to break points to get an idea what we expect to see on working case.
--- first entry to \\tee\sm\sm_from_nsec+0x4
-- registers --
N _ R0 3C085A40 R8 04033270 ^S+ ^Stack_+
Z _ R1 20 R9 39FE1E00
C C R2 0 R10 AAAAAAAA
V _ R3 0 R11 AAAAAAAA
Q _ R4 3C085A40 R12 39FE1D30
R5 0 R13 3C085A30
0 _ R6 0 R14 3C008620
1 _ R7 0 PC 3C0086C8
2 _ SPSR 20000093 CPSR 200001F6
3 _
4 _ USR: FIQ:
R8 04033270 R8 0
R9 39FE1E00 R9 0
I I R10 AAAAAAAA R10 0
F F R11 AAAAAAAA R11 0
R12 39FE1D30 R12 0
T T R13 AAAAAAAA R13 AAAAAAAA
J _ R14 AAAAAAAA R14 AAAAAAAA
mon SPSR AA0AAAAA
sec
SVC: IRQ:
A A R13 39FE1D20 R13 AAAAAAAA
E _ R14 040332A8 R14 AAAAAAAA
SPSR AA0AAAAA SPSR AA0AAAAA
0 _
1 _ UND: ABT:
2 _ R13 AAAAAAAA R13 AAAAAAAA
3 _ R14 AAAAAAAA R14 AAAAAAAA
SPSR AA0AAAAA SPSR AA0AAAAA
MON:
R13 3C085A30
R14 3C008620
SPSR 20000093
-- ctx --
ctx = 0x3C085A40 -> (
sec = (
ub_regs = (
usr_sp = 0xE5801000,
usr_lr = 0xFFFFFF20,
irq_spsr = 0x80,
irq_sp = 0x3C085A40,
irq_lr = 0x0,
fiq_spsr = 0x0,
fiq_sp = 0x3C085A40,
fiq_lr = 0x0,
svc_spsr = 0x0,
svc_sp = 0x3C085A40,
svc_lr = 0x3C001E0D,
abt_spsr = 0x0,
abt_sp = 0x3C057968,
abt_lr = 0x0,
und_spsr = 0x0,
und_sp = 0x3C057968,
und_lr = 0x0,
pmcr = 0x41093021),
r0 = 0xBE000000,
r1 = 0x0,
r2 = 0x0,
r3 = 0x04079C9C,
r4 = 0x0,
r5 = 0x04000000,
r6 = 0x04079C9C,
r7 = 0x0,
mon_lr = 0x3C0002A0,
mon_spsr = 0x600001D3),
pad = 0xAAAAAAAA,
nsec = (
ub_regs = (
usr_sp = 0xAAAAAAAA,
usr_lr = 0xAAAAAAAA,
irq_spsr = 0xAAAAAAAA,
irq_sp = 0xAAAAAAAA,
irq_lr = 0xAAAAAAAA,
fiq_spsr = 0xAAAAAAAA,
fiq_sp = 0xAAAAAAAA,
fiq_lr = 0xAAAAAAAA,
svc_spsr = 0xAAAAAAAA,
svc_sp = 0xAAAAAAAA,
svc_lr = 0xAAAAAAAA,
abt_spsr = 0xAAAAAAAA,
abt_sp = 0xAAAAAAAA,
abt_lr = 0xAAAAAAAA,
und_spsr = 0xAAAAAAAA,
und_sp = 0xAAAAAAAA,
und_lr = 0xAAAAAAAA,
pmcr = 0xAAAAAAAA),
r8 = 0x04033270,
r9 = 0x39FE1E00,
r10 = 0xAAAAAAAA,
r11 = 0xAAAAAAAA,
r12 = 0x39FE1D30,
r0 = 0xBF00FF01,
r1 = 0x0,
r2 = 0x0,
r3 = 0x0,
r4 = 0x0,
r5 = 0x0,
r6 = 0x0,
r7 = 0x0,
mon_lr = 0x040006C4,
mon_spsr = 0x20000093))
--- \\tee\sm\sm_from_nsec+0x78
-- registers --
N _ R0 3C085A88 R8 04033270 ^S+ ^Stack_+
Z Z R1 3C085B30 R9 39FE1E00
C C R2 0 R10 AAAAAAAA
V _ R3 3C085AA8 R11 AAAAAAAA
Q _ R4 3C085A40 R12 FFFCD2C0
R5 3C052840 R13 3C085A30
0 _ R6 0 R14 3C00873D
1 _ R7 0 PC 3C00873C
2 _ SPSR 20000093 CPSR 600001F6
3 _
4 _ USR: FIQ:
R8 04033270 R8 0
R9 39FE1E00 R9 0
I I R10 AAAAAAAA R10 0
F F R11 AAAAAAAA R11 0
R12 FFFCD2C0 R12 0
T T R13 E5801000 R13 3C085A40
J _ R14 FFFFFF20 R14 0
mon SPSR 0
sec
SVC: IRQ:
A A R13 3C085A40 R13 3C085A40
E _ R14 3C001E0D R14 0
SPSR 0 SPSR 80
0 _
1 _ UND: ABT:
2 _ R13 3C057968 R13 3C057968
3 _ R14 0 R14 0
SPSR 0 SPSR 0
MON:
R13 3C085A30
R14 3C00873D
SPSR 20000093
-- ctx --
ctx = 0x3C085A40 -> (
sec = (
ub_regs = (
usr_sp = 0xE5801000,
usr_lr = 0xFFFFFF20,
irq_spsr = 0x80,
irq_sp = 0x3C085A40,
irq_lr = 0x0,
fiq_spsr = 0x0,
fiq_sp = 0x3C085A40,
fiq_lr = 0x0,
svc_spsr = 0x0,
svc_sp = 0x3C085A40,
svc_lr = 0x3C001E0D,
abt_spsr = 0x0,
abt_sp = 0x3C057968,
abt_lr = 0x0,
und_spsr = 0x0,
und_sp = 0x3C057968,
und_lr = 0x0,
pmcr = 0x41093021),
r0 = 0xBF00FF01,
r1 = 0x0,
r2 = 0x0,
r3 = 0x0,
r4 = 0x0,
r5 = 0x0,
r6 = 0x0,
r7 = 0x0,
mon_lr = 0x3C0002A0,
mon_spsr = 0x600001D3),
pad = 0xAAAAAAAA,
nsec = (
ub_regs = (
usr_sp = 0xAAAAAAAA,
usr_lr = 0xAAAAAAAA,
irq_spsr = 0xAA0AAAAA,
irq_sp = 0xAAAAAAAA,
irq_lr = 0xAAAAAAAA,
fiq_spsr = 0xAA0AAAAA,
fiq_sp = 0xAAAAAAAA,
fiq_lr = 0xAAAAAAAA,
svc_spsr = 0xAA0AAAAA,
svc_sp = 0x39FE1D20,
svc_lr = 0x040332A8,
abt_spsr = 0xAA0AAAAA,
abt_sp = 0xAAAAAAAA,
abt_lr = 0xAAAAAAAA,
und_spsr = 0xAA0AAAAA,
und_sp = 0xAAAAAAAA,
und_lr = 0xAAAAAAAA,
pmcr = 0x41093028),
r8 = 0x04033270,
r9 = 0x39FE1E00,
r10 = 0xAAAAAAAA,
r11 = 0xAAAAAAAA,
r12 = 0x39FE1D30,
r0 = 0xBF00FF01,
r1 = 0x0,
r2 = 0x0,
r3 = 0x0,
r4 = 0x0,
r5 = 0x0,
r6 = 0x0,
r7 = 0x0,
mon_lr = 0x040006C4,
mon_spsr = 0x20000093))
--- first thread_handle_fast_smc+0x4
-- registers
N _ R0 3C085A20 R8 04033270 ^S+ ^Stack_+
Z Z R1 0 R9 39FE1E00
C C R2 0 R10 AAAAAAAA
V _ R3 0 R11 AAAAAAAA
Q _ R4 3C085A20 R12 3C000474
R5 0 R13 3C085A18
0 _ R6 0 R14 3C000480
1 _ R7 0 PC 3C002F18
2 _ SPSR 0 CPSR 600001F3
3 _
4 _ USR: FIQ:
R8 04033270 R8 0
R9 39FE1E00 R9 0
I I R10 AAAAAAAA R10 0
F F R11 AAAAAAAA R11 0
R12 3C000474 R12 0
T T R13 E5801000 R13 3C085A40
J _ R14 FFFFFF20 R14 0
svc SPSR 0
sec
SVC: IRQ:
A A R13 3C085A18 R13 3C085A40
E _ R14 3C000480 R14 0
SPSR 0 SPSR 80
0 _
1 _ UND: ABT:
2 _ R13 3C057968 R13 3C057968
3 _ R14 0 R14 0
SPSR 0 SPSR 0
MON:
R13 3C085AB0
R14 3C00873D
SPSR 20000093
-- args
args = 0x3C085A20 -> (
a0 = 0xBF00FF01,
a1 = 0x0,
a2 = 0x0,
a3 = 0x0,
a4 = 0x0,
a5 = 0x0,
a6 = 0x0,
a7 = 0x0)
Now we are at the call causing the panic.
--- entry to \\tee\sm\sm_from_nsec+0x4
-- registers --
N _ R0 3C085A40 R8 3FFA5270 ^S+ ^Stack_+
Z _ R1 20 R9 3EB61EA0
C C R2 0 R10 1
V _ R3 0 R11 04000020
Q _ R4 3C085A40 R12 3EB5E440
R5 0 R13 3C085A30
0 _ R6 0 R14 3C008620
1 _ R7 0 PC 3C0086C8
2 _ SPSR 20000093 CPSR 200001F6
3 _
4 _ USR: FIQ:
R8 3FFA5270 R8 0
R9 3EB61EA0 R9 0
I I R10 1 R10 0
F F R11 04000020 R11 0
R12 3EB5E440 R12 0
T T R13 AAAAAAAA R13 AAAAAAAA
J _ R14 AAAAAAAA R14 AAAAAAAA
mon SPSR AA0AAAAA
sec
SVC: IRQ:
A A R13 3EB5E430 R13 AAAAAAAA
E _ R14 3FFA52A8 R14 AAAAAAAA
SPSR AA0AAAAA SPSR AA0AAAAA
0 _
1 _ UND: ABT:
2 _ R13 AAAAAAAA R13 AAAAAAAA
3 _ R14 AAAAAAAA R14 AAAAAAAA
SPSR AA0AAAAA SPSR AA0AAAAA
MON:
R13 3C085A30
R14 3C008620
SPSR 20000093
-- ctx
ctx = 0x3C085A40 -> (
sec = (
ub_regs = (
usr_sp = 0x0,
usr_lr = 0x0,
irq_spsr = 0x80,
irq_sp = 0x3C085A40,
irq_lr = 0x0,
fiq_spsr = 0x0,
fiq_sp = 0x3C085A40,
fiq_lr = 0x0,
svc_spsr = 0x0,
svc_sp = 0x3C085A40,
svc_lr = 0x3C001E0D,
abt_spsr = 0x60000113,
abt_sp = 0x3C057968,
abt_lr = 0x3C003314,
und_spsr = 0x0,
und_sp = 0x3C057968,
und_lr = 0x0,
pmcr = 0x41093021),
r0 = 0xBE000005,
r1 = 0x0,
r2 = 0x0,
r3 = 0x0,
r4 = 0x0,
r5 = 0x0,
r6 = 0x0,
r7 = 0x0,
mon_lr = 0x3C0032CC,
mon_spsr = 0x60000193),
pad = 0xAAAAAAAA,
nsec = (
ub_regs = (
usr_sp = 0xAAAAAAAA,
usr_lr = 0xAAAAAAAA,
irq_spsr = 0xAA0AAAAA,
irq_sp = 0xAAAAAAAA,
irq_lr = 0xAAAAAAAA,
fiq_spsr = 0xAA0AAAAA,
fiq_sp = 0xAAAAAAAA,
fiq_lr = 0xAAAAAAAA,
svc_spsr = 0xAA0AAAAA,
svc_sp = 0x39FE1B68,
svc_lr = 0x040332A8,
abt_spsr = 0xAA0AAAAA,
abt_sp = 0xAAAAAAAA,
abt_lr = 0xAAAAAAAA,
und_spsr = 0xAA0AAAAA,
und_sp = 0xAAAAAAAA,
und_lr = 0xAAAAAAAA,
pmcr = 0x41093028),
r8 = 0x3FFA5270,
r9 = 0x3EB61EA0,
r10 = 0x1,
r11 = 0x04000020,
r12 = 0x3EB5E440,
r0 = 0xBF00FF01,
r1 = 0x0,
r2 = 0x0,
r3 = 0x0,
r4 = 0x0,
r5 = 0x0,
r6 = 0x0,
r7 = 0x0,
mon_lr = 0x3FF726C4,
mon_spsr = 0x20000093))
--- \\tee\sm\sm_from_nsec+0x78
-- registers --
N _ R0 3C085A88 R8 3FFA5270 ^S+ ^Stack_+
Z Z R1 3C085B30 R9 3EB61EA0
C C R2 0 R10 1
V _ R3 3C085AA8 R11 04000020
Q _ R4 3C085A40 R12 FFFCD2C0
R5 3C052840 R13 3C085A30
0 _ R6 0 R14 3C00873D
1 _ R7 0 PC 3C00873C
2 _ SPSR 20000093 CPSR 600001F6
3 _
4 _ USR: FIQ:
R8 3FFA5270 R8 0
R9 3EB61EA0 R9 0
I I R10 1 R10 0
F F R11 04000020 R11 0
R12 FFFCD2C0 R12 0
T T R13 0 R13 3C085A40
J _ R14 0 R14 0
mon SPSR 0
sec
SVC: IRQ:
A A R13 3C085A40 R13 3C085A40
E _ R14 3C001E0D R14 0
SPSR 0 SPSR 80
0 _
1 _ UND: ABT:
2 _ R13 3C057968 R13 3C057968
3 _ R14 0 R14 3C003314
SPSR 0 SPSR 60000113
MON:
R13 3C085A30
R14 3C00873D
SPSR 20000093
-- ctx
ctx = 0x3C085A40 -> (
sec = (
ub_regs = (
usr_sp = 0x0,
usr_lr = 0x0,
irq_spsr = 0x80,
irq_sp = 0x3C085A40,
irq_lr = 0x0,
fiq_spsr = 0x0,
fiq_sp = 0x3C085A40,
fiq_lr = 0x0,
svc_spsr = 0x0,
svc_sp = 0x3C085A40,
svc_lr = 0x3C001E0D,
abt_spsr = 0x60000113,
abt_sp = 0x3C057968,
abt_lr = 0x3C003314,
und_spsr = 0x0,
und_sp = 0x3C057968,
und_lr = 0x0,
pmcr = 0x41093021),
r0 = 0xBF00FF01,
r1 = 0x0,
r2 = 0x0,
r3 = 0x0,
r4 = 0x0,
r5 = 0x0,
r6 = 0x0,
r7 = 0x0,
mon_lr = 0x3C0032CC,
mon_spsr = 0x60000193),
pad = 0xAAAAAAAA,
nsec = (
ub_regs = (
usr_sp = 0xAAAAAAAA,
usr_lr = 0xAAAAAAAA,
irq_spsr = 0xAA0AAAAA,
irq_sp = 0xAAAAAAAA,
irq_lr = 0xAAAAAAAA,
fiq_spsr = 0xAA0AAAAA,
fiq_sp = 0xAAAAAAAA,
fiq_lr = 0xAAAAAAAA,
svc_spsr = 0xAA0AAAAA,
svc_sp = 0x3EB5E430,
svc_lr = 0x3FFA52A8,
abt_spsr = 0xAA0AAAAA,
abt_sp = 0xAAAAAAAA,
abt_lr = 0xAAAAAAAA,
und_spsr = 0xAA0AAAAA,
und_sp = 0xAAAAAAAA,
und_lr = 0xAAAAAAAA,
pmcr = 0x41093028),
r8 = 0x3FFA5270,
r9 = 0x3EB61EA0,
r10 = 0x1,
r11 = 0x04000020,
r12 = 0x3EB5E440,
r0 = 0xBF00FF01,
r1 = 0x0,
r2 = 0x0,
r3 = 0x0,
r4 = 0x0,
r5 = 0x0,
r6 = 0x0,
r7 = 0x0,
mon_lr = 0x3FF726C4,
mon_spsr = 0x20000093))
--- thread_handle_fast_smc+0x4
-- registers
N _ R0 3C085A20 R8 3FFA5270 ^S+ ^Stack_+
Z Z R1 0 R9 3EB61EA0
C C R2 0 R10 1
V _ R3 0 R11 04000020
Q _ R4 3C085A20 R12 3C000474
R5 0 R13 3C085A18
0 _ R6 0 R14 3C000480
1 _ R7 0 PC 3C002F18
2 _ SPSR 0 CPSR 600001B3
3 _
4 _ USR: FIQ:
R8 3FFA5270 R8 0
R9 3EB61EA0 R9 0
I I R10 1 R10 0
F _ R11 04000020 R11 0
R12 3C000474 R12 0
T T R13 0 R13 3C085A40
J _ R14 0 R14 0
svc SPSR 0
sec
SVC: IRQ:
A A R13 3C085A18 R13 3C085A40
E _ R14 3C000480 R14 0
SPSR 0 SPSR 80
0 _
1 _ UND: ABT:
2 _ R13 3C057968 R13 3C057968
3 _ R14 0 R14 3C003314
SPSR 0 SPSR 60000113
MON:
R13 3C085AB0
R14 3C00873D
SPSR 20000093
-- args
args = 0x3C085A20 -> (
a0 = 0xBF00FF01,
a1 = 0x0,
a2 = 0x0,
a3 = 0x0,
a4 = 0x0,
a5 = 0x0,
a6 = 0x0,
a7 = 0x0)
We can see in last thread_handle_fast_smc+0x4
that 'F'IQ flag mask has been lost.
Can you spot something that is not normal in here?
When exiting monitor mode (as a return from exception) into secure world mon_spsr is restored into CPSR. If I recall correctly we should always enter in SVC mode with all exceptions masked.
The monitor doesn't care about this though, instead it just restores the state (with the exception of PC) we had when the last SMC was done to exit secure world. If you break just below bne .smc_from_nsec
and check for FIQ in SPSR you should be able to find out when the troublesome exit occurs.
If I understood correctly what you asked.
I have breakpoint now in:
First time there:
N _ R0 BE000005 R8 0 ^S+ ^Stack_+
Z Z R1 20 R9 39FE1E00
C C R2 E7F811E3 R10 AAAAAAAA
V _ R3 AF630002 R11 AAAAAAAA
Q _ R4 A5D5C51B R12 3C000474
R5 0 R13 3C085A88
0 _ R6 0 R14 3C00048C
1 _ R7 0 PC 3C0085B4
2 _ SPSR 600001D3 CPSR 600001D6
3 _
4 _ USR: FIQ:
R8 0 R8 0
R9 39FE1E00 R9 0
I I R10 AAAAAAAA R10 0
F F R11 AAAAAAAA R11 0
R12 3C000474 R12 0
T _ R13 E5801000 R13 3C085A40
J _ R14 FFFFFF20 R14 0
mon SPSR 0
sec
SVC: IRQ:
A A R13 3C085A40 R13 3C085A40
E _ R14 3C002F27 R14 0
SPSR 0 SPSR 80
0 _
1 _ UND: ABT:
2 _ R13 3C057968 R13 3C057968
3 _ R14 0 R14 0
SPSR 0 SPSR 0
MON:
R13 3C085A88
R14 3C00048C
SPSR 600001D3
Then on panic case we do not get on that break point as the call is from nsec
so it jump forward there.
And the exception is caused within that flow.
If I set break in here:
The first hit:
N _ R0 BF00FF01 R8 04033270 ^S+ ^Stack_+
Z _ R1 25 R9 39FE1E00
C C R2 0 R10 AAAAAAAA
V _ R3 0 R11 AAAAAAAA
Q _ R4 0 R12 39FE1D30
R5 0 R13 3C085B10
0 _ R6 0 R14 040006C4
1 _ R7 0 PC 3C008600
2 _ SPSR 20000093 CPSR 200001D6
3 _
4 _ USR: FIQ:
R8 04033270 R8 0
R9 39FE1E00 R9 0
I I R10 AAAAAAAA R10 0
F F R11 AAAAAAAA R11 0
R12 39FE1D30 R12 0
T _ R13 AAAAAAAA R13 AAAAAAAA
J _ R14 AAAAAAAA R14 AAAAAAAA
mon SPSR AA0AAAAA
nsec
SVC: IRQ:
A A R13 39FE1D20 R13 AAAAAAAA
E _ R14 040332A8 R14 AAAAAAAA
SPSR AA0AAAAA SPSR AA0AAAAA
0 _
1 _ UND: ABT:
2 _ R13 AAAAAAAA R13 AAAAAAAA
3 _ R14 AAAAAAAA R14 AAAAAAAA
SPSR AA0AAAAA SPSR AA0AAAAA
MON:
R13 3C085B10
R14 040006C4
SPSR 20000093
And then on panic flow:
N _ R0 BF00FF01 R8 3FFA5270 ^S+ ^Stack_+
Z _ R1 25 R9 3EB61EA0
C C R2 0 R10 1
V _ R3 0 R11 04000020
Q _ R4 0 R12 3EB5E440
R5 0 R13 3C085B10
0 _ R6 0 R14 3FF726C4
1 _ R7 0 PC 3C008600
2 _ SPSR 20000093 CPSR 200001D6
3 _
4 _ USR: FIQ:
R8 3FFA5270 R8 0
R9 3EB61EA0 R9 0
I I R10 1 R10 0
F F R11 04000020 R11 0
R12 3EB5E440 R12 0
T _ R13 AAAAAAAA R13 AAAAAAAA
J _ R14 AAAAAAAA R14 AAAAAAAA
mon SPSR AA0AAAAA
nsec
SVC: IRQ:
A A R13 3EB5E430 R13 AAAAAAAA
E _ R14 3FFA52A8 R14 AAAAAAAA
SPSR AA0AAAAA SPSR AA0AAAAA
0 _
1 _ UND: ABT:
2 _ R13 AAAAAAAA R13 AAAAAAAA
3 _ R14 AAAAAAAA R14 AAAAAAAA
SPSR AA0AAAAA SPSR AA0AAAAA
MON:
R13 3C085B10
R14 3FF726C4
SPSR 20000093
In both cases if we look at MON SPSR 0x20000093 is in both cases and in there FIQ mask is not set which is a bit weird because if I set new breakpoint:
I can see that MON SPSR is still the same and now actually when I step over rfefd sp!
I can see that FIQ mask is now lost.
Here is register dump on panic flow at entry of vector_fast_smc_entry
:
N _ R0 BF00FF01 R8 3FFA5270 ^S+ ^Stack_+
Z Z R1 0 R9 3EB61EA0
C C R2 0 R10 1
V _ R3 0 R11 04000020
Q _ R4 0 R12 FFFCD2C0
R5 0 R13 3C085A40
0 _ R6 0 R14 3C001E0D
1 _ R7 0 PC 3C00046C
2 _ SPSR 0 CPSR 60000193
3 _
4 _ USR: FIQ:
R8 3FFA5270 R8 0
R9 3EB61EA0 R9 0
I I R10 1 R10 0
F _ R11 04000020 R11 0
R12 FFFCD2C0 R12 0
T _ R13 0 R13 3C085A40
J _ R14 0 R14 0
svc SPSR 0
sec
SVC: IRQ:
A A R13 3C085A40 R13 3C085A40
E _ R14 3C001E0D R14 0
SPSR 0 SPSR 80
0 _
1 _ UND: ABT:
2 _ R13 3C057968 R13 3C057968
3 _ R14 0 R14 3C003314
SPSR 0 SPSR 60000113
MON:
R13 3C085AB0
R14 3C00873D
SPSR 20000093
And for comparison the first occurence at https://github.com/OP-TEE/optee_os/blob/master/core/arch/arm/sm/sm_a32.S#L192:
N _ R0 1 R8 04033270 ^S+ ^Stack_+
Z _ R1 3C085B30 R9 39FE1E00
C C R2 0 R10 AAAAAAAA
V _ R3 0 R11 AAAAAAAA
Q _ R4 0 R12 FFFCD2C0
R5 0 R13 3C085A40
0 _ R6 0 R14 3C00873D
1 _ R7 0 PC 3C008628
2 _ SPSR 20000093 CPSR 200001D6
3 _
4 _ USR: FIQ:
R8 04033270 R8 0
R9 39FE1E00 R9 0
I I R10 AAAAAAAA R10 0
F F R11 AAAAAAAA R11 0
R12 FFFCD2C0 R12 0
T _ R13 E5801000 R13 3C085A40
J _ R14 FFFFFF20 R14 0
mon SPSR 0
sec
SVC: IRQ:
A A R13 3C085A40 R13 3C085A40
E _ R14 3C001E0D R14 0
SPSR 0 SPSR 80
0 _
1 _ UND: ABT:
2 _ R13 3C057968 R13 3C057968
3 _ R14 0 R14 0
SPSR 0 SPSR 0
MON:
R13 3C085A40
R14 3C00873D
SPSR 20000093
and then after step over rfefd sp!
I can see that FIQ mask is not lost.
Register dump entry of vector_fast_smc_entry
:
N _ R0 BF00FF01 R8 04033270 ^S+ ^Stack_+
Z Z R1 0 R9 39FE1E00
C C R2 0 R10 AAAAAAAA
V _ R3 0 R11 AAAAAAAA
Q _ R4 0 R12 FFFCD2C0
R5 0 R13 3C085A40
0 _ R6 0 R14 3C001E0D
1 _ R7 0 PC 3C00046C
2 _ SPSR 0 CPSR 600001D3
3 _
4 _ USR: FIQ:
R8 04033270 R8 0
R9 39FE1E00 R9 0
I I R10 AAAAAAAA R10 0
F F R11 AAAAAAAA R11 0
R12 FFFCD2C0 R12 0
T _ R13 E5801000 R13 3C085A40
J _ R14 FFFFFF20 R14 0
svc SPSR 0
sec
SVC: IRQ:
A A R13 3C085A40 R13 3C085A40
E _ R14 3C001E0D R14 0
SPSR 0 SPSR 80
0 _
1 _ UND: ABT:
2 _ R13 3C057968 R13 3C057968
3 _ R14 0 R14 0
SPSR 0 SPSR 0
MON:
R13 3C085AB0
R14 3C00873D
SPSR 20000093
One difference I noticed... ABT's R14 and SPSR got changed from zero to values.
After lunch I observe more ABT register value change flows.
Ok, now I believe I have understanding of the problem to some degree -- what is the right solution is another thing then.
ABT was false path -- this is related to OP-TEE's thread support.
@jenswi-linaro you were close on your idea where to observe but I needed a bit of additional reading of ARM reference manuals and such to understand it a bit better.
I believe the problem is related on side-effects of "stacking exceptions" with ctx->sec.mon_lr
this is saved in stack in struct sm_ctx *ctx
. And going back-and-forth in different call paths in secure monitor
we do get ctx->sec.mon_cpsr
to a state that F
IQ mask is not set.
Initially what happens during the boot is that u-boot
before relocation (in my case) goes to fetch information about OP-TEE environment with fast call
s. All is good and CPSR gets retained so that FIQ mask is set. During fast call
FIQ must be disabled and everything is good.
Next phase in u-boot
before relocation is that it goes to use SCMI
support to get information about clocks. Actual SCMI
calls are based on std call
s which then goes for OP-TEE's thread path. But what seems to be happening here is when PTA-SCMI
gets opened ctx->sec.mon_cpsr
gets new value of 0x60000193
which does not have F
IQ mask set. Now this CPSR value starts to get persisted and used when ever we make fast call
or std call
.
Now on second invocation to SCMI
(or any other std call
) we kinda get special treat when switching from monitor -> secure we get FIQ unmasked right away as a of new value of ctx->sec.mon_cpsr
includes that. Don't know if this is actually already a problem to enable FIQ's too early. There are several SCMI
calls in here continuing to use std call
.
Then what happens next in u-boot
is querying what Zynq-7000 CPU type it is with OEM extension as fast call. But that takes a shortcut and returns right away (SM_EXIT_TO_NON_SECURE
) before going too far in execution flow and we do not get any sanity checks with that.
Then u-boot
gets relocated and in driver model gets a bit confused -- this is a problem I need to figure out. But as a side effect u-boot
goes to detect OP-TEE again with standard fast call API's
. Now as in ctx->sec.mon_cpsr
we had previous left over path with FIQ enabled it gets inadvertedly restored also for fast call
. And then we get sanity checks at exit and get panic.
I would assume even with fixed u-boot
when I get far enough eventually booting to Linux kernel it would then latest make the same calls to query OP-TEE's details and we would get the same panic later in there.
So after following trace output FIQ mask is removed in ctx
:
I/TC: Primary CPU switching to normal world boot
D/TC:0 sm_platform_handler:453 sm_platform_handler, a0=0xbf00ff01
D/TC:0 sm_platform_handler:453 sm_platform_handler, a0=0xb2000001
D/TC:0 sm_platform_handler:453 sm_platform_handler, a0=0xbf00ff03
D/TC:0 sm_platform_handler:453 sm_platform_handler, a0=0xb2000009
D/TC:0 tee_entry_exchange_capabilities:100 Asynchronous notifications are disabled
D/TC:0 tee_entry_exchange_capabilities:109 Dynamic shared memory is enabled
D/TC:0 sm_platform_handler:453 sm_platform_handler, a0=0x32000004
D/TC:0 0 core_mmu_alloc_l2:272 L2 table used: 3/6
D/TC:? 0 tee_ta_init_pseudo_ta_session:296 Lookup pseudo TA a8cfe406-d4f5-4a2e-9f8d-a25dc754c099
D/TC:? 0 tee_ta_init_pseudo_ta_session:309 Open PTA-SCMI
D/TC:? 0 tee_ta_init_pseudo_ta_session:326 PTA-SCMI : a8cfe406-d4f5-4a2e-9f8d-a25dc754c099
Actual stack value change happens on line:
Just before that happens:
u-boot
from call path of opening session to PTA-SCMI
.Before stepping over srsdb sp!, #CPSR_MODE_MON
in registers we have:
N _ R0 BE000005 R8 3C0579A8 ^S+ ^Stack_+
Z Z R1 FFFF0000 R9 3
C C R2 A0 R10 A0
V _ R3 0 R11 FFFF0000
Q _ R4 0 R12 FFFCD2C0
R5 3C082FF8 R13 3C085AB0
0 _ R6 800001D3 R14 3C003310
1 _ R7 0 PC 3C00859C
2 _ SPSR 60000193 CPSR 600001D6
3 _
4 _ USR: FIQ:
R8 3C0579A8 R8 0
R9 3 R9 0
I I R10 A0 R10 0
F F R11 FFFF0000 R11 0
R12 FFFCD2C0 R12 0
T _ R13 0 R13 3C085A40
J _ R14 0 R14 0
mon SPSR 0
sec
SVC: IRQ:
A A R13 3C085A40 R13 3C085A40
E _ R14 3C001E0D R14 0
SPSR 0 SPSR 80
0 _
1 _ UND: ABT:
2 _ R13 3C057968 R13 3C057968
3 _ R14 0 R14 3C003294
SPSR 0 SPSR 0193
MON:
R13 3C085AB0
R14 3C003310
SPSR 60000193
Apparently here we do get value of MON's LR
and SPSR
to the stack and it just happened to point to right spot in there.
(sm_ctx *)0x3C085A40 = 0x3C085A40 -> (
sec = (
ub_regs = (usr_sp = 0xE5801000, usr_lr = 0xFFFFFF
r0 = 0x32000004,
r1 = 0x0,
r2 = 0x39FE3000,
r3 = 0x0,
r4 = 0x0,
r5 = 0x0,
r6 = 0x0,
r7 = 0x0,
mon_lr = 0x3C003310,
mon_spsr = 0x60000193),
pad = 0xAAAAAAAA,
nsec = (ub_regs = (usr_sp = 0xAAAAAAAA, usr_lr = 0x
Now we have mon_spsr = 0x60000193
that poisons rest of the flows.
Tested that panic goes away with following workaround:
diff --git a/core/arch/arm/sm/sm.c b/core/arch/arm/sm/sm.c
index b4b7be43b..10c2813f4 100644
--- a/core/arch/arm/sm/sm.c
+++ b/core/arch/arm/sm/sm.c
@@ -89,6 +89,7 @@ uint32_t sm_from_nsec(struct sm_ctx *ctx)
sm_restore_unbanked_regs(&ctx->sec.ub_regs);
memcpy(&ctx->sec.r0, args, sizeof(*args));
+ ctx->sec.mon_spsr |= CPSR_F | CPSR_I;
if (OPTEE_SMC_IS_FAST_CALL(ctx->sec.r0))
ctx->sec.mon_lr = (uint32_t)vector_fast_smc_entry;
else
Even with the patch I can seen that value of ctx->sec.mon_spsr
gets into FIQ not masked state quite often but that workaround keeps problem at bay.
To me this looks like generic issue in OP-TEE and not related to Zynq-7000 as such. I just happened to have code flows that triggered it.
Now any better ideas what should be fixed and where 🤔 ?
Initially what happens during the boot is that u-boot before relocation (in my case) goes to fetch information about OP-TEE environment with fast calls. All is good and CPSR gets retained so that FIQ mask is set. During fast call FIQ must be disabled and everything is good.
So far so good.
Next phase in u-boot before relocation is that it goes to use SCMI support to get information about clocks. Actual SCMI calls are based on std calls which then goes for OP-TEE's thread path. But what seems to be happening here is when PTA-SCMI gets opened ctx->sec.mon_cpsr gets new value of 0x60000193 which does not have FIQ mask set. Now this CPSR value starts to get persisted and used when ever we make fast call or std call.
This is the root of the problem where it needs to be fixed. OP-TEE should never use the temporary stack while not having exceptions masked. The temporary stack is used for exception processing (except abort) and to enter/exit OP-TEE.
This issue has been marked as a stale issue because it has been open (more than) 30 days with no activity. Remove the stale label or add a comment, otherwise this issue will automatically be closed in 5 days. Note, that you can always re-open a closed issue at any time.
keep-alive
This issue has been marked as a stale issue because it has been open (more than) 30 days with no activity. Remove the stale label or add a comment, otherwise this issue will automatically be closed in 5 days. Note, that you can always re-open a closed issue at any time.
keep-alive
@vesajaaskelainen what is the status of this issue? My understanding is that your are investigating a potential bug in u-boot?
@vesajaaskelainen what is the status of this issue? My understanding is that your are investigating a potential bug in u-boot?
There is a bug in OP-TEE's secure monitor and I seem to be first one that actually went deeper to figure out what is happening (there were similar issues reported earlier that might have same problem origin).
Problem goes deep in "context" stacking when changing different security modes so it is not so clear picture what is the best fix for it.
But basically one would need to have clear return paths in stack but currently it re-uses memory in the stack when switching security states which ultimately causes the wrong state being restored and then further used in successive calls.
If you have energy to debug the issue I am happy to assist -- I am currently using the "ugly" workaround commented above so I could get forward.
I have been lately working with mpsoc and now getting back to zynq7000.
@vesajaaskelainen what is the status of this issue? My understanding is that your are investigating a potential bug in u-boot?
There is a bug in OP-TEE's secure monitor and I seem to be first one that actually went deeper to figure out what is happening (there were similar issues reported earlier that might have same problem origin).
I'm not so sure about that. From what I've seen it seems that there's a fast call that unmasks FIQs.
This issue has been marked as a stale issue because it has been open (more than) 30 days with no activity. Remove the stale label or add a comment, otherwise this issue will automatically be closed in 5 days. Note, that you can always re-open a closed issue at any time.
keep-alive
Now I rebased my code on top of master and finally I have Xilinx Zynq-7000 booting into the Linux and also xtest
is running fine in there. Some OP-TEE drivers need to be developed a bit more but starting to look good.
Before getting too far in cleaning up debugs and other stuff from the system I decided to revert the hack I had in place.
But still get:
E/TC:0 assertion 'thread_get_exceptions() == THREAD_EXCP_ALL' failed at core/arch/arm/kernel/thread_optee_smc.c:50 <thread_handle_fast_smc>
So need to dig a bit deeper in the logic behind the scenes in here.
Apparently I need to present the findings somehow and it probably is related to presenting how stack is evolving and so on going to deeper details. Would private presentation be preferable for this before feeding the tracker with more details :) ?
Hi @vesajaaskelainen,
Apparently I need to present the findings somehow and it probably is related to presenting how stack is evolving and so on going to deeper details. Would private presentation be preferable for this before feeding the tracker with more details :) ?
Perhaps it would make a good topic for the next Linaro OP-TEE Contributions (LOC) meeting on Aug 25th (see https://www.trustedfirmware.org/meetings/ and https://bit.ly/loc-notes), what do you think?
Hi @vesajaaskelainen,
Apparently I need to present the findings somehow and it probably is related to presenting how stack is evolving and so on going to deeper details. Would private presentation be preferable for this before feeding the tracker with more details :) ?
Perhaps it would make a good topic for the next Linaro OP-TEE Contributions (LOC) meeting on Aug 25th (see https://www.trustedfirmware.org/meetings/ and https://bit.ly/loc-notes), what do you think?
I'll try to be ready with the presentation for that meeting. So lets book it.
I could also give a status update on improved zynq7k support and then perhaps discuss some topics on that too.
@vesajaaskelainen great, thanks. Agenda updated in https://bit.ly/loc-notes.
@jforissier umh meeting was closed before it was started?
In the time table it says 16:00 and all times in UK so that would be 18:00 in Finnish time? (eg. starting in 15 minutes)
@jforissier umh meeting was closed before it was started?
In the time table it says 16:00 and all times in UK so that would be 18:00 in Finnish time? (eg. starting in 15 minutes)
My bad. I was 1h early it seems! Sorry about that.
@jforissier umh meeting was closed before it was started? In the time table it says 16:00 and all times in UK so that would be 18:00 in Finnish time? (eg. starting in 15 minutes)
My bad. I was 1h early it seems! Sorry about that.
Ok. I am in the call :)
This issue has been marked as a stale issue because it has been open (more than) 30 days with no activity. Remove the stale label or add a comment, otherwise this issue will automatically be closed in 5 days. Note, that you can always re-open a closed issue at any time.
keep-alive
This issue has been marked as a stale issue because it has been open (more than) 30 days with no activity. Remove the stale label or add a comment, otherwise this issue will automatically be closed in 5 days. Note, that you can always re-open a closed issue at any time.
I have been working on refurbishing Xilinx Zynq-7000 SoC (Cortex-A9 x 2) support in OP-TEE to make it possible to boot u-boot and Linux kernel. OP-TEE OS is from tag 3.16.0 + changes.
With the work I have started using SCMI support that @etienne-lms has been introducing in different components.
I have implemented clock driver for zynq7k and several fast smc calls for quick path to query certain hardware details and reworked zynq7k support altogether.
System works so that U-Boot's SPL loads OP-TEE and u-boot proper and then SPL boots into OP-TEE and then when it is done then starts executing u-boot proper and then u-boot's TEE queries with fast smc calls features supported and then switches to SCMI support to fetch information about what clocks are there with std smc calls and to SCMI server and then u-boot gets relocated and then it goes to ask again what features are supported but that fails with:
I have been looking with the Trace32 what could be the issue and currently it looks like that effect starts from
sm_from_nsec()
:https://github.com/OP-TEE/optee_os/blob/master/core/arch/arm/sm/sm.c#L93
and then later in:
https://github.com/OP-TEE/optee_os/blob/master/core/arch/arm/sm/sm_a32.S#L196
actually causes FIQ bit to be unmasked.
and then actual operation is performed but thread exception check happens and causes panic.
Seems that contents of
struct sm_ctx *ctx
is wrong.I am a bit puzzled what would be the right course of action as there seems to be quite a bit in play in here.
While stepping in secure monitor I can see that in some cases the FIQ bit is not set and my understanding is that if there would be IRQ handler for secure peripheral then FIQ would be used to signal it.
Any ideas what to try or look?
I have added some debug prints to figure out what kind of SMC calls we have.