FrameworkComputer / EmbeddedController

Embedded Controller firmware for the Framework Laptop
BSD 3-Clause "New" or "Revised" License
957 stars 64 forks source link

WARNING: Compiling with GCC 11+ will result in a broken EC and potentially bricked laptop #21

Closed DHowett closed 1 year ago

DHowett commented 1 year ago

A firmware image built with GCC 11 or higher will fail to start the AP due to a watchdog timeout processing an ESPI interrupt.

It may be worthwhile to add a note somewhere to this effect.

Why?

The loop here ... https://github.com/FrameworkComputer/EmbeddedController/blob/c1d06ea51e1c4021392f1c3450fbf4701cb85c74/chip/mchp/espi.c#L1099-L1106

... iterates on the set bits in girq24_result (there is a similar loop below for girq25_result) by using __builtin_ctz to avoid checking each bit.

This compiles into rbit and clz.

clz is specified as returning 32 when no bits are set, e.g. when the value is 0.

In contrast, __builtin_c[lt]z are specified as undefined when the value is 0.

In short, it's relying on undefined behavior!

In GCC 10, we got by with a pass; the code compiled into rbit and clz, the comparison was kept, and everything chugged along happily.

With GCC 11, the optimizer takes advantage of this undefined behavior to remove the comparison (after all: there are not 33 bits in a uint32_t, and the return value after we've cleared all the bits, i.e. x==0, is undefined).

Compiled code for espi_mswv1_interrupt, lightly annotated ``` 000e7e90 : e7e90: e92d 41f0 stmdb sp!, {r4, r5, r6, r7, r8, lr} e7e94: 4b12 ldr r3, [pc, #72] ; (e7ee0 ) e7e96: 4f13 ldr r7, [pc, #76] ; (e7ee4 ) e7e98: f8d3 2144 ldr.w r2, [r3, #324] ; 0x144 e7e9c: f8d3 5148 ldr.w r5, [r3, #328] ; 0x148 e7ea0: 4e11 ldr r6, [pc, #68] ; (e7ee8 ) e7ea2: f8c3 5140 str.w r5, [r3, #320] ; 0x140 e7ea6: fa95 f4a5 rbit r4, r5 e7eaa: fab4 f484 clz r4, r4 e7eae: f04f 080c mov.w r8, #12 // The comparison would fall roughly here; in GCC 10, it does: ///#####: 2c20 cmp r4, #32 ///#####: d101 bne.n ##### ///#####: e8bd 81f0 ldmia.w sp!, {r4, r5, r6, r7, r8, pc} e7eb2: 08a2 lsrs r2, r4, #2 e7eb4: f004 0303 and.w r3, r4, #3 e7eb8: fb08 3302 mla r3, r8, r2, r3 e7ebc: 4621 mov r1, r4 e7ebe: 5dd8 ldrb r0, [r3, r7] e7ec0: eb06 0384 add.w r3, r6, r4, lsl #2 e7ec4: f000 0001 and.w r0, r0, #1 e7ec8: f8d3 30b8 ldr.w r3, [r3, #184] ; 0xb8 e7ecc: 4798 blx r3 e7ece: 2301 movs r3, #1 e7ed0: 40a3 lsls r3, r4 e7ed2: ea25 0503 bic.w r5, r5, r3 e7ed6: fa95 f4a5 rbit r4, r5 e7eda: fab4 f484 clz r4, r4 e7ede: e7e8 b.n e7eb2 e7ee0: 4000e000 .word 0x4000e000 e7ee4: 400f9c08 .word 0x400f9c08 e7ee8: 000fcb70 .word 0x000fcb70 ```

This results in an infinite loop with bpos == 32, and the following message printed to the console until the watchdog kicks us out:

Unknown M2S VW: state=0 GIRQ25 bitpos=32
                                      ^^ oops

Note I know that this code originated in the upstream repository, and I have already notified a couple folks over there! :smile:

DHowett commented 1 year ago

This is no longer true on the hx20-hx30 branch, so I'll close it out. Thanks for merging!

JohnAZoidberg commented 1 year ago

Thanks! Changes are merged. Next release (hx20 3.19, hx30 3.07) will include them.