FrameworkComputer / EmbeddedController

Embedded Controller firmware for the Framework Laptop
BSD 3-Clause "New" or "Revised" License
950 stars 64 forks source link

hx20: restore RW firmware access #16

Open DHowett opened 2 years ago

DHowett commented 2 years ago

This pull request comprises three bug fixes.

1. lfw: restore RW firmware access by using the right pin mux

The alternate function table for LFW on the hx20 contained a few incorrect values that resulted in LFW being unable to read from flash, which disabled the use of the RW (and in some cases the RO) firmware image.

  1. SHD_IO0 is function 1 on the MEC152x, not function 2.
  2. SHD_IO2 and SHD_IO3 were not included in the alternate function table.

The alternate function table from hx20/gpio.inc is correct, but it is missing the SPI alternate for SHD_GLK/GPIO_0056.

In the future, we may want to move to model where hx20/gpio.inc includes hx20/lfw/gpio.inc and common pin configurations are shared.

2. Remember whether we cut power to ourselves, and re-check during boot

On the hx20 (and presumably hx30,) a design issue prevents us from hibernating the EC properly. Therefore, every time we bring the machine down we cut power instead of hibernating. That results in the next boot being a complete reset.

There is code in lfw that checks whether the current boot is due to a watchdog reset or a power-on reset and if it is, clears the image type back to EC_IMAGE_UNKNOWN. Due to that design issue, the hx20 EC is always in POR/VTR or WDT on startup.

By storing whether the last shutdown was graceful/intended and checking it before resetting the image type, we can work around this issue.

3. Make sure we fire SHUTDOWN_COMPLETE hooks when the AP goes down

This ensures that the hook in common/system.c that triggers an "at-shutdown" reboot from RO to RW fires.

The x86 power sequencer in $/power fires this hook, but we don't.

Without this, ectool reboot_ec RW at-shutdown doesn't work.

4. mchp/lfw: make sure we keep the EC on for long operations

Over the past six months with this patch set, I've observed one troubling behavior: the laptop would not always boot when it was powered completely off (with the EC off as well) and I clicked the power button.

It turns out that the power buttons control VCI_IN[01] and that the VBAT-powered control interface is configured to drive GPIO250 as VCI_OUT based on the status of those inputs. Now, GPIO250 is also known as EC_ON on hx20/30: it controls whether 3VL_EC is on.

The MEC1521 manual recommends the following example of using VCI_OUT on a "mobile platform" (abbreviated):

  1. A coin cell battery is installed, powering VBAT
  2. The power button on VCI_IN0# is pushed causing VCI_OUT to be asserted, powering the VTR rail
  3. The EC reconfigures VCI so that firmware controls the VCI_OUT pin.

Now, the EC is doing this over in vci_task (hx20/hx30), which is long after LFW has started and jumped to main firmware. The problem is, restoring access to RO and RW images in the prior three commits causes us to try to read from flash on startup.

Reading from flash is slightly slower, so we extend the period of time between step 2 and 3 where 3VL_EC is only driven by VCI_IN0#. When the user releases the power button during that window, 3VL_EC is cut and the EC shuts down.

This patch fixes the issue by asserting firmware control of VCI_OUT as early as is safe in lfw.

DHowett commented 2 years ago

I have only end-to-end tested this a couple times, but I haven't done so from a cold start. I'll make sure that works and report back.

DHowett commented 2 years ago

Ah, cold boot to RW doesn't work because we always get VTR POR. I'm guessing it's because of this:

https://github.com/FrameworkComputer/EmbeddedController/blob/9f8cf536bdb0a843c0e9aa516feedf81b2cbab10/board/hx20/board.c#L393-L398

We reset the image type here:

https://github.com/FrameworkComputer/EmbeddedController/blob/9f8cf536bdb0a843c0e9aa516feedf81b2cbab10/chip/mchp/lfw/ec_lfw.c#L332-L336

Is this true of all hx20 boards, or just DVT/EVT? If so, RW may not be possible as specced. I believe there's a way we can work around it, if you're amenable: Store something in VBAT RAM indicating that our last powerdown was self-initiated, and check it in lfw!system_init

DHowett commented 2 years ago

Here's my proposal:

diff --git a/board/hx20/board.c b/board/hx20/board.c
index 3e6bf52306..d11c7fb162 100644
--- a/board/hx20/board.c
+++ b/board/hx20/board.c
@@ -410,6 +410,8 @@ static void board_power_off_deferred(void)
                task_clear_pending_irq(i);
        }

+       MCHP_VBAT_RAM(MCHP_IMAGETYPE_IDX) |= 0x80;
+
        MCHP_VCI_NEGEDGE_DETECT = BIT(0) |  BIT(1);
        MCHP_VCI_POSEDGE_DETECT = BIT(0) |  BIT(1);

diff --git a/chip/mchp/lfw/ec_lfw.c b/chip/mchp/lfw/ec_lfw.c
index 61cfc293fe..e3c01cc953 100644
--- a/chip/mchp/lfw/ec_lfw.c
+++ b/chip/mchp/lfw/ec_lfw.c
@@ -338,14 +338,24 @@ void system_init(void)
        uint32_t wdt_sts = MCHP_VBAT_STS & MCHP_VBAT_STS_ANY_RST;
        uint32_t rst_sts = MCHP_PCR_PWR_RST_STS &
                                MCHP_PWR_RST_STS_VTR;
+       // **HX20**: We can't hibernate the EC without also keeping
+       // 5v3v ALW on, so we cut power entirely. Unfortunately,
+       // that means that one of rst_sts or wdt_sts will always be
+       // on... and that precludes the use of the RW firmware.
+       // However, if we store a bit in IMAGETYPE to indicate that
+       // we cut power to ourselves, we can use it at the next boot
+       // to determine whether this poweroff was EC-origin or not.
+       bool wacked = (MCHP_VBAT_RAM(MCHP_IMAGETYPE_IDX) & 0x80) != 0

        trace12(0, LFW, 0,
                "VBAT_STS = 0x%08x  PCR_PWR_RST_STS = 0x%08x",
                wdt_sts, rst_sts);

-       if (rst_sts || wdt_sts)
+       if ((rst_sts || wdt_sts) && !wacked)
                MCHP_VBAT_RAM(MCHP_IMAGETYPE_IDX)
                                        = EC_IMAGE_UNKNOWN;
+
+       MCHP_VBAT_RAM(MCHP_IMAGETYPE_IDX) &= 0x7F;
 }

 enum ec_image system_get_image_copy(void)

I've got this in testing right now.

DHowett commented 2 years ago

I've updated this patch set with my tested and working proposal. It now properly (1) reboots into RW, (2) boots into RW from cold-off, and (3) can reboot into RO/RW at-shutdown

DHowett commented 1 year ago

I'm marking this one as "ready", and I can retarget the branch if you'd like. I've been running it for almost 6 months now, and I've only observed one issue. I haven't been on the stock firmware enough to tell you whether this also reproduces over there, but I will flash back and test it out.

The issue: sometimes when the EC powers off entirely because the machine is on battery power, it doesn't boot up properly when I click the FP power key. It starts, loads lfw, goes to load RO or RW, and crashes again. Then it falls back to the default image after I click the power button again.

The only thing in common is that both RO and RW in LFW require reading the flash again after boot. The default/system image was already loaded by the boot ROM, and doesn't seem to cause this issue.

I'm wondering if we don't get enough time or power or something to fully set up the SPI subsystem since we're waking on VCI. Potential validation proof would be to trip the chassis switch in the same situation.

EDIT: yeah, if this impacts RO it's not right for me to move it out of draft status. :smile:

github-actions[bot] commented 1 year ago

CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅

DHowett commented 1 year ago

Ah, interesting. The CLA bot doesn't handle merge commits very well...

DHowett commented 1 year ago

Nailed it. Here's the explanation -

mchp/lfw: make sure we keep the EC on for long operations

Over the past six months with this patch set, I've observed one troubling behavior: the laptop would not always boot when it was powered completely off (with the EC off as well) and I clicked the power button.

It turns out that the power buttons control VCI_IN[01] and that the VBAT-powered control interface is configured to drive GPIO250 as VCI_OUT based on the status of those inputs. Now, GPIO250 is also known as EC_ON on hx20/30: it controls whether 3VL_EC is on.

The MEC1521 manual recommends the following example of using VCI_OUT on a "mobile platform" (abbreviated):

  1. A coin cell battery is installed, powering VBAT
  2. The power button on VCI_IN0# is pushed causing VCI_OUT to be asserted, powering the VTR rail
  3. The EC reconfigures VCI so that firmware controls the VCI_OUT pin.

Now, the EC is doing this over in vci_task (hx20/hx30), which is long after LFW has started and jumped to main firmware. The problem is, restoring access to RO and RW images in the prior three commits causes us to try to read from flash on startup.

Reading from flash is slightly slower, so we extend the period of time between step 2 and 3 where 3VL_EC is only driven by VCI_IN0#. When the user releases the power button during that window, 3VL_EC is cut and the EC shuts down.

This patch fixes the issue by asserting firmware control of VCI_OUT as early as is safe in lfw.