Dasharo / dasharo-issues

The Dasharo issue tracker
https://dasharo.com/

Incident: laptop bricks upon efibootmgr command (V540TU) #1099

Open wessel-novacustom opened 1 month ago

wessel-novacustom commented 1 month ago

Component

Dasharo firmware, EC firmware

Device

NovaCustom V54 14th Gen

Dasharo version

v0.9.0

Dasharo Tools Suite version

No response

Test case ID

No response

Brief summary

While running an efibootmgr command, the laptop bricked; it does not turn on anymore.

How reproducible

~ 20% reproducible.

How to reproduce

It happened in the last step of installation when the installation script runs a command like this:

efibootmgr -c -d /dev/nvme0n1 -p 1 -L <label> -l \EFI\...\grubx64.efi

Expected behavior

The laptop completes the command without bricking.

Actual behavior

The screen froze at exactly that moment and the laptop didn't react to any buttons. After holding the power button to power off / reset, the laptop didn't turn on anymore. Now when I power it on, the keyboard and screen backlights go on, but there is no NovaCustom logo - the screen stays black. Charging also works (LEDs change colors) but the laptop doesn't turn on...

Screenshots

No response

Additional context

A question to the engineers. Any ideas how this could have happened and how we can prevent this from happening in the future? Or do you think it's a hardware issue?

Solutions you've tried

No response

mkopec commented 1 month ago

Can you recover the exact efibootmgr command that was used? Which OS is that?

wessel-novacustom commented 1 month ago

> Can you recover the exact efibootmgr command that was used? Which OS is that?

My customer will elaborate soon.

rudolfvesely commented 1 month ago

Hi @wessel-novacustom and @mkopec ,

This problem occurred twice, in both cases during the installation of NixOS, but I don't think the problem is related to Nix.

First, it happened at the last stage of the installation, when Grub was installed and a UEFI boot entry was added. That was with the RC version of Dasharo for V54. The laptop hung (the screen froze), but after holding the power button, it worked fine.

The second time, it happened when deleting and adding UEFI boot entries using efibootmgr. In this case that was with v0.9.0 and the laptop didn't boot after holding the power button. When I push the power button, I can see the screen backlight going on and the LEDs indicate charging/not charging (the laptop is not completely bricked), but the laptop doesn't boot and the screen stays black. In this case, it happened again after the installation of NixOS, but I'm not going to include the installation steps since in my opinion that would be a waste of time for whoever tries to debug this problem.

It's worth mentioning that other people and I installed NixOS several times and it worked fine in 99% of cases, so maybe this is a race condition?

My suggestion to debug and possibly trigger the issue is to DD this ISO to a USB drive and boot from it:

https://releases.nixos.org/nixos/24.05/nixos-24.05.3914.c3d4ac725177/nixos-plasma6-24.05.3914.c3d4ac725177-x86_64-linux.iso

Ignore the Plasma login and press Ctrl+Alt+F1 to switch to a text console. Then, as root (sudo -i), run the following:

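while true; do
    # Delete every existing BootXXXX entry; sed -n with the p flag prints only the matching IDs.
    efibootmgr | sed -nE 's/^Boot([0-9A-Fa-f]{4})(\*| ) .*/\1/p' | xargs -rn1 efibootmgr -B -b
    # Re-create the NixOS entry on partition 1 of the NVMe disk.
    efibootmgr -c -d /dev/nvme0n1 -p 1 -L mynix001 -l '\EFI\NixOS-boot-efi\grubx64.efi'
    sleep 1
done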

I ran it once outside of the while loop and it worked fine. For obvious reasons (the risk of bricking another laptop), I didn't try the loop version. If you want to try the code above, please make sure you can recover the laptop (possibly by CMOS reset or UEFI ROM reflash).
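Since the loop deletes every boot entry it finds, it may be worth checking the ID filter against sample text before running anything destructive. The following is only a sketch: boot_ids and sample_output are hypothetical helper names, and the sample text is made up in the usual efibootmgr layout rather than captured from the affected laptop.

```shell
#!/bin/sh
# Print the four-hex-digit ID of every BootXXXX entry read from stdin.
# Lines such as BootCurrent, Timeout and BootOrder do not match and are dropped.
boot_ids() {
    sed -nE 's/^Boot([0-9A-Fa-f]{4})(\*| ) .*/\1/p'
}

# Illustrative text in the usual efibootmgr layout (an assumption,
# not output captured from the V54).
sample_output() {
    cat <<'EOF'
BootCurrent: 0001
Timeout: 0 seconds
BootOrder: 0001,0002
Boot0001* mynix001
Boot0002  UEFI Shell
EOF
}

# Dry run: only prints the IDs that would be handed to "efibootmgr -B -b".
sample_output | boot_ids
```

On a real machine, piping efibootmgr into boot_ids shows which entries the loop would delete without touching any UEFI variable.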

Thank you.

Rudolf

rudolfvesely commented 1 month ago

The firmware configuration when it happened:

[ ] Lock the BIOS boot media
[ ] Enable SMM BIOS write protection
[ ] Early boot DMA Protection
[x] Enable Wi-Fi + BT radios
[x] Enable Camera

Intel ME mode: Disabled (HAP)

MatXron commented 1 month ago

I can confirm that there is an issue that happens at the end of Linux installation or during a cold boot. I did some distro hopping and the laptop froze on me twice at the end of the installation. Fortunately, I didn't brick it and it works OK after power off/on but if I remember correctly the UEFI was reset to factory defaults which was strange...

The second problem is that the UEFI lost its boot entries twice after a cold boot. The first time I thought that I had done something wrong, but then it happened again this morning and I was able to observe how it happened. I cold started the laptop (pushed the power button) and the button lit up, but I didn't get the NovaCustom logo immediately. Instead, the button light went off and then on again, and I could finally see the NovaCustom logo. But it booted into the EFI shell...

After some investigation, I found that all UEFI entries were wiped off and all settings were reset to factory default which was very obvious since I set UEFI admin password and the password wasn't there anymore.

The fix was simple - I had to boot a live USB and use efibootmgr to add the missing entry.
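For anyone hitting the same thing, a rough sketch of that fix follows. It only prints the command instead of running it, print_add_entry_cmd is a hypothetical helper name, and the disk, label and loader path are placeholders copied from the reproduction loop earlier in the thread; the real values depend on the installed system.

```shell
#!/bin/sh
# Print (do not run) an efibootmgr command that creates a boot entry.
#   $1 disk, $2 ESP partition number, $3 label, $4 loader path on the ESP
# Flags: -c create an entry, -d disk, -p partition, -L label, -l loader.
print_add_entry_cmd() {
    printf "efibootmgr -c -d %s -p %s -L %s -l '%s'\n" "$1" "$2" "$3" "$4"
}

# Placeholder values (assumptions taken from the reproduction loop above).
print_add_entry_cmd /dev/nvme0n1 1 mynix001 '\EFI\NixOS-boot-efi\grubx64.efi'
```

Printing first makes it easy to double-check the disk and partition before the printed command is run as root from the live USB.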

As mentioned above this is likely some race condition since I cold boot the laptop often (since it doesn't sleep with kernel 6.10) and it happened only twice.

Dasharo Security Options:
    [x] Lock the BIOS boot media
    [x] Enable SMM BIOS write protection
    [x] Early boot DMA Protection
    [ ] Keep IOMMU enabled when transfer control to OS
    [x] Enable Wi-Fi + BT radios
    [x] Enable Camera

USB Configuration:
    [ ] Enable USB stack

Intel Management Engine Options:
    Intel ME mode: Disabled (HAP)

wessel-novacustom commented 4 weeks ago

The same incident has happened on two laptops with the same configuration as @rudolfvesely's.

To avoid confusion: "bricked" here means that the laptop does turn on, but never reaches the UEFI boot splash. Fn + F2 still turns the display off and back on, which IMO indicates that this is clearly a firmware bug.

On both units, I will now try to restore the firmware to the original state. I assume it will restore the laptops to a good state; I will confirm later today.

CC for thoughts: @mkopec @macpijan @miczyg1

wessel-novacustom commented 4 weeks ago

> I will now try to restore the firmware to the original state. I assume it will restore the laptop in a good state, I will confirm later today.

wp-tooling commented 3 weeks ago

> After some investigation, I found that all UEFI entries were wiped off and all settings were reset to factory default which was very obvious since I set UEFI admin password and the password wasn't there anymore.

Confirmed, this is a problem:

https://youtu.be/_FeDZ4PKHNE

  1. You can see that sometimes the laptop doesn't boot and is stuck on the boot logo
  2. When powered off and on, you can see that it starts (0:23) (keyboard backlight goes on), then it resets (0:25) (keyboard backlight goes off and on again)
  3. After that it can't boot, since the firmware settings, including the firmware password and boot entries, are wiped