getsolus / packages

Solus Package Monorepo & Issue Tracker

Unable to boot after week 41 update #583

Closed by stephanedr 10 months ago

stephanedr commented 11 months ago

Summary

The system froze on reboot after the week 41 update. Removing the broadcom-sta* packages fixes the issue (but leaves me without WiFi).

I tried reinstalling them after a successful boot, but the next reboot still fails.

Steps to reproduce

Apply the update.

Expected result

To boot.

Actual result

Boot freezes.

Environment

Repo

Shannon (stable)

Desktop Environment

MATE

System details

Output of inxi -b:

System:
  Host: dell5720 Kernel: 6.5.7-259.current arch: x86_64 bits: 64 Desktop: MATE
    v: 1.27.3 Distro: Solus 4.4 harmony
Machine:
  Type: Portable System: Dell product: Inspiron 5720 v: N/A
    serial: <superuser required>
  Mobo: Dell model: 0JVJ94 v: A00 serial: <superuser required>
    UEFI-[Legacy]: Dell v: A19 date: 04/18/2014
Battery:
  ID-1: BAT0 charge: 7.3 Wh (100.0%) condition: 7.3/48.8 Wh (14.9%)
CPU:
  Info: dual core Intel Core i5-3210M [MT MCP] speed (MHz): avg: 1198
    min/max: 1200/3100
Graphics:
  Device-1: Intel 3rd Gen Core processor Graphics driver: i915 v: kernel
  Device-2: NVIDIA GF117M [GeForce 610M/710M/810M/820M / GT
    620M/625M/630M/720M] driver: nouveau v: kernel
  Device-3: Microdia Laptop_Integrated_Webcam_HD driver: N/A type: USB
  Display: x11 server: X.Org v: 1.20.14 with: Xwayland v: 23.1.2 driver: X:
    loaded: modesetting,nouveau unloaded: fbdev,vesa dri: crocus,nouveau
    gpu: i915 resolution: 1600x900~60Hz
  API: OpenGL v: 4.5 compat-v: 4.2 vendor: intel mesa v: 23.1.9
    renderer: Mesa Intel HD Graphics 4000 (IVB GT2)
Network:
  Device-1: Broadcom BCM43142 802.11b/g/n driver: bcma-pci-bridge
  Device-2: Realtek RTL810xE PCI Express Fast Ethernet driver: r8169

Other comments

No response

ReillyBrogan commented 10 months ago

I'm going to prep some debug kernels for you to test, can you please confirm what was the last working kernel for you?

ReillyBrogan commented 10 months ago

So I built a bunch of test kernels for testing with various changes reverted. Please try EACH set of kernels and report which ones, if any, work.

SET 1. This kernel set downgrades the kernel to 6.5.5, and changes the kernel config to match what it was at that version. If this one doesn't work then it indicates that there is probably an issue with toolchain or other package changes we made.

sudo eopkg dc
sudo eopkg it https://shared.getsol.us/reilly/2023-10-broadcom-boot-issue/revert-config-version/broadcom-sta-common-6.30.223.271-372-1-x86_64.eopkg https://shared.getsol.us/reilly/2023-10-broadcom-boot-issue/revert-config-version/broadcom-sta-current-6.30.223.271-372-1-x86_64.eopkg https://shared.getsol.us/reilly/2023-10-broadcom-boot-issue/revert-config-version/linux-current-6.5.9-263-1-x86_64.eopkg https://shared.getsol.us/reilly/2023-10-broadcom-boot-issue/revert-config-version/linux-current-headers-6.5.9-263-1-x86_64.eopkg
sudo crl-boot-manager update

SET 2. This set uses the same kernel version and config as set 1, however it disables strip for the broadcom modules.

sudo eopkg dc
sudo eopkg it https://shared.getsol.us/reilly/2023-10-broadcom-boot-issue/revert-config-version-no-strip/broadcom-sta-common-6.30.223.271-372-1-x86_64.eopkg https://shared.getsol.us/reilly/2023-10-broadcom-boot-issue/revert-config-version-no-strip/broadcom-sta-current-6.30.223.271-372-1-x86_64.eopkg https://shared.getsol.us/reilly/2023-10-broadcom-boot-issue/revert-config-version-no-strip/linux-current-6.5.9-263-1-x86_64.eopkg https://shared.getsol.us/reilly/2023-10-broadcom-boot-issue/revert-config-version-no-strip/linux-current-headers-6.5.9-263-1-x86_64.eopkg
sudo crl-boot-manager update

SET 3. This updates to kernel 6.5.9, but builds with the same kernel config as set 1/2. If this succeeds then it indicates that changes to the kernel config are what broke the broadcom modules.

sudo eopkg dc
sudo eopkg it https://shared.getsol.us/reilly/2023-10-broadcom-boot-issue/revert-config/broadcom-sta-common-6.30.223.271-372-1-x86_64.eopkg https://shared.getsol.us/reilly/2023-10-broadcom-boot-issue/revert-config/broadcom-sta-current-6.30.223.271-372-1-x86_64.eopkg https://shared.getsol.us/reilly/2023-10-broadcom-boot-issue/revert-config/linux-current-6.5.9-263-1-x86_64.eopkg https://shared.getsol.us/reilly/2023-10-broadcom-boot-issue/revert-config/linux-current-headers-6.5.9-263-1-x86_64.eopkg
sudo crl-boot-manager update

SET 4. This uses kernel 6.5.5 but updates the kernel config to match recent changes. If this succeeds then it indicates that the kernel version update itself is at fault.

sudo eopkg dc
sudo eopkg it https://shared.getsol.us/reilly/2023-10-broadcom-boot-issue/revert-version/broadcom-sta-common-6.30.223.271-372-1-x86_64.eopkg https://shared.getsol.us/reilly/2023-10-broadcom-boot-issue/revert-version/broadcom-sta-current-6.30.223.271-372-1-x86_64.eopkg https://shared.getsol.us/reilly/2023-10-broadcom-boot-issue/revert-version/linux-current-6.5.9-263-1-x86_64.eopkg https://shared.getsol.us/reilly/2023-10-broadcom-boot-issue/revert-version/linux-current-headers-6.5.9-263-1-x86_64.eopkg
sudo crl-boot-manager update
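
The reasoning behind the four sets can be summarized as a small lookup. This is only a sketch: the diagnose function and its ok/fail arguments are hypothetical names introduced for illustration, and the branches marked as assumptions are inferred from the set descriptions rather than stated in them.

```shell
# Hypothetical helper (not part of any Solus tooling): maps the boot result
# of each test set ("ok" or "fail") to the conclusion described for that set.
diagnose() {
    set1=$1; set2=$2; set3=$3; set4=$4
    if [ "$set1" = fail ] && [ "$set2" = ok ]; then
        # Assumption: only the unstripped build of the old kernel boots.
        echo "stripping the broadcom modules is the likely cause"
    elif [ "$set1" = fail ]; then
        echo "toolchain or other package changes are the likely cause"
    elif [ "$set3" = ok ] && [ "$set4" = ok ]; then
        # Assumption: each change is harmless alone, only the combination breaks.
        echo "only the combination of new version and new config fails"
    elif [ "$set3" = ok ]; then
        echo "kernel config changes are the likely cause"
    elif [ "$set4" = ok ]; then
        echo "the kernel version update is the likely cause"
    else
        echo "inconclusive"
    fi
}
```

For example, diagnose fail fail fail fail points at toolchain or other package changes, since even the fully reverted kernel does not boot.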
stephanedr commented 10 months ago

My system was up to date before W41 (I don't know which kernel version it was). All 4 sets failed. (Note the typo in the last line of each set: it should be "clr-boot-manager", not "crl-boot-manager".)

ReillyBrogan commented 10 months ago

What is the output of eopkg li | grep libxcrypt?

ReillyBrogan commented 10 months ago

If that does not show libxcrypt-compat installed, could you try installing it and see if it changes anything?
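
Because grep libxcrypt matches both libxcrypt and libxcrypt-compat, an exact match on the package name makes the check unambiguous. A minimal sketch, assuming the package name is the first whitespace-separated field of each eopkg li line; has_package is a hypothetical helper name, not an eopkg feature.

```shell
# Hypothetical helper: reads an installed-package listing on stdin and
# succeeds only if the first field of some line equals the name in $1.
has_package() {
    awk -v pkg="$1" '$1 == pkg { found = 1 } END { exit !found }'
}

# Possible usage (assumes the listing format described above):
#   eopkg li | has_package libxcrypt-compat || sudo eopkg it libxcrypt-compat
```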

ReillyBrogan commented 10 months ago

The next thing I'd like you to try if that doesn't work is to downgrade kmod: sudo eopkg it https://cdn.getsol.us/repo/shannon/k/kmod/kmod-30-12-1-x86_64.eopkg

stephanedr commented 10 months ago

Only libxcrypt was installed. I installed libxcrypt-compat -> reboot fails.

I downgraded kmod (note that this has to be done after installing broadcom-sta-*, otherwise kmod is updated again). The first reboot was OK but WiFi was not enabled. I ran depmod (please confirm this had to be done and is enough), then rebooted -> reboot fails.

l3nticular commented 10 months ago

I also had this problem; took me a while to figure out it was the broadcom-sta- drivers causing the issue.

l3nticular commented 10 months ago

Of note, booting to single-user mode works without crashing; the broadcom-sta- drivers don't hit the issue until about 15-20s after a normal GUI boot (sometimes I can log in before it panics).

l3nticular commented 10 months ago

System:
  Host: <> Kernel: 6.5.9-262.current arch: x86_64 bits: 64
    Desktop: Budgie v: 10.8.2 Distro: Solus 4.4 harmony
Machine:
  Type: Desktop Mobo: ASRock model: Z97E-ITX/ac serial: <superuser required>
    UEFI: American Megatrends v: P2.20 date: 03/08/2018
CPU:
  Info: quad core Intel Core i5-4690K [MCP] speed (MHz): avg: 1262
    min/max: 800/3900
Graphics:
  Device-1: NVIDIA TU106 [GeForce RTX 2060 SUPER] driver: nvidia v: 535.113.01
  Display: x11 server: X.Org v: 21.1.9 with: Xwayland v: 23.2.2 driver: X:
    loaded: nvidia gpu: nvidia,nvidia-nvswitch resolution: 1: 2560x1440 2: N/A
  API: OpenGL v: 4.6.0 vendor: nvidia v: 535.113.01 renderer: NVIDIA
    GeForce RTX 2060 SUPER/PCIe/SSE2
Network:
  Device-1: Intel Ethernet I218-V driver: e1000e
  Device-2: Broadcom BCM4352 802.11ac Wireless Network Adapter
    driver: bcma-pci-bridge
Drives:
  Local Storage: total: 14.67 TiB used: 3.39 TiB (23.1%)
Info:
  Processes: 271 Uptime: 4h 24m Memory: total: 16 GiB available: 15.58 GiB
  used: 2.79 GiB (17.9%) Shell: Bash inxi: 3.3.30
ReillyBrogan commented 10 months ago

I updated our patchset with the most recent set of Debian patches for broadcom-sta. Please revert to the stable kernel (sudo eopkg it linux-current --reinstall), and then run the following:

sudo eopkg it https://shared.getsol.us/reilly/2023-10-broadcom-boot-issue/update-patches/broadcom-sta-current-6.30.223.271-371-1-x86_64.eopkg https://shared.getsol.us/reilly/2023-10-broadcom-boot-issue/update-patches/broadcom-sta-common-6.30.223.271-371-1-x86_64.eopkg

stephanedr commented 10 months ago

Yes, it's working! Thank you for spending time on it.

ReillyBrogan commented 10 months ago

This has been cherry-picked to stable. Can someone confirm that it works now? I made a couple of additional changes over the test version; they shouldn't break it, but if they do I can roll those back too.

TheSlider2 commented 10 months ago

Hey there. I had to remove the drivers to be able to boot into the desktop, then use a USB WiFi dongle to run these commands. I followed plutuplutu's instructions from the topic on discuss.getsol.

Reading through this issue, I only typed the last instructions you provided. It downloaded and installed the drivers, but as soon as I unplug the USB dongle the laptop doesn't fall back to the internal device. In fact, it says at the bottom that I have no network device.

Running Solus on an A1466 Macbook Air from circa 2013.

ReillyBrogan commented 10 months ago

You don't need to run any commands now, just make sure your system is fully updated. Then reboot.

TheSlider2 commented 10 months ago

Well, I'm now fully updated, and after rebooting without the dongle I still have no WiFi ("no network devices available" when right-clicking the network icon in the taskbar).

edit -

I tried uninstalling and reinstalling the drivers from the driver manager and rebooting but still nothing.

stephanedr commented 10 months ago

@TheSlider2 I had to run sudo depmod after the installation of the temporary patched version (yesterday).

@ReillyBrogan I've just installed the "official" updates and this is still OK on my side.
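
A rough way to tell whether depmod still needs to be run is to look for the module in modules.dep. This is only a sketch, assuming the broadcom-sta kernel module is named wl and that modules.dep lists it as wl.ko, optionally with a compression suffix; module_indexed is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds if module $1 has an entry in the
# modules.dep file given as $2 (e.g. ".../wl.ko:" or ".../wl.ko.zst:").
module_indexed() {
    grep -Eq "(^|/)$1\.ko(\.[a-z0-9]+)?:" "$2"
}

# Possible usage (assumes the standard modules.dep location):
#   module_indexed wl "/lib/modules/$(uname -r)/modules.dep" || sudo depmod -a
```

If the check fails after installing the packages, the module index is out of date, which would explain a missing network device until depmod is run.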

TheSlider2 commented 10 months ago

Wow, thanks, this did the trick. Thank you.