NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License

[24.05] zfs.latestCompatibleLinuxPackages selects linux-libre; unable to load proprietary firmware #341867

Closed: happyalu closed this 1 month ago

happyalu commented 1 month ago

Describe the bug

I'm using flakes to configure my NixOS (24.05) hosts. Updating from commit f1bad50880bae73ff2d82fafc22010b4fc097a9c to e65aa8301ba4f0ab8cb98f944c14aa9da07394f8 left me with a broken display after booting.

This seems related to the kernel change from 6.6 to 6.10. I see the following in the boot log:

Sep 14 20:53:34.224280 host1 kernel: [drm] amdgpu kernel modesetting enabled.
Sep 14 20:53:34.231271 host1 kernel: amdgpu: Virtual CRAT table created for CPU
Sep 14 20:53:34.231310 host1 kernel: amdgpu: Topology: Add CPU node
Sep 14 20:53:34.231386 host1 kernel: amdgpu 0000:07:00.0: enabling device (0006 -> 0007)
Sep 14 20:53:34.231620 host1 kernel: [drm] initializing kernel modesetting (RENOIR 0x1002:0x1638 0x1043:0x8809 0xC8).
Sep 14 20:53:34.231650 host1 kernel: [drm] register mmio base: 0xFCB00000
Sep 14 20:53:34.231674 host1 kernel: [drm] register mmio size: 524288
Sep 14 20:53:34.234376 host1 kernel: [drm] add ip block number 0 <soc15_common>
Sep 14 20:53:34.234407 host1 kernel: [drm] add ip block number 1 <gmc_v9_0>
Sep 14 20:53:34.234442 host1 kernel: [drm] add ip block number 2 <vega10_ih>
Sep 14 20:53:34.234461 host1 kernel: [drm] add ip block number 3 <psp>
Sep 14 20:53:34.234479 host1 kernel: [drm] add ip block number 4 <smu>
Sep 14 20:53:34.234499 host1 kernel: [drm] add ip block number 5 <dm>
Sep 14 20:53:34.234542 host1 kernel: [drm] add ip block number 6 <gfx_v9_0>
Sep 14 20:53:34.234561 host1 kernel: [drm] add ip block number 7 <sdma_v4_0>
Sep 14 20:53:34.234580 host1 kernel: [drm] add ip block number 8 <vcn_v2_0>
Sep 14 20:53:34.234599 host1 kernel: [drm] add ip block number 9 <jpeg_v2_0>
Sep 14 20:53:34.234624 host1 kernel: amdgpu 0000:07:00.0: amdgpu: Fetched VBIOS from VFCT
Sep 14 20:53:34.234825 host1 kernel: amdgpu: ATOM BIOS: 113-CEZANNE-018
Sep 14 20:53:34.234848 host1 kernel: 0000:07:00.0: Missing Free firmware (non-Free firmware loading is disabled)
Sep 14 20:53:34.234867 host1 kernel: [drm:amdgpu_device_init [amdgpu]] *ERROR* early_init of IP block <psp> failed -19
Sep 14 20:53:34.235271 host1 kernel: 0000:07:00.0: Missing Free firmware (non-Free firmware loading is disabled)
Sep 14 20:53:34.236270 host1 kernel: [drm:amdgpu_device_init [amdgpu]] *ERROR* early_init of IP block <dm> failed -19
Sep 14 20:53:34.236316 host1 kernel: 0000:07:00.0: Missing Free firmware (non-Free firmware loading is disabled)
Sep 14 20:53:34.237267 host1 kernel: [drm:amdgpu_device_init [amdgpu]] *ERROR* early_init of IP block <gfx_v9_0> failed -19
Sep 14 20:53:34.237337 host1 kernel: 0000:07:00.0: Missing Free firmware (non-Free firmware loading is disabled)
Sep 14 20:53:34.238298 host1 kernel: [drm:amdgpu_device_init [amdgpu]] *ERROR* early_init of IP block <sdma_v4_0> failed -19
Sep 14 20:53:34.238323 host1 kernel: [drm] VCN decode is enabled in VM mode
Sep 14 20:53:34.238349 host1 kernel: [drm] VCN encode is enabled in VM mode
Sep 14 20:53:34.238371 host1 kernel: 0000:07:00.0: Missing Free firmware (non-Free firmware loading is disabled)
Sep 14 20:53:34.239268 host1 kernel: [drm:amdgpu_device_init [amdgpu]] *ERROR* early_init of IP block <vcn_v2_0> failed -19
Sep 14 20:53:34.239301 host1 kernel: [drm] JPEG decode is enabled in VM mode
Sep 14 20:53:34.239326 host1 kernel: amdgpu 0000:07:00.0: amdgpu: Fatal error during GPU init
Sep 14 20:53:34.239693 host1 kernel: amdgpu 0000:07:00.0: amdgpu: amdgpu: finishing device.

I have reverted the update, but I'm not sure whether I need to change something in my host config to fix this or just wait for a kernel update.

starkca90 commented 1 month ago

I could be misremembering, but I vaguely recall some directory change back in 6.9 or something and I got similar errors.

The root of my problem was that my boot.kernelPackages came from one repo while some other kernel-related entry (a module or something similar) came from a more up-to-date repo, which I was using to work around some other outdated package.

I ended up switching the entry that used my "non-standard" package repo back to the regular NixOS one and was back in business.

Atemu commented 1 month ago

The default kernel was not updated to 6.10, so this must be caused by your config. If AMDGPU is broken, there's not much we can do about that except wait for the kernel devs to fix it.

happyalu commented 1 month ago

Thanks.

This line in my config was causing the trouble:

boot.kernelPackages = config.boot.zfs.package.latestCompatibleLinuxPackages;

nixos-discourse commented 1 month ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/non-free-firmware-not-loaded-after-update/52183/1

coreyoconnor commented 1 month ago

This affects more than just amdgpu. It also happened on my Intel laptop; specifically, the Wi-Fi module could not locate its non-free firmware either.

coreyoconnor commented 1 month ago

E.g., the i915 module also reports the same "non-Free firmware loading is disabled".

coreyoconnor commented 1 month ago

Indeed. It looks like even if enableAllFirmware and enableRedistributableFirmware are true, the 6.10 kernel does not load any non-free firmware.

So it's not an amdgpu-specific bug.
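
For reference, the two firmware options mentioned above are usually set as in the minimal sketch below. Note that NixOS only accepts hardware.enableAllFirmware when unfree packages are allowed. These options only install firmware files into the system; a linux-libre kernel still refuses to load non-free blobs, which matches the "non-Free firmware loading is disabled" message in the logs.

```nix
{ ... }:
{
  # Firmware that may be redistributed (pulls in linux-firmware, among others).
  hardware.enableRedistributableFirmware = true;

  # All firmware, including non-redistributable blobs.
  # NixOS asserts that unfree packages are allowed when this is enabled.
  hardware.enableAllFirmware = true;
  nixpkgs.config.allowUnfree = true;
}
```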

coreyoconnor commented 1 month ago

Cool! Using the default Linux kernel, 6.6.51, worked as expected.

That tells me this is specific to 6.10 and not some general breakage in nixpkgs.

I did look through the 6.9 and 6.10 changelogs and didn't see anything about changes to firmware loading, but I didn't read the 6.7 and 6.8 ones.

jeff84 commented 1 month ago

Kernel 6.8.12 worked without problems until 6.8 reached EOL. The 6.10 kernel was the first one with the -gnu suffix:

Linux zellat2nix 6.10.10-gnu #1-NixOS SMP PREEMPT_DYNAMIC Tue Jan  1 00:00:00 UTC 1980 x86_64 GNU/Linux

Could it be a new patch set that is applied to the 6.10 kernel?

Atemu commented 1 month ago

@coreyoconnor is that on 24.05 or unstable?

happyalu commented 1 month ago

I had this on 24.05 with boot.kernelPackages = config.boot.zfs.package.latestCompatibleLinuxPackages;

coreyoconnor commented 1 month ago

> @coreyoconnor is that on 24.05 or unstable?

This is on 24.05. I can test on unstable if that is useful.

jeff84 commented 1 month ago

I think the problem could be the value config.boot.zfs.package.latestCompatibleLinuxPackages used for boot.kernelPackages. For me, it seems to pick the pkgs.linuxPackages-libre kernel package.

If I set boot.kernelPackages = pkgs.linuxPackages_6_10; instead, everything works as expected:

~ % uname -a
Linux zellat2nix 6.10.10 #1-NixOS SMP PREEMPT_DYNAMIC Thu Sep 12 09:13:13 UTC 2024 x86_64 GNU/Linux
~ % lsmod| grep iwlwifi
iwlwifi               561152  1 iwlmvm
cfg80211             1347584  3 iwlmvm,iwlwifi,mac80211
firmware_class         57344  19 btrtl,snd_soc_avs,snd_hda_intel,intel_ipu6,xhci_pci_renesas,btmtk,snd_sof,drm_display_helper,intel_ipu6_isys,btintel,snd_soc_hdac_hda,btbcm,iwlwifi,btusb,mei_vsc_hw,xe,i915,cfg80211,intel_ishtp

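
The workaround above, written as a minimal NixOS module. This sketch assumes, as the thread suggests, that the ZFS version shipped in 24.05 supports the 6.10 series. Pinning an explicit series means you have to bump it yourself when that series reaches EOL; the default pkgs.linuxPackages (6.6 LTS on 24.05) is the other option reported to work here.

```nix
{ pkgs, ... }:
{
  # Explicit, non-libre kernel series instead of
  # config.boot.zfs.package.latestCompatibleLinuxPackages.
  boot.kernelPackages = pkgs.linuxPackages_6_10;

  # Alternative: stay on the release's default (LTS) kernel.
  # boot.kernelPackages = pkgs.linuxPackages;
}
```
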
coreyoconnor commented 1 month ago

Oh interesting! I'll try the same. I did attempt to look through that selection code but got lost XD

terrorbyte commented 1 month ago

This also happened to me with https://releases.nixos.org/nixos/24.05/nixos-24.05.4974.8f7492cce289/nixexprs.tar.xz and boot.kernelPackages = config.boot.zfs.package.latestCompatibleLinuxPackages;

jeff84's suggestion of manually selecting the kernel package version appears to work as a temporary fix: https://github.com/NixOS/nixpkgs/issues/341867#issuecomment-2353605065

nixos-discourse commented 1 month ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/non-free-firmware-not-loaded-after-update/52183/3

flintflump commented 1 month ago

As suggested in https://github.com/NixOS/nixpkgs/issues/341867#issuecomment-2353605065, it seems that the new logic for ZFS's latestCompatibleLinuxPackages (introduced in https://github.com/NixOS/nixpkgs/commit/27b52adcb4c933be136b802af32765ca0dc4b75d) chooses a libre kernel:

Working commit (f4c846a):

nix-repl> pkgs.zfs_2_2.passthru.latestCompatibleLinuxPackages.kernel.isLibre
false

nix-repl> pkgs.zfs_2_2.passthru.latestCompatibleLinuxPackages.kernel.name
"linux-6.6.50"

Non-working commit (8f7492c):

nix-repl> pkgs.zfs_2_2.passthru.latestCompatibleLinuxPackages.kernel.isLibre
true

nix-repl> pkgs.zfs_2_2.passthru.latestCompatibleLinuxPackages.kernel.name
"linux-6.10.10"

It seems https://github.com/NixOS/nixpkgs/commit/34e1748391b028788b14a30740309e1739293c77 might be the culprit.
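
To catch this kind of regression at rebuild time rather than after a reboot, one option is a NixOS assertion on the kernel's isLibre attribute, the same attribute queried in nix-repl above. This is a sketch that assumes you keep using latestCompatibleLinuxPackages. The message text is arbitrary.

```nix
{ config, ... }:
{
  boot.kernelPackages = config.boot.zfs.package.latestCompatibleLinuxPackages;

  # Fail evaluation (e.g. during nixos-rebuild) if the selected kernel is
  # linux-libre, which refuses to load non-free firmware.
  assertions = [
    {
      assertion = !config.boot.kernelPackages.kernel.isLibre;
      message = "boot.kernelPackages resolved to a linux-libre kernel; non-free firmware will not load.";
    }
  ];
}
```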

Atemu commented 1 month ago

The fix should be coming to a 24.05 channel near you in the coming days.

nixos-discourse commented 1 month ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/no-more-sound-on-laptop-after-upgrade-tuxedo-infinitybook-pro-16-gen7/52296/6