Dunedan / mbp-2016-linux

State of Linux on the MacBook Pro 2016 & 2017

Can't load amdgpu on linux kernel 5.7.14+ #159

Open cristianmiranda opened 4 years ago

cristianmiranda commented 4 years ago

I'm on a MacBook Pro 13,3, currently running 5.0.0-32-generic, which allows me to load amdgpu and turn the discrete GPU off by doing the following:

gpu-manager | grep 'amdgpu loaded? no' && sudo modprobe amdgpu || echo 'AMD GPU already loaded'
echo OFF > /sys/kernel/debug/vgaswitcheroo/switch
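
The one-liner above chains `&&` and `||`, so the "already loaded" message is also printed when `modprobe` itself fails. A minimal sketch of the same check with an explicit `if`, assuming `gpu-manager` prints a line of the form `Is amdgpu loaded? yes` or `Is amdgpu loaded? no` (as in the output quoted later in this thread). The function names `amdgpu_loaded` and `load_amdgpu_if_needed` are made up for illustration.

```shell
# Hypothetical helper: reads gpu-manager output on stdin and succeeds
# only if it reports the amdgpu module as loaded.
amdgpu_loaded() {
    grep -q 'Is amdgpu loaded? yes'
}

# Hypothetical wrapper: load amdgpu only when gpu-manager says it is missing.
# Only defined here, not called, because modprobe needs root and real hardware.
load_amdgpu_if_needed() {
    if gpu-manager | amdgpu_loaded; then
        echo 'AMD GPU already loaded'
    else
        sudo modprobe amdgpu
    fi
}
```

Note that writing to /sys/kernel/debug/vgaswitcheroo/switch needs root as well, so the `echo OFF > ...` line only works from a root shell.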

I decided to give a more recent kernel a try, so I tested 5.8 and 5.7.14. When I run sudo modprobe amdgpu, the MacBook freezes (I can't do anything but power it off and on again). I understand that vgaswitcheroo is no longer present on kernel versions 5.4+. Is that right?

After installing the kernel I got the following:

DKMS: install completed.
   ...done.
Setting up linux-modules-5.8.0-050800-generic (5.8.0-050800.202008022230) ...
Setting up linux-image-unsigned-5.8.0-050800-generic (5.8.0-050800.202008022230) ...
I: /vmlinuz.old is now a symlink to boot/vmlinuz-5.0.0-32-generic
I: /vmlinuz is now a symlink to boot/vmlinuz-5.8.0-050800-generic
I: /initrd.img is now a symlink to boot/initrd.img-5.8.0-050800-generic
Processing triggers for linux-image-unsigned-5.8.0-050800-generic (5.8.0-050800.202008022230) ...
/etc/kernel/postinst.d/dkms:
 * dkms: running auto installation service for kernel 5.8.0-050800-generic
   ...done.
/etc/kernel/postinst.d/initramfs-tools:
update-initramfs: Generating /boot/initrd.img-5.8.0-050800-generic
W: Possible missing firmware /lib/firmware/amdgpu/navi12_gpu_info.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi14_gpu_info.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi10_gpu_info.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/renoir_gpu_info.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/arcturus_gpu_info.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/raven_ta.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/raven2_ta.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/picasso_ta.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/arcturus_ta.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/arcturus_asd.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/arcturus_sos.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi12_ta.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi12_asd.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi12_sos.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi14_ta.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi14_asd.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi14_sos.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi10_ta.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi10_asd.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi10_sos.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/vega20_ta.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/renoir_asd.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/renoir_rlc.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/renoir_mec2.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/renoir_mec.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/renoir_me.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/renoir_pfp.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/renoir_ce.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/arcturus_rlc.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/arcturus_mec2.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/arcturus_mec.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/raven_kicker_rlc.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi12_rlc.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi12_mec2.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi12_mec.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi12_me.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi12_pfp.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi12_ce.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi14_rlc.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi14_mec2.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi14_mec.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi14_me.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi14_pfp.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi14_ce.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi14_mec2_wks.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi14_mec_wks.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi14_me_wks.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi14_pfp_wks.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi14_ce_wks.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi10_rlc.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi10_mec2.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi10_mec.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi10_me.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi10_pfp.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi10_ce.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/renoir_sdma.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/arcturus_sdma.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi12_sdma1.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi12_sdma.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi14_sdma1.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi14_sdma.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi10_sdma1.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi10_sdma.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi10_mes.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi12_vcn.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi14_vcn.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi10_vcn.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/renoir_vcn.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/arcturus_vcn.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi12_smc.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi14_smc.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi10_smc.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/arcturus_smc.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi12_dmcu.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/renoir_dmcub.bin for module amdgpu
W: Possible missing firmware /lib/firmware/i915/skl_huc_2.0.0.bin for module i915
W: Possible missing firmware /lib/firmware/i915/skl_guc_33.0.0.bin for module i915
W: Possible missing firmware /lib/firmware/i915/bxt_huc_2.0.0.bin for module i915
W: Possible missing firmware /lib/firmware/i915/bxt_guc_33.0.0.bin for module i915
W: Possible missing firmware /lib/firmware/i915/kbl_huc_4.0.0.bin for module i915
W: Possible missing firmware /lib/firmware/i915/kbl_guc_33.0.0.bin for module i915
W: Possible missing firmware /lib/firmware/i915/glk_huc_4.0.0.bin for module i915
W: Possible missing firmware /lib/firmware/i915/glk_guc_33.0.0.bin for module i915
W: Possible missing firmware /lib/firmware/i915/kbl_huc_4.0.0.bin for module i915
W: Possible missing firmware /lib/firmware/i915/kbl_guc_33.0.0.bin for module i915
W: Possible missing firmware /lib/firmware/i915/cml_huc_4.0.0.bin for module i915
W: Possible missing firmware /lib/firmware/i915/cml_guc_33.0.0.bin for module i915
W: Possible missing firmware /lib/firmware/i915/icl_huc_9.0.0.bin for module i915
W: Possible missing firmware /lib/firmware/i915/icl_guc_33.0.0.bin for module i915
W: Possible missing firmware /lib/firmware/i915/ehl_huc_9.0.0.bin for module i915
W: Possible missing firmware /lib/firmware/i915/ehl_guc_33.0.4.bin for module i915
W: Possible missing firmware /lib/firmware/i915/tgl_huc_7.0.12.bin for module i915
W: Possible missing firmware /lib/firmware/i915/tgl_guc_35.2.0.bin for module i915
W: Possible missing firmware /lib/firmware/i915/icl_dmc_ver1_09.bin for module i915
W: Possible missing firmware /lib/firmware/i915/tgl_dmc_ver2_06.bin for module i915
I: The initramfs will attempt to resume from /dev/nvme0n1p3
I: (UUID=d6cab994-b144-43a3-a412-100ce59aa599)
I: Set the RESUME variable to override this.
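
A log this long is easier to read as a tally. The following is a small sketch, not part of the original report, that counts the "Possible missing firmware" warnings per kernel module. The function name `missing_firmware_summary` is hypothetical, and it assumes the module name is the last word on each warning line, as in the log above.

```shell
# Hypothetical helper: reads an update-initramfs log on stdin and prints
# "<module> <count>" for each module named in a missing-firmware warning.
missing_firmware_summary() {
    awk '/Possible missing firmware/ { count[$NF]++ }
         END { for (m in count) print m, count[m] }' | sort
}
```

It could be fed a saved copy of the log, for example `missing_firmware_summary < initramfs.log`, where `initramfs.log` is a placeholder file name.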

I'm available to test stuff if you have any ideas. I also don't mind staying on 5.0.0-32-generic, especially if newer kernels don't let me load and turn off the dedicated GPU.

Thanks!

andyholst commented 4 years ago

I have never tried the Intel graphics. I am running the default MBP133 EFI firmware, so I would first have to test the Intel graphics by turning off amdgpu.

andyholst commented 4 years ago

@cristianmiranda have you tried applying the patch https://marc.info/?l=grub-devel&m=141586614924917&q=p3 to the eboot.c file? I believe patching the kernel so that it recognizes the Intel graphics card on the MBP133 is a better way to go than hacking the GRUB boot loader.

However, since kernel version 5.7, eboot.h and eboot.c have been moved from arch/x86/boot/compressed to drivers/firmware/efi/libstub, as efistub.h and x86-stub.c.
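
As a rough illustration of that move, the sketch below maps a kernel version to the file a patch would have to target. The function name `efi_stub_source` is hypothetical, and the 5.7 cut-off is taken from the comment above rather than checked against every release. It assumes the version string starts with "major.minor".

```shell
# Hypothetical helper: prints the source file that holds the x86 EFI stub
# for a given kernel version, following the move described above.
# Assumes the version string starts with "major.minor" (e.g. 5.8.3).
efi_stub_source() {
    major=${1%%.*}              # text before the first dot
    rest=${1#*.}                # text after the first dot
    minor=${rest%%[!0-9]*}      # leading digits of the remainder
    if [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 7 ]; }; then
        echo drivers/firmware/efi/libstub/x86-stub.c
    else
        echo arch/x86/boot/compressed/eboot.c
    fi
}
```

For a running system it could be called as `efi_stub_source "$(uname -r)"`.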

Do you want to file a bug report at https://bugzilla.kernel.org/ about this issue, or would you rather try the kernel patch on v5.7+ first?

andyholst commented 4 years ago

@cristianmiranda did you get any progress with amdgpu?

I am currently running Linux kernel version 5.8.3, and the command lspci -nnk | grep -i vga -A3 | grep 'in use' gives me Kernel driver in use: amdgpu on the MBP 13,3.
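
The same pipeline can be wrapped so that it prints only the driver names, which makes it easy to see at a glance whether one or two GPUs are bound. This is a sketch under the assumption that `lspci -nnk` prints a `Kernel driver in use:` line within three lines of each VGA entry, as the grep above already relies on. The function name `vga_drivers_in_use` is made up.

```shell
# Hypothetical helper: reads `lspci -nnk` output on stdin and prints the
# kernel driver bound to each VGA controller, one name per line.
vga_drivers_in_use() {
    grep -i -A3 vga | sed -n 's/^[[:space:]]*Kernel driver in use: //p'
}
```

Usage would be `lspci -nnk | vga_drivers_in_use`. A single line of output means only one GPU has a driver bound.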

cristianmiranda commented 4 years ago

Hi @andyholst, sorry, I didn't have time to reply before, and then I just forgot.

I couldn't make much progress on this. My main concern is not being able to turn off the AMD GPU since resource consumption is higher than just using the integrated GPU. In order to do that I need vgaswitcheroo and I don't see it (maybe this is related to https://github.com/Dunedan/mbp-2016-linux/issues/6#issuecomment-621350141).

I'm currently on 5.0.0-32-generic but I have 5.7 installed as well for running tests if you want.

Thank you so much for your interest in this!

btw, this is what I get on 5.0.0-32-generic:

❯ lspci -nnk | grep -i vga -A3 | grep 'in use'
    Kernel driver in use: i915
    Kernel driver in use: amdgpu

andyholst commented 4 years ago

@cristianmiranda right, vgaswitcheroo is not showing up because the kernel doesn't recognize two GPUs, which contradicts the result of your "lspci -nnk | grep -i vga -A3 | grep 'in use'" command.

I still think trying the kernel patch is better than the GRUB boot hack, since the GPU detection is then integrated into the boot process. The kernel structure has been through a major refactoring, so applying the patch can be a bit tricky, but I think it is still worth trying.

cristianmiranda commented 4 years ago

@andyholst that makes a lot of sense. It's weird, because I'm using rEFInd to spoof macOS when loading 5.0.0-32-generic, so I didn't have to patch that one. I'd expect to see the integrated GPU in any other kernel version loaded with rEFInd. Anyway, I'm going to do some research on how to patch the kernel (I have no idea where to start) and will let you know how it goes. Any suggestions I should consider before doing this? Thanks!

andyholst commented 4 years ago

@cristianmiranda patching the kernel is a matter of testing. First apply the patch to the old structure in v5.5 (arch/x86/boot/compressed/eboot.c) by running patch -p1 < ../patch-x.y.z in the root directory of the Linux repo on the v5.5 branch. patch-x.y.z is the file you get from https://marc.info/?l=grub-devel&m=141586614924917&q=p3. If you get it to work, try it on 5.7+ with the new structure in drivers/firmware/efi/libstub, where efistub.h and x86-stub.c are located; there the patch should go into x86-stub.c instead. I would diff the patched eboot.c against x86-stub.c to see how the structure differs, apply the change manually, then create a new patch and report it to the kernel bug tracker if this has worked before.

andyholst commented 4 years ago

@cristianmiranda I have checked out the v5.5 release tag and applied the patch from https://marc.info/?l=grub-devel&m=141586614924917&q=p3. I had to resolve some merge conflicts; you can see the commit on the patch branch of my linux-stable-fork: https://github.com/andyholst/linux-stable-fork/tree/efi-Identify-as-OS-X-to-EFI-drivers-before-booting

Going to test it out during the week.

andyholst commented 4 years ago

@cristianmiranda the patch for v5.5 works for me. I can access the /sys/kernel/debug/vgaswitcheroo/switch file, and it gives me the following:

0:DIS:+:Pwr:0000:01:00.0
1:IGD: :Pwr:0000:00:02.0
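
To read that output programmatically, here is a minimal sketch that picks out the GPU flagged with "+", which marks the active one. It assumes the switch file prints one GPU per line with colon-separated fields in the order id, type, active flag, power state, PCI address. The function name `active_gpu` is hypothetical.

```shell
# Hypothetical helper: reads vgaswitcheroo switch contents on stdin and
# prints "<type> <pci address>" for the GPU flagged active with "+".
# Assumes colon-separated fields id:type:active:power:pci-address,
# where the PCI address itself contains two more colons.
active_gpu() {
    awk -F: '$3 == "+" { print $2, $5 ":" $6 ":" $7 }'
}
```

On a patched kernel it could be used as `sudo cat /sys/kernel/debug/vgaswitcheroo/switch | active_gpu`.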

The command sudo modprobe amdgpu works just fine without any freezes.

The command lspci | grep "VGA" gives me the list:

00:02.0 VGA compatible controller: Intel Corporation HD Graphics 530 (rev 06)
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon RX 460/560D / Pro 450/455/460/555/555X/560/560X] (rev c0)

The command gpu-manager confirms that the Intel card and the AMD card are loaded, with the following output:

last_boot_file: /var/lib/ubuntu-drivers-common/last_gfx_boot
new_boot_file: /var/lib/ubuntu-drivers-common/last_gfx_boot
can't access /run/u-d-c-nvidia-was-loaded file
can't access /opt/amdgpu-pro/bin/amdgpu-pro-px
Looking for nvidia modules in /lib/modules/5.5.0+/updates/dkms
Looking for amdgpu modules in /lib/modules/5.5.0+/updates/dkms
Is nvidia loaded? no
Was nvidia unloaded? no
Is nvidia blacklisted? no
Is intel loaded? yes
Is radeon loaded? no
Is radeon blacklisted? no
Is amdgpu loaded? yes
Is amdgpu blacklisted? no
Is amdgpu versioned? no
Is amdgpu pro stack? no
Is nouveau loaded? no
Is nouveau blacklisted? no
Is nvidia kernel module available? no
Is amdgpu kernel module available? no
Vendor/Device Id: 8086:191b
BusID "PCI:0@0:2:0"
Is boot vga? no
Vendor/Device Id: 1002:67ef
BusID "PCI:1@0:0:0"
Is boot vga? yes
Skipping "/dev/dri/card1", driven by "i915"
Found "/dev/dri/card0", driven by "amdgpu"
output 0: card0-eDP-1
output 1: card0-DP-1
Number of connected outputs for /dev/dri/card0: 2
Skipping "/dev/dri/card1", driven by "i915"
Skipping "/dev/dri/card0", driven by "amdgpu"
Skipping "/dev/dri/card1", driven by "i915"
Skipping "/dev/dri/card0", driven by "amdgpu"
Found "/dev/dri/card1", driven by "i915"
Number of connected outputs for /dev/dri/card1: 0
Does it require offloading? no
last cards number = 2
Has amd? yes
Has intel? yes
Has nvidia? no
How many cards? 2
Has the system changed? No
Unsupported discrete card vendor: 8086
Nothing to do

So if you study the new efi boot structure for v5.7+ you should be able to apply the patch for it as well or even make a serious patch for upstream if you are up to it.

cristianmiranda commented 4 years ago

@andyholst thank you so much for spending time on this. I'll give it a try. I'm going to close this issue, as you have already proved that patching the kernel makes this possible.

andyholst commented 4 years ago

@cristianmiranda actually, I wouldn't close it until it has been fully verified. You most likely want the patch included upstream, as long as it is 'good enough', and it still has to be tested on the new EFI boot structure for v5.7+. I have no idea whether a patch of this kind has been merged upstream before, but it is worth spending time to check. If it's a bug, meaning it worked in earlier kernel versions, then the issue shouldn't be closed until the bug has been patched.

Dunedan commented 3 years ago

Any updates regarding this issue? What's the status with recent kernel versions? Did it get fixed upstream?

cristianmiranda commented 3 years ago

@Dunedan I believe @andyholst played around with a patched version of 5.5. I haven't spent time on this issue. Sorry.

andyholst commented 3 years ago

@Dunedan I tried the patch on kernel version 5.8; it can't be applied, since the design has been refactored. For context, here is an old email conversation:

Hi gentlemen,

Thank you so much for your code contribution to the Linux kernel. Is there any news on the apple gmux dual gpu support for newer MBP models (late 2016/2017) without having to deal with the OS X version/vendor efi boot hack the way you apply it to the old efi boot code structure (<= v5.5) https://github.com/andyholst/linux-stable-fork/commit/90c1102af4324dc05271e4d1bb49badfe3a7e7cf ?

Keep up the good work!

Regards
Andy Holst

The i915 developers said a while ago that they'll look into turning on the GPU if EFI has disabled it. I suppose they haven't made progress but you may want to prod them on their mailing list: intel-gfx@lists.freedesktop.org

The above-linked patch wasn't upstreamed so far because on MacBook Airs, it has the side effect that the keyboard/trackpad is switched to SPI if it's accessible both via SPI and USB. We do have a driver in mainline now for the SPI keyboard, but it may not work on the MBA yet: https://github.com/cb22/macbook12-spi-driver/issues/65

Thanks,
Lukas

I couldn't try the 'v5.5' OS X version/vendor EFI boot hack kernel patch (this old patch: https://github.com/andyholst/linux-stable-fork/commit/90c1102af4324dc05271e4d1bb49badfe3a7e7cf) on the MacBook Air models that have the SPI conflicts.

@Dunedan @cristianmiranda This issue is mainly one for the i915 developers. I don't know whether it is worth mentioning that the OS X version/vendor EFI boot kernel hack does not work on >= v5.7. I also don't know whether the GRUB boot loader hack still works, that is, whether it can still switch between the AMD and Intel graphics on the MBP 13,3.

I will not look into this issue.