Closed DRosky closed 8 years ago
I noticed on the main page a request for some desired info that I did not provide. I am attaching that here (output of get-acpi-info.sh and the dump_info kernel module).
Hey David,
Thank you for the report and acpidump. For future reference, I think this is where BIOS can be downloaded for your laptop: https://www.asus.com/us/Notebooks/ASUS-ZenBook-Pro-UX501VW/HelpDesk_Download/
Your PGON ACPI method looks very similar to the one on my Clevo P651RA, which hangs under some circumstances. Maybe you have the same issue. Can you still suspend/resume after the loud fan-spinning event?
Can you try bbswitch separately from Bumblebee and not load nvidia?
Peter
Hi Peter, Thanks for the reply! A few things:
I have not tried suspend and resume after the fan starts. I will try that later and let you know the result.
-David
EDIT: Also, the problem is very consistent: it happens under all circumstances if I try to use bbswitch to turn off the card.
After reading the issue you referenced (laptop freezes), I notice that I have not tried to use bbswitch after booting with text only (no X). I will try that as well and report the result.
So far I have never had a freeze, just the fan problem, with the fan controls (PWM) becoming locked until the machine is shut off.
Since you have already tried bbswitch, which showed the same problem, could you give nouveau a go? If the issue persists, please try this patch on top of Linux 4.6 for nouveau: https://lekensteyn.nl/files/linux-v4.6-pcipm-nouveau-pm2.patch
Some results:
It might be possible to apply patches to the OpenSUSE version, but I would suggest using the vanilla kernel instead to exclude possible problems caused by OpenSUSE's patches.
OpenSUSE docs seem to be available at https://en.opensuse.org/openSUSE:Kernel_git#Building_kernel_packages
Normally you can grab a tarball or clone the repo, then apply the patches, make/copy a kernel config, build and install.
gunzip -c /proc/config.gz > .config # use old kernel configuration
make oldconfig # update the kernel configuration (just press Enter to accept new entries)
make # build modules and image
sudo make modules_install # install modules to /lib/modules/4.6.../
Then you have to install the kernel image (arch/x86/boot/bzImage) somewhere in /boot/ and (re)create an initial ramdisk (distro-dependent stuff).
OK, thanks for the info. I'll start with the OpenSUSE page and go from there. I noticed that your patch patches a number of modules, not just Nouveau. I'll probably try it first with the OpenSUSE kernel and patches because if it works, I might end up with a fully functioning system ;)
I had a thought along a different line. Do you know if there is any tool that can capture ACPI calls in Windows 10? If so, perhaps the correct call could be captured while running a 3D application that causes the GPU to be turned on and then back off again... Just trying to think out of the box..
Capturing ACPI calls in Windows 10? Who would do such a thin... oh hey, https://github.com/Bumblebee-Project/bbswitch/issues/115#issuecomment-218551781 :smiley:
The nouveau patches combined with some PCI core patches are supposed to perform these calls. These changes will likely end up in Linux 4.8.
Capturing ACPI calls in Windows 10? Who would do such a thin... oh hey, #115 (comment) :smiley:
Haha! I finally read those comments. In addition to the kernel patch (which I will do as soon as I have a few spare hours), I was thinking I could also try tracing Windows ACPI calls on this machine if it would help. I'm not sure what tool to use, but more importantly, it seems to need a special build of Windows, which is probably a bigger issue :(
Hopefully it won't be necessary if the newer ACPI interfaces are the same across newer machines.
I used a Checked/Debug build of Win10 and a remote WinDbg/KD (kernel debugger). I don't think that an additional trace is needed, the patches I mentioned should fix the issue.
Any progress? I have the exact same problem with an ASUS X550V (i7 6700HQ + Nvidia GTX 950M). I am running Arch Linux, btw.
@verge-36 You can try nouveau with Linux 4.8-rc1 kernel (or newer). Do not use bbswitch or the nvidia blob in that case.
Hi All,
I thought I'd help out by posting a comment regarding my experience with ASUS UX501VW. I've got Ubuntu 15.10 installed and kernel version is 4.5.0 (for touchpad to work). I've got NVidia drivers installed version 352.63. Within NVidia Settings there's option called "PRIME Profiles" under this section there's 2 options to select:
I thought I would try the Intel Power Saving Mode. After selecting this option I restarted my machine. While the machine was booting up, the fan came on at max speed as soon as it got to the login screen. I tried shutting down again and had the same problem, so I logged in, went to NVidia Settings, set the PRIME Profiles back to NVIDIA (Performance Mode), saved and restarted the machine.
So I don't think bbswitch is the problem, since I don't have Bumblebee installed. I was thinking of installing Bumblebee in the hope that it would solve my problem, but then I came across this thread.
So I think the problem here is the NVidia drivers: when the NVidia card is off, the fan runs at max speed.
hope this helps,
ta. muzi
Some updates:
Also, the kernel modules have changed. When nouveau is loaded, it is dependent on the following modules:
ttm mxm_wmi video i2c_algo_bit drm_kms_helper drm wmi button
One or two of these modules seem to have something to do with power management, or with new ways of interfacing with the GPU, particularly the mxm_wmi module.
Note: I haven't yet tried updating to the latest binary nvidia driver, if there is a newer one. I'll check and try that. Ideally, the Nvidia driver should shut down the GPU (or at least implement maximum power savings) when it isn't being used similar to what the nouveau driver now seems to be doing, then the extra layer of bbswitch wouldn't even be needed.
Also, I haven't tried suspend/resume yet.
@DRosky bbswitch and kernel 4.8 not working well (in particular with runtime PM enabled via laptop-mode-tools or the equivalent) is a known issue. There is no timeframe yet for a fix; the workaround is to boot with the pcie_port_pm=off option added to your cmdline. See the dozens of other issues in the bbswitch issue tracker.
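To confirm that a parameter such as pcie_port_pm=off actually reached the running kernel, the command line can be checked from a shell. The following is a minimal sketch; the helper name has_cmdline_param is made up for illustration and is not part of bbswitch or the kernel. It only does whole-token matching on a space-separated string.

```shell
#!/bin/sh
# Succeed if <param> appears as a whole token in <cmdline>.
# Usage: has_cmdline_param "<cmdline>" "<param>"
# (hypothetical helper, named here for illustration only)
has_cmdline_param() {
    # Pad both sides with spaces so only complete tokens match.
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example against the running kernel (read-only):
# has_cmdline_param "$(cat /proc/cmdline)" pcie_port_pm=off && echo "PCIe port PM is disabled"
```

The check is purely textual, so it says nothing about whether the kernel version in use honours the option.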
As for the hang on shutdown, see https://github.com/Bumblebee-Project/Bumblebee/issues/764#issuecomment-234494238 for a newly added workaround.
The nvidia blob will give a worse experience with your battery (it does not support the PM methods from bbswitch nor nouveau), avoid it if you can. I recommend nouveau over bbswitch, the only reason to use bbswitch is if the nouveau driver does not support your card or if you plan to use it with the binary blob.
@Lekensteyn ,
Thanks. I started catching up on the other threads (still catching up :) ) Some additional observations based on those suggestions:
pcie_port_pm=off as without it. Either way, tee /proc/acpi/bbswitch <<<OFF causes the full-speed fan problem.
acpi_osi=! acpi_osi="Windows 2009". This did eliminate the hang on shutdown, but unfortunately it also caused the nouveau driver to be unable to reduce the GPU power. It seems that in order to power-manage the GPU, the nouveau driver (or one of the other modules on which it depends) needs some functionality that is not available when these kernel parameters are used.
I read through the kernel bug report and verified that lspci also hangs (when the workaround is not being used). Trying to unload the nouveau driver also does not work. Same with suspend/resume. As mentioned previously, the acpi_osi workaround seems to prevent the nouveau driver from powering down the GPU. I haven't read all of the comments yet, so I don't know if that's happening on all affected machines.
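When scripting tests like these, the state that bbswitch reports can be read back from the same proc file. This is a minimal sketch, assuming the file prints a single line of the form "<PCI address> ON" or "<PCI address> OFF"; bbswitch_state is a hypothetical helper name, not something bbswitch provides.

```shell
#!/bin/sh
# Print the state field (ON or OFF) from bbswitch status text on stdin.
# Assumed input format: "<PCI address> <state>", e.g. "0000:01:00.0 OFF".
# (hypothetical helper, named here for illustration only)
bbswitch_state() {
    # The state is the last whitespace-separated field of the first line.
    awk 'NR == 1 { print $NF }'
}

# Example (read-only; requires the bbswitch module to be loaded):
# bbswitch_state < /proc/acpi/bbswitch
```

As reported in this thread, bbswitch can say OFF while the fan problem is present, so this only shows what bbswitch believes, not whether the firmware handled the power-off cleanly.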
I tried one more experiment. I set pcie_port_pm=off along with loading the nouveau driver. In this case, the nouveau driver reverted to its previous behavior shown in kernel 4.6, whereby it once again caused the full-speed fan issue.
In summary, at the moment with this machine, only the nouveau driver with PCIE port power management and no acpi_osi limitations is able to reliably power down the GPU without causing fan problems, but then it cannot be powered back on, resulting in hangs.
EDIT: I finished reading through the other threads. It appears that the problems fall into two general categories: 1) Newer laptops where bbswitch worked fine prior to kernel 4.8 but where bbswitch is now broken with 4.8, and 2) newer laptops where bbswitch already had issues (such as the fan speed issue) on older kernels, and now there are different/additional issues on 4.8. The Asus UX501 seems to be in the second category.
Machines in the first category can be helped by the pcie_port_pm=off workaround, whereas machines in the second category can't, since that just reverts to the original problems (e.g, fan speed).
Right, for your first problem (fan control), you must use the new method in 4.8 with nouveau. The old method (DSM, forced via pcie_port_pm=off in 4.8) will definitely not work in your case.
The second problem (hang on suspend) would occur in any case where you use bbswitch or nouveau without the acpi_osi workaround. (Note: this problem is device-dependent, workarounds are described in https://github.com/Bumblebee-Project/Bumblebee/issues/764#issuecomment-234494238).
Yes. The only problem with the last part is that the acpi_osi workaround, at least on this machine, while preventing the hangs, causes nouveau to no longer be able to turn off the GPU, so it defeats the purpose. Hopefully, the root cause of the inability to turn the GPU back on will be found to avoid needing this workaround.
Ultimately, I suppose, bbswitch will want to incorporate the new PM method for newer laptops, otherwise it will not be possible to use the Nvidia blob with power management in bumblebee on machines like mine and others with similar issues.
As an aside, I did have a weird quirk happen this morning. The laptop mysteriously booted up with the GPU off and no nvidia driver loaded. I'm guessing that nouveau's inability to turn the GPU back on, and my subsequent needing to force the hung machine off with the power button, left the GPU in an off state that survived a restart! There are some scary things in these UEFI firmware...
Even if bbswitch adopts the new PM method, you would still need to solve the hang problem that prevents good power off/resume/etc. Hopefully some progress can be made in the PCI bug; until then you can try to override the ACPI method to remove the If (OSYS == 0x07D9) from the PGON method in SSDT4.
Doing that is an exercise for the reader, see https://www.kernel.org/doc/Documentation/acpi/method-customizing.txt
@Lekensteyn ,
your notes (https://github.com/Lekensteyn/acpi-stuff/blob/master/Clevo-P651RA/notes.txt#L94) show the code as If ((OSYS != 0x07DF)) (that is, !=, not ==, and 0x07DF rather than 0x07D9)? Either way, I assume the objective is to modify the code so that the Windows10-specific code segment does not get executed, correct?
@DRosky The code is model-specific. In my case the condition is OSYS != 0x07DF, but in your case it is OSYS == 0x07D9, which is why you need a different acpi_osi workaround. The objective is to replace this condition by If (One), which ensures that the code is always executed.
An automated tool should be possible:
1. Find the PGON method.
2. Find the following pattern in it:
       If ( OSYS == ... ) { // or != instead of ==
           ...
       } Else {
           LKEN (...)
       }
3. Replace OSYS == ... by One.
4. Wrap the modified method in a new SSDT:
       DefinitionBlock(...) {
           // TODO need some External(...) references here?
           Method (\_SB.PCI0...PGON) {
               ...
           }
       }
5. Compile it to an .aml file.
6. Use the custom_method module to load the new SSDT. See https://www.kernel.org/doc/Documentation/acpi/method-customizing.txt for this.

Here is an example of an SSDT which I used to patch the battery method: https://github.com/Lekensteyn/acpi-stuff/blob/master/Clevo-B7130/BatteryFix.dsl
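The condition-replacement step could be sketched with sed on the decompiled ASL text. This is only an illustration under stated assumptions: the function name patch_osys_condition is made up, the pattern handles just the simple single-comparison form quoted in this thread (If (OSYS == 0x....) or If ((OSYS != 0x....))), and it rewrites every match in its input, so it should be fed only the extracted PGON method rather than a whole SSDT.

```shell
#!/bin/sh
# Replace a simple "If (OSYS ==/!= 0x....)" guard with "If (One)".
# Reads decompiled ASL on stdin and writes the patched text to stdout.
# (hypothetical helper; it does not parse ASL and does not handle compound
# conditions such as "(OSYS == ...) && (...)")
patch_osys_condition() {
    sed -E 's/If[[:space:]]*\(+[[:space:]]*OSYS[[:space:]]*(==|!=)[[:space:]]*0x[0-9A-Fa-f]+[[:space:]]*\)+/If (One)/'
}

# Example (file names are placeholders):
# patch_osys_condition < pgon-method.dsl > pgon-method-patched.dsl
```

The output still has to be wrapped in a DefinitionBlock, compiled, and loaded as described in the kernel document linked above; none of that is automated here.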
@Lekensteyn , Thank you very much for that info. An automated method would be nice since it could be easily applied to any machine. I think with that and the last link, there is enough information to try it.
I actually have also found a simpler workaround, for this machine at least. As I mentioned previously, turning off PCIE port PM caused bbswitch to revert to the old problem with the fan speed, whereas the acpi_osi workaround prevents nouveau from power-managing the GPU. It then occurred to me that there is a possibility that telling the firmware you are "Windows 2009" might modify the behaviour of the _DSM methods in the firmware, so I added the acpi_osi parameters in addition to pcie_port_pm=off, and it looks like the hunch was correct: now bbswitch works well, with no fan speed problems.
I did some initial testing, and on this machine, it doesn't seem that reporting as Windows 2009 causes any hardware to become less functional, with the exception that screen brightness steps are more coarse. Bbswitch, optirun, and bumblebeed are all working well with the Nvidia binary blob.
I might still play around with modifying the PGON method, especially if the kernel PCI bug is not found soon, since that is a more forward-looking solution. Having that would pave the way for using the Nvidia blob in bumblebee once bbswitch incorporates the new PM method.
I'm going to go ahead and close this thread since those things are being tracked in more topic-appropriate threads. Thanks again for all of the help and for all the software you've created.
Ha, nice find with combining the two workarounds into a newer one that works for you :-)
Hopefully the cause of the new PM issue can be found, but at the moment it is not really going fast.
Is there a way to work around this problem with the binary nvidia driver?
@DRosky Can you show the final kernel parameter that you added? Is it something like this?
acpi_osi=! acpi_osi='Windows 2009' pcie_port_pm=off
I had to add acpi_backlight=native to get my screen brightness keys working.
Hi all.
I can't solve a problem with the fan speed on an Asus N551VW (Skylake 6700, GTX 960), kernel 4.8.12, Debian.
The problem is the same as everyone else's: after bbswitch >> off the reported fan speed rises to 25500 rpm, although the maximum is 4300 rpm.
acpi_osi=! acpi_osi='Windows 2009' pcie_port_pm=off
does not help.
I am keeping fancontrol only as a last resort.
If anyone knows how to fix it please answer!
I apologize I haven't noticed the updates to this thread until now. For me the following combination did work:
acpi_osi=! acpi_osi='Windows 2009' pcie_port_pm=off
This allowed the existing bbswitch (0.8) to disable the GPU without causing the fan problem. A caveat here is that I haven't had time to do updates for a while, so I'm still using kernel 4.8.7. If there's been a regression in this area, it might break for me when I upgrade to 4.8.12 (I've just been too busy to upgrade recently). In case that's part of the problem, you might want to try 4.8.7. If there's been a regression, it would be good to know. I'll also report any changes when I update.
As for the screen brightness keys, yes, those have been broken from the beginning. I'm not in the habit of using the keyboard for that (I usually use the GUI controls, which do work), but it's good to know the acpi_backlight=native option does work.
Backlight keys could be fixed with 4.10 via https://cgit.freedesktop.org/drm-intel/commit/?h=drm-intel-next&id=8e1b56a4b1deb3d25674c49255388902901f2c45
@raidhon , I just updated my N501VW to 4.8.12 and everything is still working fine. I just noticed that your post is regarding an N551VW, which is a different model. You might want to check the link below (provided previously by Lekensteyn) to see if there is a specific work-around that is known to work with that model. If there isn't, perhaps some experimenting with the Nouveau driver might be worth trying for your machine (it didn't work for this one).
https://github.com/Bumblebee-Project/Bumblebee/issues/764#issuecomment-234494238
@DRosky thanks for the reply.
I tried all the approaches from this thread, and none of them work.
Nouveau does not suit me, as I use the GPU for simple experiments with neural networks, and for this I need the Nvidia driver.
Everything works for me, except that the fan goes crazy with noise after bbswitch >> off.
I have to turn the laptop off and on again (I do not have the bumblebeed service in the startup), and then the fan begins to operate normally.
It does not interfere with my experiments, it is just annoying ))
If it is not too much trouble, could you describe in detail what you did to get it working? Maybe I missed something, or maybe the version of one of the packages is not the same. I would be so grateful!!
@raidhon , Everything I did is captured in this thread, but in summary, the way the firmware handles some ACPI calls has changed in newer machines, which causes various problems in some cases. On some machines, disabling the newer PCIe port power management and telling the firmware you are an earlier version of windows ("Windows 2009" = Windows 7) causes the ACPI calls to be handled differently and the older method works, but this is not guaranteed and it varies from one machine to the next (for all I know, these machines might even have trouble running actual Windows 7). Until there is a solution where Linux can reliably use the new power management scheme, there will be problems on some machines.
I also use the Nvidia binary driver to access the GPU for image processing purposes, so even if the Nouveau driver could properly turn the GPU off and on without hangs, it doesn't help me that much either. Before I found this work-around, I just accepted that the GPU was powered on all the time. The battery operating time is reduced, but other than that everything worked fine.
One last thing is to make sure you have updated to the most recent UEFI firmware for your machine, in case you haven't already checked that.
Hi, I have exactly the same problem: UX501VW with a GTX 960M, and the fan is always on. Were you able to solve this problem?
@sohrabi924 , I haven't been following this for a while, so I don't yet know if the fundamental problem with newer UEFIs designed for Windows 10 has been solved, but for the ux501vw, the workaround shown above has worked for me to enable bumblebee without the fan problem. For reference, the workaround is:
acpi_osi=! acpi_osi='Windows 2009' pcie_port_pm=off
Note, this is with OpenSuSE Tumbleweed and has worked all the way through kernel 4.11.x so far.
Note, I haven't updated the UEFI (BIOS) since discovering this workaround, so if you have a more updated UEFI and the workaround doesn't work, then possibly there's a problem there.
-D
FYI, I don't think this is a bbswitch-specific problem, although I have experienced it too on a previous OS. Following the Arch wiki tutorial to disable the dGPU completely
modprobe acpi_call
/usr/share/acpi_call/examples/turn_off_gpu.sh
gives the same max-fan result
And I can confirm that for ASUS UX550VE the combination
GRUB_CMDLINE_LINUX="acpi_osi=! acpi_osi=\"Windows 2009\" pcie_port_pm=off acpi_backlight=native"
calms down the fans on GPU disable
Hey ! For users who are trying to fix this problem on a faulty laptop, there is a temporary solution to calm down your fans.
The solution is to trick bbswitch into enabling the GPU before the fans go crazy.
To do that, ensure bbswitch is loaded (modprobe bbswitch), then execute glxgears on your nvidia card:
sudo optirun glxgears
update:
I can't find any clean way to add these options, and I can't find any proper grub config file in /etc/. Any help?
second update:
found it: /etc/default/grub. Don't forget to back up your old command line.
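Adding the options with a backup could be sketched as below. This is a minimal illustration rather than a distribution-approved procedure: the function name add_grub_params is made up, it assumes the file contains a single-line GRUB_CMDLINE_LINUX="..." entry with no escaped double quotes inside the value, and it assumes the parameters contain no "|", "&" or backslash characters (they are pasted into a sed expression).

```shell
#!/bin/sh
# Append kernel parameters to GRUB_CMDLINE_LINUX in a grub defaults file,
# keeping a backup copy at <file>.bak.
# Usage: add_grub_params <file> <params>
# (hypothetical helper, named here for illustration only)
add_grub_params() {
    file=$1
    params=$2
    # Keep the previous version so the old command line can be restored.
    cp -p "$file" "$file.bak" || return 1
    # Insert the parameters just before the closing double quote of the value.
    sed -E "s|^(GRUB_CMDLINE_LINUX=\"[^\"]*)\"|\1 ${params}\"|" "$file.bak" > "$file"
}

# Example (needs root; try it on a copy of the file first):
# add_grub_params /etc/default/grub "acpi_osi=! acpi_osi='Windows 2009' pcie_port_pm=off"
```

After editing, the GRUB configuration still has to be regenerated with the distribution's own tool (for example update-grub on Debian/Ubuntu or grub2-mkconfig on openSUSE) before the new parameters take effect on the next boot.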
@jesuiscamille where did you put the optirun glxgears? I can't find a way to make it work :man_shrugging:
thanks!
@julientaq hey there ! You may run the command in a shell/terminal.
Are you saying that running optirun glxgears in your terminal stops the fans at any time?
As I understood it, this needs to happen before X is started. Where would you add a script to make this happen?
Thanks for replying :D
@julientaq hey! This command needs to be run before the fan goes crazy. It needs X to be launched.
Just close your laptop. Open it, then quickly run this command. It should do the trick.
got it. Will try out! Thanks again!
Hello, I have a new Asus ux501vw (skylake version) with an Nvidia GTX960M. When bbswitch is used to turn off the GPU, after about 15-20 seconds, the cooling fans begin running at maximum speed. bbswitch reports that the gpu is off, and I believe the GPU may actually be off because the power consumption does drop somewhat, so the fans running at max speed is not a heat problem. In fact, the CPU temperature drops to below 30 deg. C because both chips are on the same heatsink. The fans can never be turned off again via any method without shutting down the system. Just rebooting does not shut them off. Here is a list of the symptoms and things that I've noticed or tried:
[ 217.578972] bbswitch: disabling discrete graphics
[ 217.579000] ACPI Warning: _SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires Package (20160108/nsarguments-95)
[ 217.596320] pci_raw_set_power_state: 553 callbacks suppressed
[ 217.596330] pci 0000:01:00.0: Refused to change power state, currently in D0
linux-adyu:~ #
10. Attached acpidump file: acpidump.txt
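The "Refused to change power state" line in the kernel log above is an easy symptom to test for when comparing kernel parameters. A minimal sketch, with a made-up helper name (power_state_refused) that simply searches kernel log text for that message:

```shell
#!/bin/sh
# Succeed if kernel log text on stdin contains the PCI core message
# "Refused to change power state", as seen in the log excerpt above.
# (hypothetical helper, named here for illustration only)
power_state_refused() {
    grep -q 'Refused to change power state'
}

# Example (reading dmesg may need root, depending on system settings):
# dmesg | power_state_refused && echo "a PCI device refused a power state change"
```

This matches the message for any PCI device, so on machines with several devices the address in the log line (0000:01:00.0 here) should be checked by hand.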
I'm not sure if this is just another manifestation of a known problem, but I don't see these symptoms reported here, so I decided to open an issue.
Regards, David