Open BernardoGO opened 9 years ago
Was it the black flickering in full screen Windows? I had it fixed by using Wayland or KDE.
It used to happen mostly when something changed on the screen.
On May 5, 2016 2:16 PM, "Jacob Mischka" notifications@github.com wrote:
4.6 fixed an unrelated issue I was having with a flickering/hanging display: https://bugs.freedesktop.org/show_bug.cgi?id=94161.
No, it was over aggressive power saving that would make the skylake integrated graphics just stop working for a fraction of a second at random times.
Do you have this black flickering as well?
I did until 4.6, yes.
Edit: I mean I had the kind I mentioned. I don't have the kind you mentioned as far as I know.
I'm trying it right now. Just installed it. Version: 4.6.0-1-ARCH. I can see that the problem is indeed not fixed for Broadwell either. But it seems like my flickering in Gnome is not happening anymore. I have to try it for a little longer before confirming it.
Have you tried it with Nouveau? I saw somewhere that kernel 4.6 combined with the new nouveau will solve the suspend-related problem.
I haven't tried nouveau, isn't the performance still terrible?
Wait, where did you find that kernel? 4.6.0-1-ARCH?
I’m not sure what linux-mainline from AUR output as kernel version, but I think it’s that. ;)
I'm using mainline and it didn't report that, so that's why I was asking if he found something better. Anyway, no big deal.
@jacobmischka It is not mainline. I'm using linux-git from AUR. I don't know why it is not reported as an RC version. It is 4.6-rc6.
@hundredyearslate Fixed what?
@jacobmischka I think that nouveau will probably perform better now since nvidia is providing the firmware for it. They released it in February and it was supposed to be available for us after the 4.6 release.
Are you guys also seeing graphical glitches after installing the Optimus setup with bbswitch? Especially in Cinnamon, I'm having many rendering problems.
I have resumed debugging yesterday and obtained some traces from Windows 10. Files that can be analyzed are available at https://lekensteyn.nl/files/p651ra-acpi-debug/
tr -d '\r' <kd.log | grep -vE 'ignore|[Aa]ssertion failure|^being terminated|^If you want to force|^$' > kd-filtered.log
!amli set traceon
enabled. For an acpidump see https://github.com/Lekensteyn/acpi-stuff/tree/master/dsl/Clevo_P651RA
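For anyone who wants to reproduce the log filtering without tr/grep at hand, the filter command above can be sketched in Python. This is only a minimal sketch: the function name `filter_kd_log` is made up for illustration, and the pattern is copied from the grep command as-is.

```python
import re

# Same alternatives as the grep -vE pattern: lines matching any of these are dropped.
DROP = re.compile(
    r"ignore|[Aa]ssertion failure|^being terminated|^If you want to force|^$"
)


def filter_kd_log(text):
    """Strip carriage returns (like tr -d '\\r') and drop noise lines (like grep -vE)."""
    lines = text.replace("\r", "").split("\n")
    return [line for line in lines if not DROP.search(line)]


if __name__ == "__main__":
    # kd.log / kd-filtered.log are the file names used in the command above.
    with open("kd.log", newline="") as src:
        kept = filter_kd_log(src.read())
    with open("kd-filtered.log", "w") as dst:
        dst.write("\n".join(kept) + "\n")
```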
I have not fully analyzed it yet, but after a quick look it seems that Windows 10 first calls _PS3/_PS0 and only then it calls _DSM while bbswitch and nouveau do the opposite. To be continued...
@hundredyearslate It is probably not needed, I already got an interesting observation that matches what others have reported before (in #112, https://lkml.org/lkml/2016/3/9/65 and many other places). (Nice name btw, hopefully it did not refer to my comment delays :p ;) )
(If you can easily retrieve it, then maybe having some comparison material would be nice; use !amli set traceon spewon verboseon once you have the kernel debugger attached. This requires a checked (= with debugging symbols) Windows build though; I was able to get one through my university's study association.)
So the main problem is that bbswitch/nouveau still use DSM calls which are unusable/untested for newer devices. Apparently you have to disable the parent device (likely the PCIe port) to put the Nvidia card in D3cold state. Information about this state can be found in the ACPI 6.1 specification, section 7.3.11 _PR3 (Power Resources for D3hot). Paraphrased/my interpretation: if there is a _PR3 for a device, the OS can turn off the power resources after executing _PS3 (by calling the _OFF method of those power resources; this will enter the D3(cold) state).
(According to table 7-224 on page 401, D3cold is supported by providing _PR3.)
My laptop for example has a \_SB.PCI0.PEG0._PR3 object evaluating to PG00 (\_SB.PCI0.PEG0.PG00). Thus I need to call \_SB.PCI0.PEG0.PG00._OFF on this device after calling \_SB.PCI0.PEG0.PEGP._PS3. (I have not found the ACPI spec line that says that _PR3 should be looked up in parent devices (PEGP) though.)
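The ordering described above (first _PS3 on the device, then _OFF on the power resources that the parent's _PR3 lists) can be written down as a small Python sketch. It only builds the list of method paths; it does not evaluate any ACPI method. The helper `power_off_sequence` and the `pr3` dictionary are made up for illustration, and looking up _PR3 on the parent is the interpretation from this comment, not something confirmed from the spec.

```python
def power_off_sequence(device, pr3):
    """Return the ACPI method paths to call, in order, to power off `device`.

    device: full path of the GPU device, e.g. r"\\_SB.PCI0.PEG0.PEGP"
    pr3:    maps a device path to the power resources its _PR3 evaluates to
            (assumption: _PR3 is looked up on the parent port).
    """
    parent = device.rsplit(".", 1)[0]
    calls = [device + "._PS3"]
    for resource in pr3.get(parent, []):
        calls.append(resource + "._OFF")
    return calls


# Values taken from the Clevo P651RA example in this comment.
PR3 = {r"\_SB.PCI0.PEG0": [r"\_SB.PCI0.PEG0.PG00"]}

if __name__ == "__main__":
    for call in power_off_sequence(r"\_SB.PCI0.PEG0.PEGP", PR3):
        print(call)
```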
From amli2.log I can see this sequence:
AMLI: ffffe001e8ec7040: AsyncEvalObject(\_SB.PCI0.PEG0.PEGP._PS3)
AMLI: FFFFE001E8EC7040: \_SB.PCI0.PEG0.PEGP._PS3()
ffffe001ef96f002: {
ffffe001ef96f002: If(LEqual(OPCE=0x2,0x3)=0x0)
ffffe001ef96f024: Store(0x3,_PSC)=0x3
ffffe001ef96f02b: }
AMLI: ffffe001e8ff3040: AsyncEvalObject(\_SB.PCI0.PEG0.PG00._OFF)
OPCE is initialized with 2 and can only be changed to 3 via the Optimus DSM method (which is apparently deprecated/not called in Windows 10). The corresponding _PS3 method is:
Method (_PS3, 0, NotSerialized) {
    If ((OPCE == 0x03)) { // <-- false (0x2 != 0x3)
        If ((DGPS == Zero)) {
            _OFF ()
            DGPS = One
        }
        OPCE = 0x02
    }
    _PSC = 0x03 // <-- executed
}
So it appears that Windows immediately turns off the power resource of the parent PCIe port after calling _PS3.
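To make the control flow of that _PS3 method easier to follow, here is a rough Python model of it. It is only a sketch of the logic shown in the ASL above: the state is a plain dictionary, `simulate_ps3` is a made-up name, and the return value just records whether the method's internal _OFF call would have run.

```python
def simulate_ps3(state):
    """Model of the _PS3 method above; returns True if its internal _OFF would be called."""
    off_called = False
    if state["OPCE"] == 0x03:      # only true after the Optimus DSM set OPCE to 3
        if state["DGPS"] == 0:
            off_called = True      # stands in for the _OFF () call in the ASL
            state["DGPS"] = 1
        state["OPCE"] = 0x02
    state["_PSC"] = 0x03           # always executed
    return off_called


if __name__ == "__main__":
    # Windows 10 case from the trace: OPCE is still 2, so nothing is powered off here.
    windows10 = {"OPCE": 0x02, "DGPS": 0, "_PSC": 0}
    print(simulate_ps3(windows10), windows10)
```

With OPCE left at 2 the method only records the new power state, which fits the observation that the actual power-off happens through the separate PG00._OFF call.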
For powering on the graphics card, I see the following sequence:
AMLI: ffffe000c5ac7040: AsyncEvalObject(\_SB.PCI0.PEG0.PEGP._PS0)
AMLI: ffffe000ce289040: AsyncEvalObject(\_SB.PCI0.PEG0.PG00._ON)
AMLI: ffffe000c5ac7040: EvalNameSpaceObject(\_SB.PCI0.PEG0.PEGP._DSM)
String(:Str="------- GPS DSM --------")
String(:Str="GPS fun 2a")
AMLI: ffffe000c5ac7040: EvalNameSpaceObject(\_SB.PCI0.PEG0.PEGP._DSM)
AMLI: ffffe000c5ac7040: EvalNameSpaceObject(\_SB.PCI0.PEG0.PEGP._DSM)
AMLI: ffffe000c5ac7040: EvalNameSpaceObject(\_SB.PCI0.PEG0.PEGP._DSM)
String(:Str="------- GPS DSM --------")
String(:Str="GPS fun 19")
With the DSM parameters (in ssdt7.dsl for Clevo P651RA) being:
// calls "GPS DSM" and does some magic
AMLI: FFFFE001E8EC5040: \_SB.PCI0.PEG0.PEGP._DSM(Buffer(0x10){
0x01,0x2d,0x13,0xa3,0xda,0x8c,0xba,0x49,0xa5,0x2e,0xbc,0x9d,0x46,0xdf
0x6b,0x81},0x100,0x2a,Buffer(0x4){
0x02,0x03,0x00,0x00})
// func 0x05, does more magic
AMLI: FFFFE001E8EC5040: \_SB.PCI0.PEG0.PEGP._DSM(Buffer(0x10){
0xf8,0xd8,0x86,0xa4,0xda,0x0b,0x1b,0x47,0xa7,0x2b,0x60,0x42,0xa6,0xb5
0xbe,0xe0},0x100,0x5,Buffer(0x4){
0x00,0x00,0x00,0x00})
// func 0x1B, smaller magic
AMLI: FFFFE001E8EC5040: \_SB.PCI0.PEG0.PEGP._DSM(Buffer(0x10){
0xf8,0xd8,0x86,0xa4,0xda,0x0b,0x1b,0x47,0xa7,0x2b,0x60,0x42,0xa6,0xb5
0xbe,0xe0},0x100,0x1b,Buffer(0x4){
0x00,0x00,0x00,0x00})
// smaller magic
AMLI: FFFFE001E8EC5040: \_SB.PCI0.PEG0.PEGP._DSM(Buffer(0x10){
0x01,0x2d,0x13,0xa3,0xda,0x8c,0xba,0x49,0xa5,0x2e,0xbc,0x9d,0x46,0xdf
0x6b,0x81},0x100,0x13,Buffer(0x4){
0x04,0x00,0x00,0x00})
I wonder what those DSM methods are used for and whether these also occur on other laptop models.
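To compare these calls with other laptop models, it may help to turn the 16-byte buffers from the trace into the usual UUID text form, since that is what you would search for in a disassembled SSDT or in driver sources. A minimal sketch, assuming the buffers use the mixed-endian layout that Python's `uuid.UUID(bytes_le=...)` expects; `dsm_uuid` is a made-up helper name.

```python
import uuid


def dsm_uuid(buffer_bytes):
    """Convert a 16-byte _DSM UUID buffer (as dumped by AMLI) to its text form."""
    return str(uuid.UUID(bytes_le=bytes(buffer_bytes)))


# First argument of the "GPS DSM" calls (functions 0x2a and 0x13) in the trace above.
GPS_BUFFER = [0x01, 0x2d, 0x13, 0xa3, 0xda, 0x8c, 0xba, 0x49,
              0xa5, 0x2e, 0xbc, 0x9d, 0x46, 0xdf, 0x6b, 0x81]

# First argument of the calls with functions 0x05 and 0x1b.
OTHER_BUFFER = [0xf8, 0xd8, 0x86, 0xa4, 0xda, 0x0b, 0x1b, 0x47,
                0xa7, 0x2b, 0x60, 0x42, 0xa6, 0xb5, 0xbe, 0xe0]

if __name__ == "__main__":
    print(dsm_uuid(GPS_BUFFER))    # a3132d01-8cda-49ba-a52e-bc9d46df6b81
    print(dsm_uuid(OTHER_BUFFER))  # a486d8f8-0bda-471b-a72b-6042a6b5bee0
```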
Have you guys tried to change the acpi_osi to report an older version of windows? It works for me
Setting acpi_osi="!Windows 2015" might work for older devices (that should work with Windows 7 or something), but for newer devices it is increasingly likely not to work (because Windows 10 uses the new interface and vendors are likely cheap and do not validate for older OSes).
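If you do experiment with this, it is worth checking that the parameter really ended up on the kernel command line, since the quoting is easy to get wrong in the bootloader config. A small sketch that extracts the acpi_osi values from a command-line string; `acpi_osi_values` is a made-up helper, and the example command line in the comment is hypothetical.

```python
import shlex


def acpi_osi_values(cmdline):
    """Return all acpi_osi= values found in a kernel command line string."""
    values = []
    for token in shlex.split(cmdline):
        if token.startswith("acpi_osi="):
            values.append(token[len("acpi_osi="):])
    return values


if __name__ == "__main__":
    # /proc/cmdline holds the command line of the running kernel, for example
    # (hypothetical): root=/dev/sda1 acpi_osi="!Windows 2015" quiet
    with open("/proc/cmdline") as f:
        print(acpi_osi_values(f.read()))
```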
I'm sorry, I'm a bit lost with all of the recent talk in this project's issues and updates in bumblebee.
Should I be running the develop branch of bumblebee? Is running a version including https://github.com/Bumblebee-Project/Bumblebee/pull/762 better than running the current stable build and blacklisting nvidia, nvidia-drm, nvidia-modeset, and nvidia-uvm in a modprobe conf file? Will any of these things make any difference to bbswitch?
Is there anything I should be doing differently, or is your suggestion essentially to just stick to rmmodding bbswitch before suspending until the PCIe changes come in 4.7, or until the DSM calls are straightened out?
Should I try using the disable_root_port fork mentioned in #112? Is that the same thing as pcie-root-port which is mentioned later? Is there a specific acpi_osi setting I should be using? I'm confused about which suggestions in #112 I should be considering, because although the thread initially wasn't about suspend, it's mentioned in there several times.
I apologize for all the questions, but there are so many things being mentioned across various issues that I don't really know what I'm supposed to be doing. Thank you for your help.
@jacobmischka The develop branch of Bumblebee is currently recommended over the master branch for compatibility with newer nvidia driver versions.
The hack from #112 should not be needed with Linux 4.7 and an appropriate version of bbswitch (not released yet). While it is not a problem with overheating during suspend, it is related to the fact that newer machines expect a different interface to be used (power resources _ON/_OFF instead of _DSM). About the disable_root_port fork, the root port is already controlled by the pcieport driver, I am not sure if it is a good idea to manage it in bbswitch too... that sounds risky (race conditions).
Could you open a new issue for your Acer E5-574G and include your BIOS version and the output of sudo acpidump > acpidump.txt? Edit: I checked the ACPI table from BIOS 1.14 for your model and found that your model indeed expects control of power resources.
Done, I don't remember if referencing issues results in a notification or not. Thanks!
:) I'm just passing by
Still no way to fix this ? :(
@Anti-Ultimate No fix is available in a stable version of bbswitch or the kernel. If you do not mind using the mainline kernel, try Linux 4.8-rc1 or newer with the nouveau module (and not bbswitch).
@Lekensteyn does nouveau support Optimus without bbswitch?
@BernardoGO If you only need to save power, then both nouveau and bbswitch are functional. If you need to connect an external monitor, then dump bbswitch for nouveau. If you need to use the blob, then you can also try to use nouveau, but you would have to manually unload nouveau before loading nvidia.
I may be late discovering this, but it seems like upgrading my laptop (Clevo P651SE/XMG P505, GTX 970m) from Debian jessie to stretch (Linux 4.9, bbswitch 0.8, nvidia driver 375.82) fixed this issue. At least, nvidia-smi did not report any temperature difference after having the laptop suspended for 5-10 minutes. :)
@firetech Does bbswitch actually work? Can you see that in dmesg? If runtime PM is not enabled or if bbswitch is not activated, then the problem is not triggered.
@Lekensteyn I'm quite sure bbswitch works, but I haven't double checked. My laptop has a status LED for the dGPU and it is off unless I start an application with optirun/primusrun. Also, the LED comes on just before suspend (but obviously turned off while in suspend), just like in Windows. I'll double check the dmesg later, my laptop isn't with me at the moment.
EDIT: bbswitch is definitely working:
[ 5.916385] bbswitch: Succesfully loaded. Discrete card 0000:01:00.0 is on
[ 5.917197] bbswitch: disabling discrete graphics
Me running optirun nvidia-smi:
[82398.146560] bbswitch: enabling discrete graphics
[82400.369356] bbswitch: disabling discrete graphics
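For anyone else who wants to check the same thing, here is a rough Python sketch that reads dmesg output and reports the bbswitch transitions, so you can see whether the card was last switched off. It only matches the two message formats quoted above; `bbswitch_transitions` and `card_left_on` are made-up helper names.

```python
import re

# Matches the "enabling"/"disabling" messages quoted in the dmesg output above.
TRANSITION = re.compile(r"bbswitch: (enabling|disabling) discrete graphics")


def bbswitch_transitions(dmesg_text):
    """Return the sequence of requested states ("on"/"off") found in dmesg output."""
    states = []
    for line in dmesg_text.splitlines():
        match = TRANSITION.search(line)
        if match:
            states.append("on" if match.group(1) == "enabling" else "off")
    return states


def card_left_on(dmesg_text):
    """True if the last bbswitch transition in the log switched the card on."""
    states = bbswitch_transitions(dmesg_text)
    return bool(states) and states[-1] == "on"
```

Feeding it the output of `dmesg` after a suspend/resume cycle shows at a glance whether bbswitch last asked for the card to be on or off.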
@Lekensteyn Also, at least on my laptop, I got the overheating also when I tried an Ubuntu live USB (when it was new, so late 2014) without anything optimus, nvidia or bbswitch related installed. The dGPU was left on without anything touching it. Since then, I haven't touched the BIOS/UEFI settings.
I have a Clevo P650SE-A (Sager NP8651) with an NVidia GeForce GTX970m and Intel HD5600. The laptop has an LED that shows whether the discrete GPU is being used. I got bumblebee working successfully on both Ubuntu 14 and 15, but I have noticed that the laptop turns on the dGPU LED while turning on, turning off or going to sleep. The problem is that it leaves the GeForce on while sleeping. This leads to overheating with the fans off and wastes 15-20% of the battery per hour while sleeping. It is actually heating up and using more battery while off than it does while on.
Hibernating is not an option because it cannot resume after hibernation (black screen with cursor), and I don't really want it to hibernate since I'm used to suspending it many times per day.
This happens with Ubuntu 14 and 15. I'm using the 3.19 kernel because 4.2 does not seem to support my video card. The sleep problem does not happen on Windows using optimus.
I'm not using UEFI, does it have something to do with it?
Ubuntu 14.10
https://bugs.launchpad.net/debian/+bug/752542/+attachment/4504886/+files/CLEVO-P65xSE-A.tar.gz