NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License

Suspend to RAM doesn't work with NVidia Prime / Thinkpad P1 Gen2 #73494

Open mdedetrich opened 4 years ago

mdedetrich commented 4 years ago

Describe the bug

On the new Lenovo Thinkpad P1 Gen 2 series (I suspect it's the same as the X1 Extreme), suspend to RAM no longer works when using NVidia Prime. If you disable hybrid graphics in the BIOS and use only the NVidia graphics card, suspend to RAM works fine (although that completely kills your battery and makes the laptop's fans go crazy).

When you suspend to RAM in this configuration and then resume, the screen just stays black (the laptop does properly power up and the keyboard lights up). Unlike other suspend to RAM black screen problems, the only way to get out of this black screen is to physically restart the laptop with the power button; manually switching to a TTY with Ctrl+Alt+F1 and running systemctl restart display-manager does not work.

To Reproduce

Steps to reproduce the behavior:

  1. Set up NVidia Prime in static mode as described here https://nixos.wiki/wiki/Nvidia
  2. Press suspend to RAM
  3. Wait for laptop to suspend to RAM and then resume by hitting the power button. The screen is now stuck being black.

Expected behavior

Resuming from suspend to RAM works (i.e. it opens your login manager).

Additional context

The relevant NixOS configuration is here:

boot.blacklistedKernelModules = [ "nouveau" ];
hardware.nvidia = {
  modesetting.enable = true;
  optimus_prime = {
    enable = true;
    nvidiaBusId = "PCI:1:0.0";
    intelBusId = "PCI:0:2.0";
  };
};
services.xserver.videoDrivers = [ "intel" "nvidiaBeta" ];

Arch Linux has a good resource: https://wiki.archlinux.org/index.php/Lenovo_ThinkPad_X1_Extreme_(Gen_2) . According to their documentation, the latest NVIDIA beta drivers should just work (Prime offloading should also work, although that is covered by this PR: https://github.com/NixOS/nixpkgs/pull/66601 ).

@eadwu noted that the most likely reason behind this is that NixOS uses the default power management rather than one based on systemd. For NVIDIA, it's recommended to use the systemd power management (more info here: https://download.nvidia.com/XFree86/Linux-x86_64/435.17/README/powermanagement.html).

Metadata

 - system: `"x86_64-linux"`
 - host os: `Linux 5.3.11, NixOS, 20.03pre201791.c1966522d7d (Markhor)`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.3.1`
 - channels(root): `"nixos-20.03pre201791.c1966522d7d"`
 - channels(mdedetrich): `"nixpkgs-20.03pre201329.f1682a7f126"`
 - nixpkgs: `/home/mdedetrich/.nix-defexpr/channels/nixpkgs`

Maintainer information:

# a list of nixpkgs attributes affected by the problem
attribute:  [ "services.xserver.videoDrivers" "hardware.nvidia" "boot.blacklistedKernelModules" ]

eadwu commented 4 years ago

Personally, suspend to RAM on NVIDIA has always been "glitchy" for me. I suppose that this issue will remain as long as we use the default power management instead of systemd-based one NVIDIA recommends.

mdedetrich commented 4 years ago

@eadwu

I suppose that this issue will remain as long as we use the default power management instead of systemd-based one NVIDIA recommends.

Can you expand on this? What power management does NixOS use? Is it currently possible for NixOS to use the systemd-based power management?

Ah, I suppose you mean https://download.nvidia.com/XFree86/Linux-x86_64/435.17/README/powermanagement.html. I will update the issue with this as a reference.

mdedetrich commented 4 years ago

Maybe it makes sense to add a hardware.nvidia.powerManagement.enable attribute along with powerManagement.systemd.enable, so that setting hardware.nvidia.powerManagement.enable to true also sets powerManagement.systemd.enable to true? Alternatively, we could make it so that if powerManagement.systemd.enable = true and hardware.nvidia.enable = true, the necessary systemd units are installed as described in the article.

Sounds like a big feature though, should another ticket be made specifically for systemd style power management?
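A rough sketch of what such an option could look like as a NixOS module, assuming the option name proposed above and the suspend/hibernate/resume service layout from NVIDIA's power management README. The location of nvidia-sleep.sh inside the driver package is an assumption, and on a nixpkgs that already declares an option with this name the declaration here would clash and have to be renamed or dropped.

```nix
{ config, lib, pkgs, ... }:

let
  # Hypothetical option, named as proposed in this thread.
  cfg = config.hardware.nvidia.powerManagement;

  # Assumption: the driver package ships NVIDIA's helper script at this path.
  nvidiaSleep = "${config.boot.kernelPackages.nvidia_x11}/bin/nvidia-sleep.sh";

  # One oneshot unit per sleep state, ordered before systemd enters that state.
  sleepService = state: {
    description = "NVIDIA system ${state} actions";
    before = [ "systemd-${state}.service" ];
    wantedBy = [ "systemd-${state}.service" ];
    path = [ pkgs.kbd ]; # provides chvt/fgconsole for VT switching
    serviceConfig = {
      Type = "oneshot";
      ExecStart = "${nvidiaSleep} ${state}";
    };
  };
in
{
  options.hardware.nvidia.powerManagement.enable =
    lib.mkEnableOption "systemd-based NVIDIA power management";

  config = lib.mkIf cfg.enable {
    systemd.services.nvidia-suspend = sleepService "suspend";
    systemd.services.nvidia-hibernate = sleepService "hibernate";

    # The resume unit is ordered after the sleep services have returned.
    systemd.services.nvidia-resume = {
      description = "NVIDIA system resume actions";
      after = [ "systemd-suspend.service" "systemd-hibernate.service" ];
      wantedBy = [ "systemd-suspend.service" "systemd-hibernate.service" ];
      path = [ pkgs.kbd ];
      serviceConfig = {
        Type = "oneshot";
        ExecStart = "${nvidiaSleep} resume";
      };
    };
  };
}
```

Keeping everything under hardware.nvidia means nothing changes for users who do not opt in, which fits the "big feature" concern.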

eadwu commented 4 years ago

I don't know of any other drivers that use systemd-based power management, so if it is just an NVIDIA thing, keeping it under hardware.nvidia is probably the better way to do this.

Also, to expand on the "glitchy" behavior: I don't usually have problems resuming within a short time (<1 min) of suspending with NVIDIA online, but when I leave it in suspend for long periods of time, I experience the same problem. Before, this used to also affect it when the monitor brightness was 0, but this was fixed by HardDPMS and fixes in the driver, I believe.

mdedetrich commented 4 years ago

I don't know of any other drivers that use systemd-based power management, so if it is just an NVIDIA thing, keeping it under hardware.nvidia is probably the better way to do this.

Agreed

Before, this used to also affect it when the monitor brightness was 0, but this was fixed by HardDPMS and fixes in the driver, I believe.

I have an OLED screen, so I am not sure if this is making an impact (OLED screens technically don't have a backlight, so the standard kernel way of configuring brightness doesn't work on them).

I also tried looking at dmesg/journalctl and I couldn't actually find any errors when trying to resume from hibernate, so the screen just being "off" could be the issue.

Also judging from what NVidia is saying about this, i.e.

However, these allocations are collectively large, and typically cannot be evicted. Since the amount of system memory available to drivers at suspend time is often insufficient to accommodate large copies of video memory, the NVIDIA kernel drivers are designed to act conservatively, and normally only save essential video memory allocations.

The resulting loss of video memory contents is partially compensated for by the user-space NVIDIA drivers, and by some applications, but can lead to failures such as rendering corruption and application crashes upon exit from power management cycles.

Another theoretical reason why the default power management doesn't work is the limited system memory available to the kernel drivers at suspend time, combined with the fairly large amount of memory on my graphics card (4 GB of VRAM). It would be good to figure out whether Arch Linux is using the systemd style of power management.
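For reference, NVIDIA's README ties the systemd-based path to a kernel module option that saves all video memory allocations on suspend rather than only the essential ones. A minimal sketch of how that could be set from a NixOS configuration, assuming the module parameters NVreg_PreserveVideoMemoryAllocations and NVreg_TemporaryFilePath named in that README; the /var/tmp path is only an illustrative choice and has not been tested on this laptop.

```nix
{
  # Ask the NVIDIA kernel module to preserve all video memory across
  # suspend instead of only the essential allocations. The temporary
  # file path must have room for the saved VRAM contents (4 GB here),
  # so /var/tmp is an assumption, not a recommendation.
  boot.extraModprobeConfig = ''
    options nvidia NVreg_PreserveVideoMemoryAllocations=1 NVreg_TemporaryFilePath=/var/tmp
  '';
}
```

According to the same README, this option is meant to be used together with the suspend/resume services, so on its own it is probably not a complete fix.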

Before, this used to also affect it when the monitor brightness was 0, but this was fixed by HardDPMS and fixes in the driver, I believe.

Is HardDPMS on by default now?

Also, fwiw, hibernate works flawlessly; it's only suspend that's the issue.

eadwu commented 4 years ago

The memory limitation is the most likely cause. On the Arch Linux front, it seems like they use it [1], though I'm not sure whether it's enabled by default. As for HardDPMS, it has been enabled by default in the latest few drivers.

Actually, judging from how Arch Linux installations work, the services are installed, so they are used.

[1] https://git.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=packages/nvidia-utils#n166
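On older drivers where HardDPMS is not yet the default, it could be switched on explicitly through the X configuration. A minimal sketch, assuming the "HardDPMS" X config option from the NVIDIA driver README and the existing services.xserver.screenSection option; whether it makes any difference on an OLED panel is unknown.

```nix
{
  # Explicitly enable HardDPMS for the NVIDIA X driver. Recent drivers
  # are reported above to enable it by default, so this should only
  # matter on older driver versions.
  services.xserver.screenSection = ''
    Option "HardDPMS" "true"
  '';
}
```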

mdedetrich commented 4 years ago

Okay, so it seems like this is a good first step. Is this something you would be willing to look at, or should I make an attempt? Note that I am very new to nix/nixos, so it would likely take some time for me to learn how it works properly.

I don't think it should be that difficult since we already have a reference to work from (i.e. Arch). It would be great if we could get it in by the next stable (20.03).

eadwu commented 4 years ago

I'll see how much time I can pull, though yeah by the next stable the implementation should be ready if nobody else picks it up, (since at least I'll work on it over winter break).

mdedetrich commented 4 years ago

An update on this: I am no longer getting the issue I reported a couple of months ago, although I am not sure whether this is due to an update in the NVIDIA driver/kernel or the nix module itself.

nh2 commented 4 years ago

the screen just stays black

I have this problem with my ThinkPad T25 since I upgraded from NixOS 19.09 to 20.03.

mdedetrich commented 4 years ago

@nh2

What is your config (i.e. NVidia/XServer hardware conf)?

nh2 commented 4 years ago

@mdedetrich

  services.xserver.videoDrivers = [ "nvidia" ];
  hardware.nvidia.optimus_prime.enable = true;
  # Bus ID of the NVIDIA GPU. You can find it using lspci, either under 3D or VGA
  hardware.nvidia.optimus_prime.nvidiaBusId = "PCI:2:0:0";
  # Bus ID of the Intel GPU. You can find it using lspci, either under 3D or VGA
  hardware.nvidia.optimus_prime.intelBusId = "PCI:0:2:0";
  #hardware.nvidia.modesetting.enable = true; # tried both, makes no difference

  services.xserver.displayManager.defaultSession = "xfce+i3";

niklas:~/ $ lspci | grep -i nvidia
02:00.0 3D controller: NVIDIA Corporation GM108M [GeForce 940MX] (rev a2)

nh2 commented 4 years ago

This still does not work for me, even with these patches applied on top of 20.03 in order:

In both the new offload mode and, before that, in sync mode, I get a black screen in Xorg when resuming from standby. I'm pretty sure the screen is fully off, that is, no backlight is on (it turns on when resuming, showing a white caret on black in the top left for 1 second, then turns fully black; it turns on when switching to a virtual terminal, and back off when switching back with Ctrl+Alt+F7). I then have to switch to the virtual terminal with Ctrl+Alt+F1 and restart Xorg with sudo systemctl restart display-manager. Of course, that loses my desktop session.

I also tried adding "modesetting" to the video drivers, i.e. services.xserver.videoDrivers = [ "modesetting" "nvidia" ];, and setting hardware.nvidia.modesetting.enable = true. The newly added nvidia-resume.service runs through successfully according to the journalctl output, but it doesn't help.

I'm using lightdm with services.xserver.displayManager.defaultSession = "xfce+i3";.

@eadwu is this working for you?

(I also want to point out the "no longer works" from the issue description; this worked for me before, but broke recently, perhaps with the upgrade to 20.03.)

nh2 commented 4 years ago

I found that this helps: https://askubuntu.com/questions/512192/turn-monitor-back-on-after-xrandr/553944#553944

From the VT, run:

sudo chvt 7; sleep 3; xrandr --display :0.0 --auto

That turns the screen back on.

Isn't that what nvidia-resume.service from #73530 is supposed to do? It does a chvt, but no xrandr.


I see https://github.com/NixOS/nixpkgs/blob/9d0c3ffe6783d59b427d018e8341e0084737fde9/nixos/modules/hardware/video/nvidia.nix#L220-L224

Should something like that also be run upon resume?

nh2 commented 4 years ago

I found an ugly workaround that makes it work on my laptop:

{
  # Workaround to make standby resume work with nvidia without getting a black screen because the display is off.
  # See https://github.com/NixOS/nixpkgs/issues/73494
  systemd.services.nvidia-resume.serviceConfig = {
    # Requires `xhost +local:` in `sessionCommands` so that root can run X commands.
    ExecStartPost = "${pkgs.xorg.xrandr}/bin/xrandr --display :0.0 --auto";
  };

  services.xserver.displayManager.sessionCommands = ''
    # Needed to fix resume on nvidia, see `nvidia-resume` section.
    # TODO: This is suboptimal but I haven't figured out yet how to make root-commands work with XAUTHORITY
    ${pkgs.xorg.xhost}/bin/xhost +local:
  '';
}

eadwu commented 4 years ago

I'm not entirely sure how the suspending works, as I rarely suspend my laptop anyway: there seem to be bugs with waking up from suspend (hardware and/or software?).

The few times I did test it, I never had any problems, though it might be dependent on the situation (nearly all of those times the video card wasn't really being used).

stale[bot] commented 3 years ago

I marked this as stale due to inactivity.

nh2 commented 3 years ago

This seems to be fixed for me with 21.05, can anyone confirm?