NVIDIA / egl-wayland

The EGLStream-based Wayland external platform
MIT License

Xwayland VRAM usage is still excessive when resizing X11 apps under wayland. #126

Open shelterx opened 1 month ago

shelterx commented 1 month ago

I'm not sure what the "Fix an issue causing KDE crashes, which also caused excessive VRAM usage when resizing." was supposed to fix. Resizing X11 apps like steam still makes Xwayland VRAM usage skyrocket but seems to stop at around 1.3GB. I'm not sure exactly what component causes this but I'll leave it here.

ofourdan commented 1 month ago

For background, that was already reported against Xwayland here: https://gitlab.freedesktop.org/xorg/xserver/-/issues/1687

Bunnysword commented 1 month ago

Please fix: https://forums.developer.nvidia.com/t/560-release-feedback-discussion/300830/165?u=nicneme123

thesword53 commented 1 month ago

This issue is not limited to Xwayland:

shelterx commented 1 month ago

@thesword53 indeed... you are correct, how did I miss that. I resized Konsole and here's the result: image

Good find!

Version used: driver 560.31.02, egl-wayland commit https://github.com/NVIDIA/egl-wayland/commit/f30cb0e4c9a215e933dc8250f5dad4e96d4f2136

shelterx commented 1 month ago

This issue is not limited to Xwayland:

It's not limited to Wayland sessions either; kwin_x11 also eats VRAM when resizing. I don't recall having that issue before. So it's probably not an egl-wayland bug at all, but I'll leave the issue open until it's fixed.

However, kwin_x11 does release the memory after a while, but it does so slowly.

Arcitec commented 1 week ago

I can confirm; I went back to X11. I'm on Fedora Workstation 40 with NVIDIA 560.35.03 and an RTX 3090.

On Wayland my average desktop uses 11 GB / 24 GB VRAM (46%) with just a web browser open. It impairs my ability to run games or AI workloads, because basically half the card's memory is wasted. One time it even reached the point where all apps crashed because VRAM ran out.

On X11 my average desktop uses 3 GB / 24 GB VRAM (12.5%) for the same workload. Games and AI workloads run great.

The issue seems to be:

This is in addition to Wayland's other issues, such as Chromium-based browsers frequently breaking when opening new windows, causing the windows to render in a glitched way and offset by about a titlebar's height from the top of the screen, and you have to click and drag the "invisible" (totally transparent) titlebar to resize the window to get it to render properly.

And Wayland's lack of basic features such as global keyboard shortcuts/keybinds.

It's not just NVIDIA that has problems on Wayland. Most things do.

I am going back to X11 for the next 12 months and will see if Wayland is better in 2026. At least X11 is usable. :D Wayland needs more time in the bakery. Fedora plans to remove X11 by default in Fedora 41, but I'll just install it manually since Wayland is totally unusable at the moment.

ryzendew commented 1 week ago

I have fully tried to reproduce this issue to no avail. If anyone has a 100% reliable way, please let me know. I tried on Arch, Fedora, and PikaOS 4.

kelvie commented 1 week ago

I can reproduce this pretty readily on KDE on Arch, running NVIDIA 560.35.03. Just open a Konsole window and resize it a bunch of times, then run nvidia-smi: the VRAM of kwin_wayland goes up to about 10% of the total VRAM and more or less stops there.

Perhaps this does have to do with how nvidia is allocating, or garbage collecting, or perhaps even reporting VRAM.

shelterx commented 6 days ago

I have fully tried to reproduce this issue to no avail

https://streamable.com/2ufy13

This sort of demonstrates the issue, watch kwin VRAM usage after the resizing.

ryzendew commented 5 days ago

OK, confirmed: it's a thing on GNOME as well. A friend on a 7800XT confirmed it happens on AMD as well.

adding a video https://streamable.com/ht9cu2

kelvie commented 5 days ago

This has always happened for me on Xwayland (I filed that bug on FDO), but only recently for kwin_wayland. Does anyone know if there's an NVIDIA driver/egl-wayland version combo that doesn't have this problem and has explicit sync?

shelterx commented 5 days ago

@kelvie are you sure? It's possible it's been like that for a while. It's easy to miss that it happens with kwin_wayland, because if you close the window that caused the VRAM leak, kwin's VRAM usage goes back to normal.

Update: Here's the 550.40.71 dev driver, so yeah, it's in 550 too. Image

kelvie commented 5 days ago

@shelterx

https://gitlab.freedesktop.org/xorg/xserver/-/issues/1617

This has happened since 545.29.06 with xwayland, I had to switch all my apps to wayland native apps to combat this if I wanted my vram.

Only with 560 it started happening with kwin_wayland for me (I also had a short stint with hyprland so I don't remember when I switched back), and currently with 560, as far as I can tell, kwin_wayland never gives back the memory even after I close the windows.

ryzendew commented 5 days ago

https://gitlab.freedesktop.org/xorg/xserver/-/merge_requests/1704 let's test this

ryzendew commented 4 days ago

After testing that PR, the issue is semi-fixed. Here is a video: https://gitlab.freedesktop.org/-/project/371/uploads/4af729a970faa28b667669bac1b8531f/Screencast_From_2024-09-25_20-48-02.mp4

gilvbp commented 4 days ago

After testing that PR the issue is semi fixed https://gitlab.freedesktop.org/-/project/371/uploads/4af729a970faa28b667669bac1b8531f/Screencast_From_2024-09-25_20-48-02.mp4 here is a video

FYI: this is not a fix. It only helps to debug/track/trace.

kelvie commented 4 days ago

Yeah, I went back and tested with 555 and 550 as well, and it's still the same thing: kwin_wayland uses 2.4 to 2.7GB of my 24GB VRAM after resizing windows and doesn't free it even after the windows are closed.

kelvie commented 4 days ago

I've started a new topic here: https://forums.developer.nvidia.com/t/multiple-wayland-compositors-not-freeing-vram-after-resizing-windows/307939

There are multiple issues here (Xwayland, compositors) and multiple components (xorg-server, multiple wayland compositors, this repo, nvidia drivers), so hopefully we get to the bottom of this.

In summary, I've reproduced this on:

nvidia versions:

compositors:

egl-wayland versions:

with the same test, open a terminal and resize it over and over again, close it, and check the compositor's VRAM usage using nvidia-smi

Every time it's around 2.5GB on my 24GB 4090
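To make the "resize, close, check nvidia-smi" test easier to compare between runs, here is a rough Python sketch that pulls per-process VRAM out of the process table that plain `nvidia-smi` prints. The function name `vram_by_process`, the `ROW` pattern, and the sample rows (including the numbers) are made up for illustration, and the table layout is an assumption that may differ between driver versions.

```python
import re

# Assumed layout of one row in the process table printed by plain `nvidia-smi`:
# |    0   N/A  N/A      1234      G   /usr/bin/kwin_wayland      2450MiB |
ROW = re.compile(
    r"^\|\s+\d+\s+\S+\s+\S+\s+(\d+)\s+(\S+)\s+(.+?)\s+(\d+)MiB\s+\|$"
)


def vram_by_process(table_text):
    """Return {process name: MiB}, summed over all rows that match ROW."""
    usage = {}
    for line in table_text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # headers, separators, and anything in another layout
        # Keep only the executable name, not the full path.
        name = match.group(3).rsplit("/", 1)[-1]
        usage[name] = usage.get(name, 0) + int(match.group(4))
    return usage


# Hypothetical sample rows, not real measurements.
SAMPLE = """\
|    0   N/A  N/A      1234      G   /usr/bin/kwin_wayland                      2450MiB |
|    0   N/A  N/A      1500      G   /usr/bin/Xwayland                          1300MiB |
|    0   N/A  N/A      1620      G   /usr/bin/plasmashell                        700MiB |
"""
```

Feeding it the captured output of `nvidia-smi` before resizing, after resizing, and after closing the window gives three dictionaries that can be compared directly for the compositor's entry.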

shelterx commented 4 days ago

I had an old install with 525 and KDE 5, can't say I managed to reproduce it there but I had no Wayland session installed so I had to rely on the X11 test.

shelterx commented 3 days ago

So...

kelvie commented 3 days ago

@shelterx Wow, that's a trippy workaround (the minimizing one for kwin_wayland). It does seem to work; I wonder which (E)GL calls are at work here. plasmashell doesn't seem to free its VRAM, but maybe that's another issue.

kelvie commented 3 days ago

I'm testing this a bit more, and it seems just using the Alt+TAB switcher in kwin resets the VRAM -- very strange. Maybe something to do with how the window thumbnails are being created for that?

cubanismo commented 3 days ago

Thank you for all the reports and attempts to narrow down the issue. I believe there are actually two separate issues tracked here:

I've looked into the latter issue, and at this point it is well understood. We do not need additional information or reports of reproductions for that issue. See below for more information.

We have not been able to reproduce the issues with Xwayland/X applications with the latest version of Xwayland and latest drivers. If you are still experiencing that particular issue, please share reproduction steps (ideally starting from a clean boot), the amount of persistent memory usage you are seeing and how you are measuring it, and your system details (Run nvidia-bug-report.sh, attach the log it generates, list your Xwayland and compositor version numbers and ideally distro package versions if you're using distro packages).

For the Wayland compositor memory usage issue, there isn't a leak per-se, but the heuristics that decide which memory to retain for performance reasons aren't working optimally when presented with the OpenGL API usage typical of a Wayland compositor. While we work to develop and deploy a driver fix, I can offer this workaround:

That should resolve this class of memory usage issues within the named application. You can also duplicate the entire rule in the JSON file if you regularly switch between multiple Wayland compositors, e.g.:

        {
            "pattern": {
                "feature": "procname",
                "matches": "kwin_wayland"
            },
            "profile": "Limit Free Buffer Pool On Wayland Compositors"
        },
        {
            "pattern": {
                "feature": "procname",
                "matches": "gnome-shell"
            },
            "profile": "Limit Free Buffer Pool On Wayland Compositors"
        }

kelvie commented 3 days ago

Thank you for this! And to be clear, we are to create a .txt file filled with JSON, and not a .json file in that directory?

Edit: Just tested the instructions as is, copied that file and placed it in the directory with a .txt extension and it's fixed! thank you!

cubanismo commented 3 days ago

We are to create a .txt file filled with JSON, and not a .json file in that directory?

The driver doesn't care what the file is called. I didn't have an extension on the file originally, but GitHub only accepts certain file names (see https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/attaching-files), so I renamed it to .txt. Name it whatever you like.

shelterx commented 3 days ago

@cubanismo Thank you for your reply, much appreciated. I will try the kwin/gnomeshell workaround. I think it happens with plasmashell too tho'.

We have not been able to reproduce the issues with Xwayland/X applications with the latest version of Xwayland and latest drivers.

Not sure which driver you are referring to as the latest, but I can't reproduce it with the latest dev drivers. With 560.35.03, however, it's easy: just resize the Steam window, for example.

Edit: The workaround for kwin_wayland works here. Additional info: I see no VRAM spikes in KDE like the ones @mlhhqh experienced in GNOME, mentioned below.

mlhhqh commented 3 days ago

    {
        "pattern": {
            "feature": "procname",
            "matches": "kwin_wayland"
        },
        "profile": "Limit Free Buffer Pool On Wayland Compositors"
    },
    {
        "pattern": {
            "feature": "procname",
            "matches": "gnome-shell"
        },
        "profile": "Limit Free Buffer Pool On Wayland Compositors"
    }

Can confirm this works on GNOME 46, Silverblue, 560.35.03.

Still very subpar results, though. After starting a GNOME session, opening a terminal (Wayland), and resizing it around a bit, usage spikes up to 1.4GB (from ~300MB). VRAM usage goes down very slowly (yet noticeably on user interaction like moving a window).

ppogorze commented 3 days ago

@cubanismo works for me! Gnome 47, CachyOS (Arch), 560.35.03. VRAM usage stays at ~400MB while resizing terminal window with nvtop open (before it was up to 1.4GB).

shelterx commented 3 days ago

FYI, you can also add kwin_x11 if you use Xorg, it makes kwin_x11 stay on sane levels and doesn't overallocate.

kakra commented 2 days ago

FYI, you can also add kwin_x11 if you use Xorg, it makes kwin_x11 stay on sane levels and doesn't overallocate.

Yeah, and while at it, add plasmashell, too. plasmashell is happy with under 300MB now instead of climbing up above 700MB. I also added the Xorg process itself. Not sure if it helps, seems to be a little lower (maybe 100-200MB less).

kelvie commented 2 days ago

FYI, you can also add kwin_x11 if you use Xorg, it makes kwin_x11 stay on sane levels and doesn't overallocate.

Yeah, and while at it, add plasmashell, too. plasmashell is happy with under 300MB now instead of climbing up above 700MB. I also added the Xorg process itself. Not sure if it helps, seems to be a little lower (maybe 100-200MB less).

We may want to start a separate topic (discussion?) on this. I've also added plasmashell and haven't noticed a difference (still uses 900MB)

Edu4rdSHL commented 2 days ago

This works for processes started specifically with these names (the compositor ones), but most of the time the “leak” occurs in other apps (Electron apps are a clear example) when you resize them, and the memory never gets freed.

It's a really nice improvement, but not yet a solution because you will need to add a pattern that does match every process name where you want to perform that operation, which can be from a couple to dozens of apps.

Anyway, thanks again for it.
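For anyone maintaining a longer list of process names by hand, a small Python sketch like the following could generate the rules instead. It only reuses pieces quoted in this thread (the `procname` feature, the profile name, and the `GLVidHeapReuseRatio` setting with value 1); the names `build_rules` and `PROFILE` are made up for illustration, and nothing here is an official NVIDIA tool.

```python
import json

# Profile name as quoted in this thread.
PROFILE = "Limit Free Buffer Pool On Wayland Compositors"


def build_rules(process_names):
    """Build one procname rule per process, all pointing at one shared profile."""
    rules = [
        {
            "pattern": {"feature": "procname", "matches": name},
            "profile": PROFILE,
        }
        for name in process_names
    ]
    profiles = [
        {
            "name": PROFILE,
            # Key and value as quoted in this thread.
            "settings": [{"key": "GLVidHeapReuseRatio", "value": 1}],
        }
    ]
    return {"rules": rules, "profiles": profiles}


def to_json(process_names):
    """Serialize the generated document with the indentation used in this thread."""
    return json.dumps(build_rules(process_names), indent=4)
```

Calling `to_json(["kwin_wayland", "gnome-shell", "kwin_x11", "plasmashell"])` yields text that can be saved as a file in the nvidia-application-profiles-rc.d directory mentioned in this thread; as noted above, the driver doesn't care what the file is called.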

kelvie commented 2 days ago

It's a really nice improvement, but not yet a solution because you will need to add a pattern that does match every process name where you want to perform that operation, which can be from a couple to dozens of apps.

@cubanismo did say that they're working on the fix on the driver side, as for the workaround, searching for nvidia-application-profiles-rc.d, I ran across this:

https://download.nvidia.com/XFree86/Linux-x86/384.59/README/profiles.html

So a rules file like this would apply to all processes:

{
    "rules": [
        {
            "pattern": {
                "feature": "true",
                "matches": "foobar"
            },
            "profile": "Limit Free Buffer Pool On Wayland Compositors"
        }
    ],
    "profiles": [
        {
            "name": "Limit Free Buffer Pool On Wayland Compositors",
            "settings": [
                {
                    "key": "GLVidHeapReuseRatio",
                    "value": 1
                }
            ]
        }
    ]
}

And I tested this, with it applied, losslesscut, an electron app I use, can now be resized without holding on to so much VRAM. However as @cubanismo mentions, this behaviour is like that by default for performance reasons, so presumably the tradeoff here is performance of some sort, and I won't speculate on that as I've lived through multiple decades of "self reported performance tips and tricks".
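Since these files are edited by hand, a stray comma or a mistyped profile name is easy to introduce. Below is a rough Python sanity check for the rule shape used in this thread. It is not the driver's own validation: the function name `check_rules_file` is made up, and the check assumes every rule has an object-style pattern and refers to a profile defined in the same file, which is stricter than what the driver may accept.

```python
import json


def check_rules_file(text):
    """Return a list of problems found; an empty list means the checks passed."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(doc, dict):
        return ["top level must be a JSON object"]

    problems = []
    # Names of the profiles defined in this same file.
    defined = {profile.get("name") for profile in doc.get("profiles", [])}
    for index, rule in enumerate(doc.get("rules", [])):
        pattern = rule.get("pattern", {})
        if "feature" not in pattern or "matches" not in pattern:
            problems.append(f"rule {index}: pattern needs 'feature' and 'matches'")
        if rule.get("profile") not in defined:
            problems.append(f"rule {index}: unknown profile {rule.get('profile')!r}")
    return problems
```

Running it on the contents of the file before dropping it into the profiles directory should return an empty list for the match-all example above.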

Edu4rdSHL commented 2 days ago

@cubanismo did say that they're working on the fix on the driver side, as for the workaround, searching for nvidia-application-profiles-rc.d, I ran across this: https://download.nvidia.com/XFree86/Linux-x86/384.59/README/profiles.html

Yup, I found that too. I have explicitly added all the apps that I use most there, excluding apps that rely mainly on graphics (games, more exactly) to avoid possible drawbacks. It seems to be working fine; glad to see that the root problem has been found.

Edu4rdSHL commented 2 days ago

@cubanismo as for the "Xwayland" issue, it's mostly the same as the kwin one, see the following video for steps to reproduce:

TLDR: resize a window under wayland/xwayland several times and see how memory goes up. It's also solved if you match the process name and add the workaround you posted before, so I think it's related.

https://github.com/user-attachments/assets/fc462eba-335a-4e4f-a12e-e8874b5dbfa0

Credits of the video: https://forums.developer.nvidia.com/t/560-release-feedback-discussion/300830/165?u=edu4rdshl

shelterx commented 2 days ago

Hold your horses now everyone, I am going to keep this issue open until everything is fixed. I got a bit sidetracked myself here but now we have a workaround for kwin & various applications thanks to @cubanismo stepping in here.

If you want to discuss application workarounds and whatnot, I suggest you do that on the NVIDIA forums.

The issue is really about the Xwayland VRAM allocation issue. So stay on that topic from now on. Thank you.