ValveSoftware / SteamOS

SteamOS community tracker

PATCH FIX INCLUDED: Steam Deck does not work when connecting to multiple monitors (displays) via a Dock on Linux 5.18-rc1 and higher #1399

Open deftdawg opened 4 months ago

deftdawg commented 4 months ago

2024-03-24 - UPDATE: I have bisected the issue to a kernel commit made to 5.18-rc1, please see the comment below https://github.com/ValveSoftware/SteamOS/issues/1399#issuecomment-2016698201


The Steam Deck doesn't work in Game Mode when docked to multiple displays under Linux 6.1 and higher. This issue also prevents the Steam Deck from being usable for productivity tasks with multiple displays in Desktop Mode.

The Deck does work perfectly with the same dock on Windows 10, 11, and any distro running Linux 5.15.

Your system information

Operating System: SteamOS 3.5.15
KDE Plasma Version: 5.27.5
KDE Frameworks Version: 5.107.0
Qt Version: 5.15.9
Kernel Version: 6.1.52-valve16-1-neptune-61 (64-bit)
Graphics Platform: X11
Processors: 8 × AMD Custom APU 0405
Memory: 14.5 GiB of RAM
Graphics Processor: AMD Custom GPU 0405
Manufacturer: Valve
Product Name: Jupiter
System Version: 1

Please describe your issue in as much detail as possible:

Under Linux 6.1 and higher, the Steam Deck is unable to use both displays properly. Either the Graphics Platform freezes until one or both of the displays are disconnected, or the displays operate at reduced refresh rate and resolution, or the displays glitch: the picture is skewed, elements are partially duplicated on multiple screens, or menus and widgets show up on the wrong display (e.g. the start menu opens on a different display than the task bar).

I have a USB-C dock that has HDMI and DisplayPort outputs on it.
These outputs are connected to a 4K@60hz LG monitor and a Samsung 2K@60hz display respectively.

Under Windows 10 or 11 this dock is able to use both displays at their full resolution and refresh rates.

Under Linux 5.15, the Steam Deck is also able to use both displays at their full resolution and refresh rates.

I suspect something has changed in the AMD kernel module since 5.15 that has caused multiple displays to break.

Steps for reproducing this issue:

  1. Get a Dock with an HDMI and DisplayPort, Steam Dock or other brand
  2. Connect Dock to Displays on both outputs and connect the Steam Deck to the dock
  3. Start the Deck

As of the 6.1.52-valve16-1-neptune-61 kernel, neither Game Mode nor Desktop Mode works correctly unless only one external display is connected.

Results of Testing with Various Live ISOs

These were tested on the Steam Deck using Ventoy on the SD card.

| Status | OS | Kernel | Graphics Platform | Notes |
|---|---|---|---|---|
| Works | Windows 11 To Go (SD Card) | N/A | | Good |
| Works | Waydroid Beta Live | 5.15 | Wayland | Good |
| Works | NixOS Plasma5 Live ISO | 5.15 | Wayland | Good |
| Works | NixOS Plasma5 Live ISO | 5.15 | X11 | Good |
| Fails | NixOS Plasma5 Live ISO | 6.1 | X11 | Glitchy |
| Fails | NixOS Plasma5 Live ISO | 6.1 | Wayland | Glitchy |
| Fails | SteamOS 3.5.15 | 6.1.52-valve16-1-neptune-61 (64-bit) | X11 | Can only run: 1x 4K@60hz HDMI display (with other displays disconnected) |
| Fails | NixOS Plasma5 Live ISO | 6.7.4 | Wayland | Multi-displays = Black screen on startup, Disconnect / reconnect -> Glitchy, wrong resolutions, UI elements open on wrong display |
| Fails | KDE Neon Unstable Plasma 6.1 ISO | 6.5.0-17-generic | Wayland | Multi-displays = Black screen on startup, Disconnect / reconnect -> Glitchy, wrong resolutions, UI elements open on wrong display |
| Fails | Nobara 39 Live ISO | 6.7.0-204.fsync.fc39.x86_64 (64-bit) | Wayland | [drm] Failed to add display topology, DTM TA is not initialized. |
| Fails | NixOS Plasma5 Live ISO | 6.7.6-xanmod | X11 | Glitchy |
| Fails | NixOS Plasma5 Live ISO | 6.8.0-rc5 | X11 | Glitchy |

deftdawg commented 4 months ago

Issue seems to be related to these https://gitlab.freedesktop.org/drm/amd/-/issues/2171 (closed) https://gitlab.freedesktop.org/drm/amd/-/issues/2680

wvffle commented 4 months ago

I can confirm the issues with an official dock when trying to connect a single 3440x1440@160 display over usb-c or DP.

deftdawg commented 4 months ago

Opened https://gitlab.freedesktop.org/drm/amd/-/issues/3234 with more details about same behaviour with 6.7.6 and 6.8-rc5 Linux kernels.

cpelley commented 4 months ago

Thanks for raising this. Things haven't seemed right since perhaps SteamOS 3.5?? I used to get along OK (though it was sometimes hit and miss) with my setup of Steam Deck + official dock, one HDMI and one active DP-to-HDMI adaptor (i.e. multi-monitor). I use my Steam Deck as my main machine, so this is rather important for me. Since some update (I think around 3.5??) it stopped working for the DP. In fact the HDMI display will not work unless I disconnect the DP cable. When I connect the dock to my work (Windows) laptop, multi-display continues to function as normal.

Thanks again for raising.

deftdawg commented 4 months ago

With initial releases of SteamOS for Steam Deck that were on 5.13(?) I was able to get both displays to work, but at reduced resolution / refresh rates (HDMI at 4K@30 + DP at 720p@60 - Samsung screen that needs 60hz)... it may have worked for you with lower than optimal refresh rates on those earlier releases.

deftdawg commented 4 months ago

2024-03-11 Update

Newest known good is Nitrux 2.2.0 ISO w/ 5.17.12 (no amdgpu commits since .10) https://osdn.ip-connect.vn.ua/nitrux/77377/nitrux-nx-desktop-20220602-amd64.iso

Oldest known bad is KaOS 2022.06 ISO w/ 5.17.15 (3 amdgpu commits) https://master.dl.sourceforge.net/project/kaosx/Archive/KaOS-2022.06-x86_64.iso?viasf=1

Looking to find or build a 5.17.14 ISO as there were 5 amdgpu commits in .14 that could be the origin if it didn't originate in .15.
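The narrowing-down described above is a binary search between a known-good and a known-bad kernel. A minimal sketch of that idea follows; the `boot_and_check` callback and the "pretend" outcome are hypothetical stand-ins for actually booting an ISO on the Deck and checking both displays, and the version list is just the point releases between the two endpoints named in this comment.

```python
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad version.

    Assumes versions[0] is known good, versions[-1] is known bad, and that
    every version after the first bad one is also bad.
    """
    lo, hi = 0, len(versions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi]


# Point releases between the newest known good and the oldest known bad.
candidates = ["5.17.12", "5.17.13", "5.17.14", "5.17.15"]

# Hypothetical outcome for illustration only: pretend the regression
# first appears in 5.17.14.
pretend_bad = {"5.17.14", "5.17.15"}
tested = []


def boot_and_check(version: str) -> bool:
    tested.append(version)  # record which kernels had to be booted
    return version in pretend_bad


result = first_bad(candidates, boot_and_check)
print(result, tested)
```

Each step halves the remaining range, so only the versions strictly between the two endpoints ever need to be booted.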

cpelley commented 4 months ago

> With initial releases of SteamOS for Steam Deck that were on 5.13(?) I was able to get both displays to work, but at reduced resolution / refresh rates (HDMI at 4K@30 + DP at 720p@60 - Samsung screen that needs 60hz)... it may have worked for you with lower than optimal refresh rates on those earlier releases.

Last night my two displays by some miracle worked in desktop mode (HDMI + DP/HDMI adaptor). Both displays are 1080p@60Hz. This is the first time in months. Time will tell whether it continues to work or whether it was a one-off (will look tonight after I finish work).

I downgraded the dock's firmware and upgraded it etc. Nothing seemed to work, but suddenly the display connected via the DisplayPort-to-HDMI adaptor turned on (I have no idea why). Then after switching between various resolutions and refresh rates on the HDMI connected monitor, that display finally turned on too.

Is this issue related to this one? https://github.com/ValveSoftware/SteamOS/issues/1136

deftdawg commented 4 months ago

> I downgraded the dock's firmware and upgraded it etc. Nothing seemed to work, but suddenly the display connected via the DisplayPort-to-HDMI adaptor turned on (I have no idea why). Then after switching between various resolutions and refresh rates on the HDMI connected monitor, that display finally turned on too.

I can still get my HDMI + DP monitors to run at half their bandwidth with the latest SteamOS... In my case the only thing that has worked at full resolution + refresh consistently is running 5.15 kernels...

Yesterday something interesting happened: I ran my "earliest known bad kernel", 5.17.15 on KaOS, with the drm.debug=0x156 parameter the AMD dev gave me to collect logs (when grub comes up, edit the command line, append it to the linux line, ctrl+x to boot). Instead of failing, the system worked just as it had on everything between 5.15 and 5.17.12...

I'll have to see if I can test that some more on Friday, but that might mean it's some other kernel optimisation between 5.17.12 and 5.17.15 that has caused a timing issue that gets resolved temporarily by slowing the kernel down to have it spam extra dmesg debug...

I'm not sure if there's a way to get at the grub menu on SteamOS, but if there is, you could try adding that parameter next time it fails.
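For anyone wondering what drm.debug=0x156 actually turns on: the value is a bitmask of DRM debug categories. The sketch below decodes it, assuming the bit values of `enum drm_debug_category` in the kernel's include/drm/drm_print.h; the table is written from memory, so check it against your own kernel tree before relying on it.

```python
# Bit values assumed from enum drm_debug_category (include/drm/drm_print.h).
# Verify against the kernel version you are actually running.
DRM_DEBUG_CATEGORIES = {
    0x001: "CORE",
    0x002: "DRIVER",
    0x004: "KMS",
    0x008: "PRIME",
    0x010: "ATOMIC",
    0x020: "VBL",
    0x040: "STATE",
    0x080: "LEASE",
    0x100: "DP",
    0x200: "DRMRES",
}


def decode_drm_debug(mask: int) -> list[str]:
    """Return the names of the debug categories enabled by a drm.debug mask."""
    return [name for bit, name in DRM_DEBUG_CATEGORIES.items() if mask & bit]


print(decode_drm_debug(0x156))
# ['DRIVER', 'KMS', 'ATOMIC', 'STATE', 'DP']
```

Under that assumption the mask enables driver, modesetting, atomic, state and DisplayPort messages, which is a lot of extra dmesg output and fits the "slowing the kernel down" theory above.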

deftdawg commented 3 months ago

2024-03-24 Update - Reverse Patch that Fixes issue

Over the past month I've taught myself enough Nix to get a flake working that can generate 5.x series kernel ISOs, and I've bisected the issue down to this commit:

https://github.com/torvalds/linux/commit/c5365554514

If I revert the commit with the patch, I'm able to get full function on my 2 external displays using Linux 6.8.0-rc6...

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
index 0e58c1ab414c..d24be9fb5845 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
@@ -911,31 +911,22 @@ static bool is_dsc_need_re_compute(
        struct dc_state *dc_state,
        struct dc_link *dc_link)
 {
-       int i, j;
+       int i;
        bool is_dsc_need_re_compute = false;
-       struct amdgpu_dm_connector *stream_on_link[MAX_PIPES];
-       int new_stream_on_link_num = 0;
-       struct amdgpu_dm_connector *aconnector;
-       struct dc_stream_state *stream;
-       const struct dc *dc = dc_link->dc;

-       /* only check phy used by dsc mst branch */
+       /* only check phy used by mst branch */
        if (dc_link->type != dc_connection_mst_branch)
                return false;

-       if (!(dc_link->dpcd_caps.dsc_caps.dsc_basic_caps.fields.dsc_support.DSC_SUPPORT ||
-               dc_link->dpcd_caps.dsc_caps.dsc_basic_caps.fields.dsc_support.DSC_PASSTHROUGH_SUPPORT))
-               return false;
-
-       for (i = 0; i < MAX_PIPES; i++)
-               stream_on_link[i] = NULL;
-
        /* check if there is mode change in new request */
        for (i = 0; i < dc_state->stream_count; i++) {
+               struct amdgpu_dm_connector *aconnector;
+               struct dc_stream_state *stream;
                struct drm_crtc_state *new_crtc_state;
                struct drm_connector_state *new_conn_state;

                stream = dc_state->streams[i];
+
                if (!stream)
                        continue;

@@ -947,10 +938,8 @@ static bool is_dsc_need_re_compute(
                if (!aconnector)
                        continue;

-               stream_on_link[new_stream_on_link_num] = aconnector;
-               new_stream_on_link_num++;
-
                new_conn_state = drm_atomic_get_new_connector_state(state, &aconnector->base);
+
                if (!new_conn_state)
                        continue;

@@ -961,6 +950,7 @@ static bool is_dsc_need_re_compute(
                        continue;

                new_crtc_state = drm_atomic_get_new_crtc_state(state, new_conn_state->crtc);
+
                if (!new_crtc_state)
                        continue;

@@ -970,34 +960,7 @@ static bool is_dsc_need_re_compute(
                if (new_crtc_state->enable && new_crtc_state->active) {
                        if (new_crtc_state->mode_changed || new_crtc_state->active_changed ||
                                new_crtc_state->connectors_changed)
-                               return true;
-               }
-       }
-
-       /* check current_state if there stream on link but it is not in
-        * new request state
-        */
-       for (i = 0; i < dc->current_state->stream_count; i++) {
-               stream = dc->current_state->streams[i];
-               /* only check stream on the mst hub */
-               if (stream->link != dc_link)
-                       continue;
-
-               aconnector = (struct amdgpu_dm_connector *)stream->dm_stream_context;
-               if (!aconnector)
-                       continue;
-
-               for (j = 0; j < new_stream_on_link_num; j++) {
-                       if (stream_on_link[j]) {
-                               if (aconnector == stream_on_link[j])
-                                       break;
-                       }
-               }
-
-               if (j == new_stream_on_link_num) {
-                       /* not in new state */
-                       is_dsc_need_re_compute = true;
-                       break;
+                               is_dsc_need_re_compute = true;
                }
        }

Now just need to get Valve and AMD to test/push this upstream to undo this bug.

cpelley commented 3 months ago

On the off-chance that you don't know, and in case your comment was meant to raise awareness with developers of the Linux kernel: I think this repository (https://github.com/torvalds/linux) is a mirror, so development doesn't actively take place there. See the following bot comment if that's of use: https://github.com/torvalds/linux/pull/800#issuecomment-587391042 I'm sure you know already...

Thanks for going the extra mile on this! ❤️

deftdawg commented 3 months ago

Yep, thanks I know, just wanted to flag that commit so others might see it in Google or whatever. The commit signer who I pinged has pinged a couple other AMDers (on the AMD gitlab).

Better that AMD offers a fix for it than I go on the kernel mailing list asking for a commit I don't understand/can't debug to be reverted. 😂

parkerlreed commented 3 months ago

@deftdawg Trying to apply this against 6.8.1 and it's not liking the patch. I tried with and without the newline at the end of the file

[parker@rogally linux-6.8.1]$ git apply dsc.patch
error: corrupt patch at line 97
[parker@rogally linux-6.8.1]$ patch -p1 < dsc.patch 
patching file drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
Hunk #1 FAILED at 911.
Hunk #2 FAILED at 947.
Hunk #3 FAILED at 961.
Hunk #4 FAILED at 970.
4 out of 4 hunks FAILED -- saving rejects to file drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c.rej
deftdawg commented 3 months ago

@parkerlreed the commit is from 2 years ago, you may have to apply some "--fuzz" lines to your "patch" command...

Maybe try --fuzz 100; Nix applied the patch for me, not sure how much fuzz it uses.

parkerlreed commented 3 months ago

Thanks that got it closer but only one applied

[parker@rogally linux-6.8.1]$ patch -p1 --fuzz 100 < dsc.patch 
patching file drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
Hunk #1 FAILED at 911.
Hunk #2 FAILED at 947.
Hunk #3 succeeded at 961 with fuzz 3.
Hunk #4 FAILED at 971.
3 out of 4 hunks FAILED -- saving rejects to file drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c.rej
parkerlreed commented 3 months ago

For reference, I'm trying to test this patch to see if it helps with this issue: Plasma Wayland specifically won't initialize all three displays over MST+DSC (at their full res) but somehow X11 works. https://gitlab.freedesktop.org/drm/amd/-/issues/3278

deftdawg commented 3 months ago

Either more fuzz like 5000 😄 or manually compare the drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c against the version of the file tagged at v5.18 to see what's changed and adjust it by hand.
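Before reaching for more fuzz, it may be worth checking whether the saved patch file is intact, since `git apply` reports "corrupt patch" when a hunk body does not match the line counts in its `@@` header, and copy-pasting a diff from a web page can drop whitespace-only lines. The following is a minimal sketch of such a check, not a full diff parser: `check_hunks` is a made-up helper name, it treats an empty line as empty context, and it only understands `diff --git` style file headers.

```python
import re

# Matches unified-diff hunk headers such as "@@ -911,31 +911,22 @@ ...".
HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def check_hunks(patch_text):
    """Return (hunk_no, old_declared, old_seen, new_declared, new_seen)
    for every hunk whose body does not match its header counts."""
    hunks = []       # one [old_declared, old_seen, new_declared, new_seen] per hunk
    in_hunk = False
    for line in patch_text.splitlines():
        m = HUNK_RE.match(line)
        if m:
            # A count omitted from the header means 1.
            hunks.append([int(m.group(1) or 1), 0, int(m.group(2) or 1), 0])
            in_hunk = True
        elif line.startswith("diff --git"):
            in_hunk = False          # next file header: stop counting
        elif in_hunk and not line.startswith("\\"):
            h = hunks[-1]
            if line.startswith("-"):
                h[1] += 1            # removed line: old side only
            elif line.startswith("+"):
                h[3] += 1            # added line: new side only
            else:
                h[1] += 1            # context line (empty line = empty context)
                h[3] += 1
    return [(n, *h) for n, h in enumerate(hunks, 1)
            if h[0] != h[1] or h[2] != h[3]]


good = "\n".join([
    "--- a/f.c",
    "+++ b/f.c",
    "@@ -1,3 +1,3 @@",
    " int a;",
    "-int b;",
    "+int c;",
    " int d;",
])
truncated = "\n".join(good.splitlines()[:-1])  # drop the last context line

print(check_hunks(good))       # []
print(check_hunks(truncated))  # [(1, 3, 2, 3, 2)]
```

To use it on a real file, something like `check_hunks(open("dsc.patch").read())` should list any hunk that came through short or long. An empty result only means the counts are consistent; the hunks can still fail to apply if the surrounding code has changed.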

parkerlreed commented 3 months ago

> Either more fuzz like 5000 😄 or manually compare the drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c against the version of the file tagged at v5.18 to see what's changed and adjust it by hand.

Yeah tried that earlier, no luck lol

You said Nix applied it successfully against 6.8.0-rc? Do you still have the drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c file (could try to transplant that from yours)

deftdawg commented 3 months ago

Can't find it atm. If you want, you can try this one; I think it's from 5.18 before the bad rc1 commit... no idea if it will work (so back up your original). amdgpu_dm_mst_types.c.txt

I can see if I can recreate it tomorrow.

deftdawg commented 3 months ago

An AMD engineer suggested, based on my bisect, that Display Stream Compression (DSC) was broken. That aligns with my experience of only being able to run my 2 displays (2K@60 + 4K@60) together with >=6.1 at a max of 720p@60 and 4K@30 ...

Running them at 2K@60 + 4K@60 without DSC would exceed the available bandwidth and bork everything.

You can try switching to much lower res and/or refresh rates to see if you can get your bandwidth below the non-DSC max.
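A rough back-of-the-envelope check of the bandwidth argument, as a sketch under stated assumptions rather than a statement about what the dock actually negotiates: the link is assumed to be 2 DisplayPort lanes at HBR3 (8.1 Gbit/s per lane, 8b/10b encoding), "2K" is assumed to mean 2560x1440, colour depth is assumed to be 24 bits per pixel, and blanking overhead is ignored.

```python
def pixel_rate_gbps(width, height, refresh_hz, bits_per_pixel=24):
    """Uncompressed active-pixel data rate in Gbit/s (blanking ignored)."""
    return width * height * refresh_hz * bits_per_pixel / 1e9


def link_capacity_gbps(lanes, gbps_per_lane, encoding_efficiency=0.8):
    """Usable payload rate of a DisplayPort link with 8b/10b encoding."""
    return lanes * gbps_per_lane * encoding_efficiency


# ASSUMPTION: 2 lanes at HBR3; the real link configuration is not stated here.
link = link_capacity_gbps(lanes=2, gbps_per_lane=8.1)

# ASSUMPTION: "2K" means 2560x1440.
full = pixel_rate_gbps(3840, 2160, 60) + pixel_rate_gbps(2560, 1440, 60)
reduced = pixel_rate_gbps(3840, 2160, 30) + pixel_rate_gbps(1280, 720, 60)

print(f"link capacity : {link:.2f} Gbit/s")     # 12.96
print(f"4K@60 + 2K@60 : {full:.2f} Gbit/s")     # 17.25
print(f"4K@30 + 720p60: {reduced:.2f} Gbit/s")  # 7.30
```

Under those assumptions the full-resolution pair needs more than the link can carry uncompressed, while the reduced pair fits, which is consistent with the behaviour described above. Different lane counts or link rates change the numbers, so `lanes` and `gbps_per_lane` are left as parameters.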

parkerlreed commented 3 months ago

That's the oddball for me.

My setup is 3 1440p at 75 Hz.

Steam Deck and ROG Ally both support full resolution across the three only in X11 (DSC working and functional)

Wayland acts like DSC isn't working and I have to lower the 2 side monitors so the total is

The logs I posted over on the Gitlab show some kernel module "crashes" related to the DSC calculation.

parkerlreed commented 3 months ago

It seems as noted on the Gitlab issue, my DSC is working but Plasma Wayland is not supporting it. Just tried Gnome and everything works at full res on Wayland.

deftdawg commented 3 months ago

After reversing the commit or running Linux 5.15.x LTS, I can run both screens at full res/refresh (which needs DSC) with Wayland, though my screens all cap out at 60hz at 2K/4K... Including my 4K@144hz screen.

parkerlreed commented 3 months ago

Are you running Plasma Wayland?

Turns out my issue was that the Plasma Wayland session defaults to 10 bits per color channel (30 bits per pixel), and there's a bug in the AMD GPU driver where it doesn't correctly advertise that it can't do that with DSC.

Telling Plasma to use 8 bits per channel (24 bits per pixel) lets my full setup work at the native res and refresh rate.

KWIN_DRM_PREFER_COLOR_DEPTH=24 set for the environment works for me.

https://gitlab.freedesktop.org/drm/amd/-/issues/2598

deftdawg commented 3 months ago

SteamOS ships with X11/plasma5

deftdawg commented 1 month ago

Official fixes from AMD merged into Linux 6.9.x mainline on May 10th:

https://github.com/torvalds/linux/commit/cf87f46fd34d6c19283d9625a7822f20d90b64a4

So this will get fixed on SteamOS when Valve ships 6.9 or cherry-picks the fixes.