freebsd / drm-kmod

drm driver for FreeBSD

Scanline flickering with amdgpu and Global C-State Control enabled #318

Open ClausAndersen opened 2 months ago

ClausAndersen commented 2 months ago

I have what I would call intermittent scanline flickering with AMD integrated graphics on an AMD Ryzen 5 5600G in an ASRock DeskMini X300 system. The system works nicely with OtherOS(tm), and to my surprise the flickering seems to be related to C-states. A workaround is to disable Global C-State Control in the BIOS.

It becomes visible when the VT console switches framebuffer driver from efifb to the fb driver during boot. It starts right after the screen clears and I see start FB_INFO:.

VT: Replacing driver "efifb" with new "fb".
start FB_INFO:
type=11 height=2160 width=3840 depth=32
pbase=0xf40cd3000 vbase=0xfffffe01b46d3000
name=drmn0 flags=0x0 stride=15360 bpp=32
end FB_INFO

If I remove amdgpu from kld_list there is no issue. I have tried 3 different HDMI cables just to rule out a cable issue.

It is intermittent and how often it occurs is rather random, i.e. playing video or shaking a window does not make it worse.

The only way I can reliably provoke it is in a VT console when selecting with the mouse pointer in the 2nd half of the screen. The rest of the time it simply occurs randomly. During boot I see it consistently at the line "ELF ldconfig path:".

This is how it looks in the console: https://youtu.be/swA0gvcHGZ8

The issue carries over to X as well: https://youtu.be/76LkOBlXXTw

I have tried manual configuration and setting TearFree and dropped the frequency from 60 to 30 Hz but to no avail.

xrandr --output HDMI-A-0 --set TearFree on
xrandr --output HDMI-A-0 --set EnablePageFlip off

Initial system info:

FreeBSD 14.0-RELEASE-p6

pkg info:
drm-515-kmod-5.15.118_4        DRM drivers modules
drm-kmod-20220907_3            Metaport of DRM modules for the linuxkpi-based KMS components
gpu-firmware-amd-kmod-green-sardine-20230625_2 Firmware modules for green_sardine AMD GPUs
gpu-firmware-kmod-20240401,1   Firmware modules for the drm-kmod drivers
libdrm-2.4.120_1,1             Direct Rendering Manager library and headers
xf86-video-amdgpu-22.0.0_2     X.Org amdgpu display driver

dmesg:
amdgpu: ATOM BIOS: 113-CEZANNE-018
drmn0: successfully loaded firmware image 'amdgpu/green_sardine_*.bin'
[drm] Initialized amdgpu 3.42.0 20150101 for drmn0 on minor 0

The system is currently updated to FreeBSD 14.1 and drm 6.1.92

The system is an Asrock X300 SFF. The workaround is to disable "Global C-State Control" in the BIOS.

The system was running with powerd enabled and -a adaptive. But naively disabling this makes no difference.

The above is a summary of a forum thread

I am unsure if this is a hardware quirk or related to how FreeBSD handles C-states. Another user reported similar issues with the same GPU, which were fixed by lowering the memory speed. That made no difference for me. I have not seen a lot of reports like this, which makes me doubt my hardware. But I found it worth reporting, as I can reliably reproduce the issue by toggling C-State control.

ekhramtsov commented 2 months ago

I had the same on -CURRENT and x11-wm/sway.

Try machdep.idle="mwait" in /boot/loader.conf.local, then reboot. May work without rebooting as sysctl machdep.idle=mwait but I'm not sure.
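Before switching, it can help to confirm which idle methods the kernel offers. The following is a minimal sketch, assuming a FreeBSD x86 system that exposes the `machdep.idle` and `machdep.idle_available` sysctls; the helper name `idle_supported` is made up for illustration and is not part of any tool. It only reads values and prints a suggestion; it does not change the idle method.

```shell
#!/bin/sh
# idle_supported METHOD "LIST"
# Succeeds if METHOD appears in the space-separated LIST.
# (Hypothetical helper, named for this example only.)
idle_supported() {
    for m in $2; do
        if [ "$m" = "$1" ]; then
            return 0
        fi
    done
    return 1
}

# Only query the sysctls on FreeBSD; elsewhere the block is a no-op.
if [ "$(uname -s)" = "FreeBSD" ]; then
    avail=$(sysctl -n machdep.idle_available 2>/dev/null || true)
    current=$(sysctl -n machdep.idle 2>/dev/null || true)
    echo "current idle method: $current"
    echo "available methods:   $avail"
    if idle_supported mwait "$avail"; then
        echo "to switch at runtime (as root): sysctl machdep.idle=mwait"
        echo "to persist: add machdep.idle=\"mwait\" to /boot/loader.conf.local"
    fi
fi
```

Switching back is the same command with the previous value, which makes it easy to compare behaviour between two methods in one session.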

ClausAndersen commented 2 months ago

Brilliant!!!

Sysctl is enough and I can toggle between acpi (default) and mwait. So with Global C-State Control now enabled in the BIOS I have a very easy way to provoke the problem.

I will need to do further testing to see if my system would prefer kern.eventtimer.timer=HPET as well.

The problem is then still that we cannot get to the lower C3 or CC6 states but are doing things the old-fashioned way. NOTE: I have not tried setting it to spin but I would expect the same outcome.
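To see which C-states a CPU advertises and how much time it spends in each, the ACPI CPU sysctls can be read directly. This is a rough sketch, assuming the ACPI CPU driver is attached so that `dev.cpu.0.cx_supported`, `dev.cpu.0.cx_lowest` and `dev.cpu.0.cx_usage` exist, and that `cx_supported` uses the `Cn/type/latency` form (for example `C1/1/1 C2/2/18 C3/3/350`). The helper `deepest_cx` is a hypothetical name used only here.

```shell
#!/bin/sh
# deepest_cx "CX_SUPPORTED"
# Prints the name of the last (deepest) state in a cx_supported string,
# e.g. "C1/1/1 C2/2/18 C3/3/350" -> C3. (Hypothetical helper.)
deepest_cx() {
    last=""
    for s in $1; do
        last=${s%%/*}   # keep the part before the first '/'
    done
    printf '%s\n' "$last"
}

# Only query the sysctls on FreeBSD; elsewhere the block is a no-op.
if [ "$(uname -s)" = "FreeBSD" ]; then
    supported=$(sysctl -n dev.cpu.0.cx_supported 2>/dev/null || true)
    echo "cpu0 supported states: $supported"
    echo "cpu0 deepest state:    $(deepest_cx "$supported")"
    echo "cpu0 lowest allowed:   $(sysctl -n dev.cpu.0.cx_lowest 2>/dev/null || true)"
    echo "cpu0 usage:            $(sysctl -n dev.cpu.0.cx_usage 2>/dev/null || true)"
fi
```

Running it once per idle setting gives a simple before/after comparison of the reported state usage.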

Initially I thought this was clearly iGPU related. But now this looks like (at least) a Ryzen 5 issue. Maybe only for the G and X variants with iGPU?

Even-numbered logical CPUs seem to perform better than odd-numbered ones. That might be the culprit for people who experience "stuttering", "jitter" or latency. It might very well be a general problem on my rig, but I only noticed it because the amdgpu driver so clearly suffers visually from it. If this is a weakness in the current scheduler, it might be interesting with regard to more modern CPUs with dedicated low-power cores.

Andriy Gapon wrote relatively recently, in Dec 2021:

I suspect that the hardware + firmware may actually describe that performance disparity via ACPI CPPC (_CPC object, etc), but right now we do not support querying that or making use of it

I have no idea how to conclude if this is a ZEN generational issue. But I have seen a report with discrete AMD graphics which indicates a more general issue.

Then there is the existence of zenstates.py. Maybe that is needed for a stable system, but my expectation is that things should work out of the box.

I see 6 possible solutions from best to worst

1) Improve scheduler ACPI support likely via ACPI CPPC. This is a level of wizardry which is waaaaaay beyond me. And then most likely a rather delicate operation even for a motivated wizard.

2) In a perfect world we would test for jitter and automagically choose the best idle settings during bootstrap. Maybe with a machdep.autoidle knob to twist. This would ensure a more generic approach. I have no clue as to how hard it would be to write the jitter detection code, nor how big an impact that would have on the bootstrap (hence the option to turn it off).

3) See this as a quirk and mitigate it by changing the default idle from acpi to mwait on select systems. The exact scope remains a little muddy. Is this Zen generational? Just Ryzen 5? Just with iGPU? Quirks are very common in /sys/dev/acpica/acpi_quirks but seem mostly tied to BIOS vendors/versions. To make it obvious, maybe call it ACPI_IDLE_CPPC_NOT_IMPLEMENTED rather than broken. I am unsure if it would be kosher to check for CPU id/vendor here. I am also unsure if idle=acpi is just a default or if actual detection is done. My gut feeling is that this might be better placed in /sys/amd64/amd/initcpu.c or `machdep.c`. But my understanding of how those bits are interconnected is quite shallow.

4) See this as an upstream FreeBSD problem and make a quirk in the amdgpu driver to toggle idle mode and output a warning.

5) Ditto, but just the warning. This gives less cruft that has to be removed later.

6) Do nothing and let people Google for this obscure sysctl. In this least-effort case it might be helpful to update the wiki, which probably gets more eyeballs.

I think option 3 is the lowest effort compared to impact. There are several reports if you dig deep enough. The worst part is that many might be affected without knowing.

Am I missing something obvious? Or is this actively worked on elsewhere?

vedranmiletic commented 1 month ago

Just curious, is this also the case with FreeBSD 14.1 and DRM kmod 6.1?

ClausAndersen commented 1 month ago

As stated: The system is currently updated to FreeBSD 14.1 and drm 6.1.92

The workaround is tested and works with 14.1 and 6.1. I suspect it would be the same if I downgraded to 5.15 but I have not tried this yet.

This seems to be some sort of ACPI issue. I am still investigating, so I am unsure if I should blame yet another buggy BIOS or if FreeBSD can be blamed for lacking CPPC support.