LizardByte / Sunshine

Self-hosted game stream host for Moonlight.
http://app.lizardbyte.dev/Sunshine/
GNU General Public License v3.0

AMF Encoder on Radeon 5000 and newer overshoots encoding target at higher constant bitrates #1040

Closed emkultra64 closed 6 months ago

emkultra64 commented 1 year ago

Describe the Bug

At high bitrate settings, Sunshine's AMF hardware encoding usually overshoots the requested bitrate and sends frames too large for error correction or a stable connection. This issue occurs regardless of whether h264 or HEVC is being used, and regardless of what the AMD Quality setting is.

When this happens, the following error appears repeatedly in the logging/console output, with varying numbers for the DATA_SHARDS_MAX figure before the > sign: Warning: Number of fragments for reed solomon exceeds DATA_SHARDS_MAX 483 > 255, skipping error correction

This issue will often result in severe stuttering or freezes and audio glitches at best, while at worst it will cause outright disconnects due to an error for the bitrate being "too high" for the connection, even on a local network where all devices are connected via gigabit ethernet with optimal network throughput.

This issue can be reproduced by setting the rate control to cbr on the host, then selecting the highest bitrate possible in Moonlight (150 Mbps) on a client device.

The following workarounds exist:

A few other users of RDNA GPUs have experienced this issue, notably in #222. I have noticed this happening on my host machine, which currently has an RX 6650 XT installed.

Expected Behavior

This issue does not occur with Nvidia GPUs and NVENC at a constant bitrate setting of 150 Mbps, nor does it occur when using AMF on an AMD APU such as the Ryzen 5 5600G while it's passing through an Nvidia GPU and encoding its output. I was able to test a GTX 1070 this way and found no issues using cbr while encoding with either the integrated graphics AMF encoder or the dGPU's NVENC encoder. Encoding with AMF on RX 5000 series and later hardware, it seems, is the sole situation in which this issue occurs.

Additional Context

This has been tested across clients on multiple platforms (macOS, Linux and Android) as well as at multiple resolutions (1080p as well as 1440p) and even with different encoders (both amf_h264 and amf_hevc.) It will occur in both cbr and vbr_peak encoder modes, and to a lesser (but still significant) extent in cqp mode. It occurs regardless of whether the client device is receiving the stream via ethernet or wi-fi.

Enabling or disabling AMF Preanalysis or AMF Variance Based Adaptive Quantization (VBAQ) in the encoder settings does not affect the issue.

Host Operating System

Windows

Operating System Version

Windows 10 22H2

Architecture

64 bit

Sunshine commit or version

0.18.4

Package

Windows - installer

GPU Type

AMD

GPU Model

RX 6650 XT

GPU Driver/Mesa Version

23.3.1

Config

upnp = enabled
amd_coder = cabac
gamepad = x360
qsv_coder = auto
resolutions = [
    352x240,
    480x360,
    858x480,
    1280x720,
    1920x1080,
    2560x1080,
    3440x1440,
    1920x1200,
    3860x2160,
    3840x1600,
    1600x900
]
amd_usage = ultralowlatency
origin_pin_allowed = pc
key_rightalt_to_key_win = disabled
vt_realtime = enabled
vt_software = auto
origin_web_ui_allowed = lan
amd_quality = quality
min_log_level = 2
amd_vbaq = enabled
fec_percentage = 20
qp = 20
vt_coder = auto
dwmflush = enabled
hevc_mode = 0
amd_rc = cbr
amd_preanalysis = enabled
fps = [10,30,60,90,"75","120"]
qsv_preset = medium

Apps

No response

Relevant log output

Warning: Number of fragments for reed solomon exceeds DATA_SHARDS_MAX 263 > 255

psyke83 commented 1 year ago

These symptoms also occur on my RX 570 and RX 6600. For these cards, the vbr_latency rate control seems to be the only usable option for streaming, hence my rationale to set it as the default.

Forcing hypothetical reference decoder compatibility (via the enforce_hrd ffmpeg encoder tunable) does seem to resolve the peak bandwidth issue for the other rate control modes, but - IIRC - enabling it would cause the HEVC encoder to throttle downwards to 30fps on my RX 570 when encoding at 1080p or higher.

entropicdrifter commented 1 year ago

I think I may be facing this issue with my 6700XT using VA-API on Linux.

Whenever I stream with my Raspberry Pi or Steam Link, I'll see corruption every once in a while that I never see with other decoders or with the software encoder, my old Nvidia GPU's encoder, or my Intel iGPU encoder. I think this is due to bitrate spikes because the key factor the Pi 3 and Steam Link have in common is their low maximum decoding bitrates.

Are there any config settings I could try altering to get this under control?

emkultra64 commented 1 year ago

> Are there any config settings I could try altering to get this under control?

I'm sorry to say that, with VA-API being an entirely different encoder and the host PC's operating system also being different, I'm not sure the issue is the same as the bug described here. I'm not sure what's needed for your particular setup, but LizardByte does have a Discord link on their support page, so you might want to hop in there for further help.

NightHammer1000 commented 1 year ago

I also have this kind of overshoot on my 7900XTX with vbr_latency enabled.

I noticed that during normal gameplay, Task Manager shows around 20-50 MB/s of transfer. But the stream will randomly freeze, and Sunshine's network traffic will shoot up to 200-220 MB/s.

Resolution is 1920x1200, 60FPS and 55 Mbps Bitrate

LizardByte-bot commented 1 year ago

This issue is stale because it has been open for 90 days with no activity. Comment or remove the stale label, otherwise this will be closed in 10 days.

MarhyCZ commented 1 year ago

I still experience the same symptoms on a Sunshine 0.20 version using 6900 XT. @psyke83 How could I enable the enforce_hrd ffmpeg flag using Sunshine?

ns6089 commented 1 year ago

> I also have this kind of overshoot on my 7900XTX with vbr_latency enabled

Don't use vbr, for real. It's downright detrimental to quality without lookahead, and simply impossible with single frame vbv(hrd) - or in other words, without bitrate peaks.

> How could I enable the enforce_hrd ffmpeg flag using Sunshine?

Not without recompiling, need to add line here https://github.com/LizardByte/Sunshine/blob/master/src/video.cpp#L566

MarhyCZ commented 1 year ago

@ns6089 Thanks a lot for quickly pointing out where it is, that's going to save me a lot of time! I will try to recompile it.

MarhyCZ commented 1 year ago

@ns6089 I can confirm that with enabling the "enforce_hrd" and using AMD AMF constant bitrate option, the quality of the stream is much higher.

che666 commented 1 year ago

@MarhyCZ @ns6089 Could you please share the patch or create a pull request? Thank you very much!

justinharrell commented 1 year ago

Seeing this behavior as well on a 6700 XT. I just upgraded from a 1070, which I ran at 150 Mbps CBR H.264 over wired ethernet with no issues; it seemed to actually hit around 120 Mbps at 4K 60 fps.

On the AMD card with CBR at 150 Mbps, I get huge stutters with large overshoots into the 250+ Mbps range, with the performance overlay showing dropped frames due to the network and the big red "network too slow" error; it is completely unusable. As you crank the bitrate down it gets better, but it is never completely smooth, with occasional large overshoots and dropped frames all the way down through 50 Mbps. It is definitely not a network issue, despite what Moonlight reports.

vbr_latency is better, but the dropped frames don't go away until around 60 Mbps. I'm currently running 50 Mbps HEVC, which works very well and seems to handle 4K 60 fps with no quality issues I can see and very good latency.

Not a huge problem now that I figured out acceptable settings, just very different from the Nvidia side where I could just crank it up to max without issue for wired ethernet, almost thought the AMD card was not going to work for sunshine.

CypherGrue commented 1 year ago

I can reproduce this issue on Ubuntu 22.04 Linux 6.5.5 Mesa 23.2.1 with both an old R9 Nano (Fiji GCN 3) and a new RX7600 (RDNA 3) streaming 1440p@60 over gigabit. The software encoder works fine but heats up the CPU. The h264_vaapi encoder works up until there is a lot of visual change on the screen (e.g. dragging a big window or gaming). At that point the stream stutters and DATA_SHARDS_MAX warnings appear. It seems the encoder ignores bitrate constraints and generates packets too large for the error correction to handle.

As a workaround, I recompile sunshine (nightly or 0.20.0), having manually set the bitrate parameter in the options dictionary. The following works for me (tested on RX7600):

diff --git a/src/video.cpp b/src/video.cpp
index a0ab84c..efa0d0a 100644
--- a/src/video.cpp
+++ b/src/video.cpp
@@ -1565,10 +1565,18 @@ namespace video {
       }
       else {
         ctx->rc_min_rate = bitrate;
       }

+    //av_dict_set_int(&options, "b", 10*1000, 0); // WARNING crashes system irrecoverably
+    //av_dict_set_int(&options, "b", 100*1000, 0); // WARNING crashes system irrecoverably
+    //av_dict_set_int(&options, "b", 1000*1000, 0); // works very low quality
+    //av_dict_set_int(&options, "b", 10*1000*1000, 0); // 1440p@60 looks ok-ish
+    av_dict_set_int(&options, "b", 25*1000*1000, 0); // 1440p@60 looks ok-ish
+    //av_dict_set_int(&options, "b", 30*1000*1000, 0); // DATA_SHARDS_MAX issues
+    //av_dict_set_int(&options, "b", bitrate, 0);// let client decide whether to crash our system or stutter
+
       if (encoder.flags & RELAXED_COMPLIANCE) {
         ctx->strict_std_compliance = FF_COMPLIANCE_UNOFFICIAL;
       }

       if (!(encoder.flags & NO_RC_BUF_LIMIT)) {

ReenigneArcher commented 1 year ago

@CypherGrue if that works, I wonder if you'd have success with this PR? https://github.com/LizardByte/Sunshine/pull/1463

CypherGrue commented 1 year ago

@ReenigneArcher I wouldn't expect this encoder issue to be resolved by that UI change. That new configuration option can also be plumbed into my rough workaround, I guess. The more I look at this h264_vaapi stuff, the messier it looks. After a quarter hour of low-intensity desktop use there are now colour defects (encoding errors) covering the top of my screen. Oh dear.

radugrecu97 commented 11 months ago

> > I also have this kind of overshoot on my 7900XTX with vbr_latency enabled
>
> Don't use vbr, for real. It's downright detrimental to quality without lookahead, and simply impossible with single frame vbv(hrd) - or in other words, without bitrate peaks.
>
> > How could I enable the enforce_hrd ffmpeg flag using Sunshine?
>
> Not without recompiling, need to add line here https://github.com/LizardByte/Sunshine/blob/master/src/video.cpp#L566

Could you point out where the line is again? I believe it shifted.

rudolfkastl commented 10 months ago

With the 7900 XTX I had to set the bitrate to ~50 Mbps at 4K 60 fps to avoid the overshoot stutters with vbr_latency and otherwise default AMF settings. Since HAGS was enabled in the newest driver version, I can set it to 100 Mbps without overshoots.

What other optimisation can be done in regards to the settings for a gigabit network (and also a gigabit connection with the TV)? Any suggestions?

tomasdeltell commented 9 months ago

> > > I also have this kind of overshoot on my 7900XTX with vbr_latency enabled
> >
> > Don't use vbr, for real. It's downright detrimental to quality without lookahead, and simply impossible with single frame vbv(hrd) - or in other words, without bitrate peaks.
> >
> > > How could I enable the enforce_hrd ffmpeg flag using Sunshine?
> >
> > Not without recompiling, need to add line here https://github.com/LizardByte/Sunshine/blob/master/src/video.cpp#L566
>
> Could you point out where the line is again? I believe it shifted.

I second the request. How can I patch video.cpp to recompile Sunshine?

Bizangel commented 7 months ago

For the record, this issue is still present on latest sunshine stable v0.22.0

Reproduced on my RX 7900 XT using CQP with the default QP of 28 at 4K60 and 80 Mbps. I was getting random upstream spikes from the host of up to 250-300 Mbps, which resulted in several stream crashes on the client.

I also noticed that the issue is still somewhat present using vbr_latency, as "drastic changes" caused upstream spikes and lag/stutter on the client (e.g. getting flashbanged in a game: the whole screen goes noisy white in a split second, a spike gets sent, and the client stutters/lags/freezes, sometimes even crashing).

I worked around it by using CBR and recompiling Sunshine with the enforce_hrd flag. The issue was still present using CQP, even with the enforce_hrd flag. Now the stream runs smoothly using CBR at 80 Mbps and 4K60.

For those wondering this is the patch to video.cpp I used:

diff --git a/src/video.cpp b/src/video.cpp
index f786aeb..ed95cc2 100644
--- a/src/video.cpp
+++ b/src/video.cpp
@@ -836,6 +836,7 @@ namespace video {
         { "rc"s, &config::video.amd.amd_rc_hevc },
         { "usage"s, &config::video.amd.amd_usage_hevc },
         { "vbaq"s, &config::video.amd.amd_vbaq },
+        { "enforce_hrd"s, 1 },
       },
       {},  // SDR-specific options
       {},  // HDR-specific options

Just add the enforce_hrd line

DistantThunder commented 7 months ago

Just chiming in to say that I have this issue even with VAAPI:

[2024:03:23:12:58:36]: Info: CLIENT CONNECTED
[2024:03:23:12:58:36]: Info: Found display [wayland-0]
[2024:03:23:12:58:36]: Info: Found interface: zxdg_output_manager_v1(31) version 3
[2024:03:23:12:58:36]: Info: Found interface: wl_output(64) version 4
[2024:03:23:12:58:36]: Info: Resolution: 3840x2160
[2024:03:23:12:58:36]: Info: Offset: 0x0
[2024:03:23:12:58:36]: Info: Logical size: 3840x2160
[2024:03:23:12:58:36]: Info: Name: HDMI-A-1
[2024:03:23:12:58:36]: Info: Found monitor: BBC HDP-V104/2576980377
[2024:03:23:12:58:36]: Info: -------- Start of KMS monitor list --------
[2024:03:23:12:58:36]: Info: Monitor 0 is HDMI-A-1: BBC HDP-V104/2576980377
[2024:03:23:12:58:36]: Info: --------- End of KMS monitor list ---------
[2024:03:23:12:58:36]: Info: Screencasting with KMS
[2024:03:23:12:58:36]: Info: Found monitor for DRM screencasting
[2024:03:23:12:58:36]: Info: Found connector ID [112]
[2024:03:23:12:58:36]: Info: Found cursor plane [76]
[2024:03:23:12:58:37]: Info: SDR color coding [Rec. 601]
[2024:03:23:12:58:37]: Info: Color depth: 8-bit
[2024:03:23:12:58:37]: Info: Color range: [MPEG]
[2024:03:23:12:58:37]: Error: [hevc_vaapi @ 0x7e4d88233bc0] No usable encoding entrypoint found for profile VAProfileHEVCMain (17).
[2024:03:23:12:58:37]: Info: Retrying with fallback configuration options for [hevc_vaapi] after error: Fonction non implantée
[2024:03:23:12:58:37]: Warning: [hevc_vaapi @ 0x7e4d88a5fc80] Driver does not support some wanted packed headers (wanted 0xd, found 0x1).
[2024:03:23:12:58:43]: Warning: Number of fragments for reed solomon exceeds DATA_SHARDS_MAX 737 > 255, skipping error correction
[2024:03:23:12:58:43]: Warning: Number of fragments for reed solomon exceeds DATA_SHARDS_MAX 737 > 255, skipping error correction
[2024:03:23:12:58:43]: Warning: Number of fragments for reed solomon exceeds DATA_SHARDS_MAX 737 > 255, skipping error correction
[2024:03:23:12:58:55]: Warning: Number of fragments for reed solomon exceeds DATA_SHARDS_MAX 280 > 255, skipping error correction
[2024:03:23:12:58:55]: Warning: Number of fragments for reed solomon exceeds DATA_SHARDS_MAX 280 > 255, skipping error correction
[2024:03:23:12:58:55]: Warning: Number of fragments for reed solomon exceeds DATA_SHARDS_MAX 279 > 255, skipping error correction
[2024:03:23:12:59:28]: Warning: Number of fragments for reed solomon exceeds DATA_SHARDS_MAX 642 > 255, skipping error correction
[2024:03:23:12:59:28]: Warning: Number of fragments for reed solomon exceeds DATA_SHARDS_MAX 642 > 255, skipping error correction
[2024:03:23:12:59:28]: Warning: Number of fragments for reed solomon exceeds DATA_SHARDS_MAX 641 > 255, skipping error correction
[2024:03:23:12:59:32]: Warning: Number of fragments for reed solomon exceeds DATA_SHARDS_MAX 614 > 255, skipping error correction
[2024:03:23:12:59:32]: Warning: Number of fragments for reed solomon exceeds DATA_SHARDS_MAX 614 > 255, skipping error correction
[2024:03:23:12:59:32]: Warning: Number of fragments for reed solomon exceeds DATA_SHARDS_MAX 614 > 255, skipping error correction
[2024:03:23:12:59:44]: Warning: Number of fragments for reed solomon exceeds DATA_SHARDS_MAX 335 > 255, skipping error correction
[2024:03:23:12:59:44]: Warning: Number of fragments for reed solomon exceeds DATA_SHARDS_MAX 335 > 255, skipping error correction
[2024:03:23:12:59:44]: Warning: Number of fragments for reed solomon exceeds DATA_SHARDS_MAX 335 > 255, skipping error correction
[2024:03:23:13:03:23]: Warning: Number of fragments for reed solomon exceeds DATA_SHARDS_MAX 450 > 255, skipping error correction
[2024:03:23:13:03:23]: Warning: Number of fragments for reed solomon exceeds DATA_SHARDS_MAX 450 > 255, skipping error correction
[2024:03:23:13:03:23]: Warning: Number of fragments for reed solomon exceeds DATA_SHARDS_MAX 448 > 255, skipping error correction
[2024:03:23:13:03:28]: Warning: Number of fragments for reed solomon exceeds DATA_SHARDS_MAX 636 > 255, skipping error correction
[2024:03:23:13:03:28]: Warning: Number of fragments for reed solomon exceeds DATA_SHARDS_MAX 636 > 255, skipping error correction
[2024:03:23:13:03:28]: Warning: Number of fragments for reed solomon exceeds DATA_SHARDS_MAX 635 > 255, skipping error correction
[2024:03:23:13:03:33]: Info: CLIENT DISCONNECTED

psyke83 commented 6 months ago

@DistantThunder @entropicdrifter @CypherGrue

This was mentioned already, but the AMF encoder is not used on Linux, so this issue doesn't apply to your systems.

I need to set up my Linux build environment to troubleshoot VAAPI encoding, but if you wish to test your own builds in the meantime, I suggest experimenting with the VAAPI tunables. The most obvious tunable to test would be rc_mode. For example, to test AVBR, add the following:

diff --git a/src/video.cpp b/src/video.cpp
index 920ce1a4..72157a5d 100644
--- a/src/video.cpp
+++ b/src/video.cpp
@@ -825,6 +825,7 @@ namespace video {
         { "async_depth"s, 1 },
         { "sei"s, 0 },
         { "idr_interval"s, std::numeric_limits<int>::max() },
+        { "rc_mode"s, "AVBR"s },
       },
       // SDR-specific options
       {},
@@ -844,6 +845,7 @@ namespace video {
         { "async_depth"s, 1 },
         { "sei"s, 0 },
         { "idr_interval"s, std::numeric_limits<int>::max() },
+        { "rc_mode"s, "AVBR"s },
       },
       // SDR-specific options
       {},

I will investigate this soon, but a new issue should be opened to properly track VAAPI encoding overshoot issues.

saffroy commented 3 months ago

@psyke83 FWIW on Linux with VAAPI on AMD GPU (RX 6700XT), I haven't had much success with adding an "rc_mode" option, but adding a "max_frame_size" does help. Not clear to me if that would be a fix or a workaround.

Of course, this means picking a reasonable value for "max_frame_size": I chose 3 times the average frame size for my bitrate/framerate (20Mbps/60FPS so about 100kB), though I have no idea how actual frame sizes vary, but it works for me.
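The rule of thumb above is easy to reproduce. This is only a minimal sketch: it assumes the bitrate is in bits per second and that "max_frame_size" is expressed in bytes (worth confirming against the FFmpeg VAAPI encoder options). The helper name is hypothetical and not part of Sunshine.

```cpp
#include <cstdint>

// Hypothetical helper, not part of Sunshine: a frame-size cap chosen as a
// multiple of the average encoded frame size.
// Assumes bitrate_bps is in bits per second; the result is in bytes.
constexpr std::int64_t max_frame_size_bytes(std::int64_t bitrate_bps,
                                            std::int64_t fps,
                                            std::int64_t multiplier) {
  // Average frame is bitrate / fps bits, i.e. bitrate / (8 * fps) bytes.
  return bitrate_bps * multiplier / (8 * fps);
}
```

For 20 Mbps at 60 FPS with a 3x multiplier, max_frame_size_bytes(20000000, 60, 3) evaluates to 125000 bytes, in the same ballpark as the "about 100kB" mentioned above.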

gschintgen commented 3 months ago

@saffroy What is your exact issue? Is everything wired or do you use wifi? Do you have the DATA_SHARDS_MAX message repeatedly in your log?

Would you mind setting Sunshine's loglevel to debug and using the latest pre-release? It then logs each IDR keyframe that it emits. I've sometimes seen the DATA_SHARDS_MAX message too and I think it may be related to those keyframes. (Which should be inherently larger.)

Note that in normal operation (no network issues) Sunshine does not send regular keyframes. It does so only if requested by the client.

saffroy commented 3 months ago

@gschintgen I would often experience massive slowdowns followed by video freezing, and remaining frozen until I disconnect and reconnect Moonlight. The warnings about DATA_SHARDS_MAX were always present when video froze permanently. Audio was still streaming (looking at network bandwidth confirmed that video was no longer streaming).

My client laptop is on wifi (can't use ethernet here), streaming over the Internet. After patching v0.23.1 to set max_frame_size, I no longer saw the warnings or experienced video freezing. There were still frequent slowdowns though; I got rid of most of them with some tuning on the laptop (set "swcrypto=1" on iwlwifi).

After network tuning, I tested again without max_frame_size, and the slowdowns/warnings/freezes were less common, but still occurred sometimes.

Based on earlier comments, I thought the extra large frames were spurious and could be the cause for slowdowns; your comment suggests they might be actually induced by network lag, forcing the client to catch up. If so, I'd argue that occasional network lag is a fact of life, and when it happens, maybe avoiding oversized frames is for the best?

I'll see what I can do wrt. debug logs, but no promises.

ns6089 commented 3 months ago

Warnings about DATA_SHARDS_MAX indicate encoded video frame sizes going above hardcoded network frame size limit (yes, there's one). The fact that the video "freezes" may indicate that the stream is entering, let's call it, I-frame death loop - decoder requires I-frame to continue, but each I-frame overshoots the hardcoded limit. So yeah, "oversized frames" is the root cause of all of this.

gschintgen commented 3 months ago

I rather suppose the root cause is a transient network issue that triggers the I-frame death loop. (Too much packet loss, Moonlight asks for an IDR frame, the generated frame is too large, vicious circle ...).

gschintgen commented 3 months ago

The max_frame_size insight is valuable though. It should make it possible to tame those IDR frames. (I didn't look into it in more detail.)

CypherGrue commented 3 months ago

> So yeah, "oversized frames" is the root cause of all of this.

> I rather suppose the root cause is a transient network issue

Once there is an oversized frame due to an in-game flashbang, subsequent re-requests will encounter the same problem, giving the impression of a network death loop. But network code/events are not the cause. Network code (error correction specifically) merely has a maximum acceptable packet size limit. The encoder breaches that limit despite being configured not to do so.

To confirm and restate, the software encoder works just fine under identical network conditions, and there's no indication of underlying packet loss on my 2-device (+router) wired gigabit network.

The question is how to provide better configuration for the encoder, or figure out if there's a bug in there. Has anyone else tried passing "b" as in my patch above?

gschintgen commented 3 months ago

A sidenote regarding an earlier comment by @CypherGrue :

> (...) h264_vaapi (...) After quarter hour of low intensity desktop use there are now colour defects (encoding errors) covering the top of my screen.

I don't know if that one is Sunshine related, but I can confirm this (unrelated?) issue: On my RX6650 any h264 stream will end up in a broken state after some time (10-20 minutes?). It looks like one broken frame is incoming and then all subsequent P-frames are based on this incorrect frame. After some time (and multiple screen content changes), the stream will partially recover but the colors will still be off. This happens with all Moonlight clients that I tried (Pi4, Intel iGPU under Linux, another Intel iGPU under Windows, Android phone, Steam Link). When I was still occasionally streaming to my h264-only Steam Link device I used software encoding to work around this issue...

CypherGrue commented 3 months ago

@gschintgen Your experience feels related to mine. All I had running was Gnome with a clock like "1 Jul 23:59:59", and just the clock text changing caused discoloration of the top bar after ~15 min. I guess I should try streaming with obs or something else next time to eliminate sunshine.

gschintgen commented 3 months ago

> @gschintgen Your experience feels related to mine. All I had running was Gnome with a clock like "1 Jul 23:59:59", and just the clock text changing caused discoloration of the top bar after ~15 min. I guess I should try streaming with obs or something else next time to eliminate sunshine.

I suppose the issue may be less noticeable in other programs (if they are affected too that is) due to more frequent I-frames. But Sunshine completely suppresses them... At least it asks for no I-frames and AFAICT it's indeed only emitting them on explicit request by the client. (With loglevel "debug" the latest pre-releases will log all IDR frames.)

CypherGrue commented 3 months ago

Back on topic now. With AMD Radeon RX 7600, Ubuntu 24.04 Linux 6.9.3 Mesa 24.0.5 Sunshine master (90fd3712) h264_vaapi. Can still reproduce the issue.

The VAAPI tunables suggested by @psyke83 were helpful. My patch was a bit of a sledgehammer workaround to control max packet size implicitly by controlling the bitrate "b". However, the appropriate tunable for what we want, which is to limit the individual packet sizes independently of the overall rate, looks to be "bufsize".

A quick hack to output payload size gives:

[2024:07:01:02:16:43]: Warning: Number of fragments for reed solomon exceeds DATA_SHARDS_MAX 261 payload 305447 > 255, skipping error correction

Reverse engineering a reasonable number from that: 305447 / 261 ≈ 1170 bytes per fragment, and 1170 * 255 = 298350 bytes. To be safe, use a slightly lower value, let's say 2^18 = 262144.
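The arithmetic above can be written out as a small sketch. The function names are hypothetical (not Sunshine code), the value 255 is taken from the warning text, and the per-fragment size uses integer division, so it is only an estimate of the real block size.

```cpp
#include <cstdint>

// Value taken from the warning text in this thread.
constexpr std::int64_t DATA_SHARDS_MAX = 255;

// Rough bytes-per-fragment estimate from one logged (payload, fragments) pair.
// Integer division, so this only approximates the real block size.
constexpr std::int64_t estimate_blocksize(std::int64_t payload_bytes,
                                          std::int64_t fragments) {
  return payload_bytes / fragments;
}

// Largest payload that still fits in DATA_SHARDS_MAX fragments of that size.
constexpr std::int64_t max_protected_payload(std::int64_t blocksize) {
  return blocksize * DATA_SHARDS_MAX;
}

// Number of fragments needed for a payload (ceiling division).
constexpr std::int64_t fragments_needed(std::int64_t payload_bytes,
                                        std::int64_t blocksize) {
  return (payload_bytes + blocksize - 1) / blocksize;
}
```

With the logged values, estimate_blocksize(305447, 261) is 1170 and max_protected_payload(1170) is 298350. A 262144-byte payload would need fragments_needed(262144, 1170) = 225 fragments, which is under the 255 limit. Whether "bufsize" is interpreted in bytes or bits is a separate question; the option description quoted later in the thread says bits.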

diff --git a/src/video.cpp b/src/video.cpp
index 10c91d1e..075eb183 100644
--- a/src/video.cpp
+++ b/src/video.cpp
@@ -828,6 +828,7 @@ namespace video {
         { "async_depth"s, 1 },
         { "sei"s, 0 },
         { "idr_interval"s, std::numeric_limits<int>::max() },
+        { "bufsize"s, 1<<18 },
       },
       // SDR-specific options
       {},
@@ -847,6 +848,7 @@ namespace video {
         { "async_depth"s, 1 },
         { "sei"s, 0 },
         { "idr_interval"s, std::numeric_limits<int>::max() },
+        { "bufsize"s, 1<<18 },
       },
       // SDR-specific options
       {},

Tested and working for me, could be the fix!

For prod, the exact number should be calculated from blocksize*DATA_SHARDS_MAX.

(stream.cpp): auto blocksize = session->config.packetsize + MAX_RTP_HEADER_SIZE;

"max_frame_size" doesn't do anything for me unfortunately

peacey commented 3 months ago

@CypherGrue I can confirm your patch has fixed the issue for me on latest git with AMD 7900XTX.

I was also getting similar symptoms with the data shards max error when streaming on WiFi and the stream kept stuttering on quicker scenes. Your patch is the first to fix the issue for me completely (only tested with VAAPI HEVC on Linux). I have been streaming for over an hour now without a single drop or slowdown, and shards error has disappeared from the logs completely. Thanks!

ns6089 commented 3 months ago

Actually, sunshine sets bufsize as part of normal operation.

Option description: {"bufsize", "set ratecontrol buffer size (in bits)", OFFSET(rc_buffer_size), AV_OPT_TYPE_INT, {.i64 = DEFAULT }, INT_MIN, INT_MAX, A|V|E},

Code snippet from sunshine's video.cpp:

        if (!(encoder.flags & NO_RC_BUF_LIMIT)) {
          if (!hardware && (ctx->slices > 1 || config.videoFormat == 1)) {
            // Use a larger rc_buffer_size for software encoding when slices are enabled,
            // because libx264 can severely degrade quality if the buffer is too small.
            // libx265 encounters this issue more frequently, so always scale the
            // buffer by 1.5x for software HEVC encoding.
            ctx->rc_buffer_size = bitrate / ((config.framerate * 10) / 15);
          }
          else {
            ctx->rc_buffer_size = bitrate / config.framerate;

#ifndef __APPLE__
            if (encoder.name == "nvenc" && config::video.nv_legacy.vbv_percentage_increase > 0) {
              ctx->rc_buffer_size += ctx->rc_buffer_size * config::video.nv_legacy.vbv_percentage_increase / 100;
            }
#endif
          }

The relevant line is: ctx->rc_buffer_size = bitrate / config.framerate; It's possible that certain AMD encoders don't operate well with a single-frame VBV, and you have to increase it, otherwise the encoders in question will ignore the limit. Doesn't make much sense, but I've seen weirder stuff.
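To make the buffer sizes in that snippet concrete, here is a hedged restatement of its arithmetic as a pure function. This is a sketch, not Sunshine code: the function and parameter names are hypothetical, and it assumes bitrate is in bits per second, so the result is in bits, matching the "bufsize" option description quoted above.

```cpp
#include <cstdint>

// Hypothetical restatement of the snippet above, not Sunshine code.
// bitrate_bps: bits per second. Result: rate-control buffer size in bits.
// software_scaled stands in for the "!hardware && (slices > 1 || HEVC)" case.
// vbv_percent_increase stands in for the nvenc-only percentage bump and is
// ignored on the software path, as in the original branch structure.
constexpr std::int64_t rc_buffer_bits(std::int64_t bitrate_bps,
                                      std::int64_t fps,
                                      bool software_scaled,
                                      std::int64_t vbv_percent_increase) {
  return software_scaled
           ? bitrate_bps / ((fps * 10) / 15)  // about 1.5 frames' worth of bits
           : bitrate_bps / fps +
               (bitrate_bps / fps) * vbv_percent_increase / 100;
}
```

At 20 Mbps and 60 FPS this gives 333333 bits (about 41 kB) for the plain single-frame case, 500000 bits for the 1.5x software case, and 499999 bits with a 50% increase.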

ns6089 commented 3 months ago

Ah, sunshine's vaapi encoder sets the internal NO_RC_BUF_LIMIT flag and the relevant code section gets bypassed. Why that flag is there I have absolutely no idea. So if we're talking about patches, removing the flag from here https://github.com/LizardByte/Sunshine/blob/90fd3712a836799d44e7cd1bf18007087305e3e8/src/video.cpp#L862 should work better.

CypherGrue commented 3 months ago

@ns6089 Can confirm that removing NO_RC_BUF_LIMIT also fixes the issue for me.

I don't see anywhere in the code where rc_buffer_size is compared/constrained by the 300KB payload network limit (blocksize*DATA_SHARDS_MAX). So I guess it can break again if streaming at a low frame rate and large enough bitrate?
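The low-frame-rate concern can be checked with a quick sketch. It assumes rc_buffer_size is bitrate / framerate in bits, takes the roughly 1170-byte block size estimated earlier in the thread at face value, and ignores any splitting of a frame across multiple FEC blocks, which may raise the effective limit. The function names are hypothetical.

```cpp
#include <cstdint>

// Value taken from the warning text in this thread.
constexpr std::int64_t DATA_SHARDS_MAX = 255;

// Single-frame budget in bytes, assuming rc_buffer_size = bitrate / framerate
// with the bitrate given in bits per second.
constexpr std::int64_t frame_budget_bytes(std::int64_t bitrate_bps,
                                          std::int64_t fps) {
  return bitrate_bps / fps / 8;
}

// True when that budget is larger than blocksize * DATA_SHARDS_MAX bytes.
// Simplification: treats the whole frame as a single FEC block.
constexpr bool budget_exceeds_fec_limit(std::int64_t bitrate_bps,
                                        std::int64_t fps,
                                        std::int64_t blocksize_bytes) {
  return frame_budget_bytes(bitrate_bps, fps) >
         blocksize_bytes * DATA_SHARDS_MAX;
}
```

Under these assumptions, 80 Mbps at 60 FPS stays below the estimate (166666 bytes), while 80 Mbps at 30 FPS (333333 bytes) and 150 Mbps at 60 FPS (312500 bytes) go over 298350 bytes.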

ns6089 commented 3 months ago

Break is a strong word, but for such frames error correction will be disabled. I think the 150 Mbps bitrate limit in Moonlight's settings was selected because of this. The obvious solution is to increase DATA_SHARDS_MAX, because on the other end we're limited by MTU, but it's painful since it's a hardcoded value and needs to be changed on the Moonlight side as well.

CypherGrue commented 3 months ago

I would call it bend, but since the error correction did not transparently disable itself for us here, I assume it will stutter or block the stream in the same way.

Also would it give more headroom if we bump up the FEC blocks as follows? Sorry I'm a bit clueless :)

diff --git a/src/stream.cpp b/src/stream.cpp
index 83e4a128..5dfae434 100644
--- a/src/stream.cpp
+++ b/src/stream.cpp
@@ -1341,8 +1341,8 @@ namespace stream {
       // Therefore, we start breaking the data up into three separate fec blocks.
       auto multi_fec_threshold = 90 * blocksize;

-      // We can go up to 4 fec blocks, but 3 is plenty
-      constexpr auto MAX_FEC_BLOCKS = 3;
+      // We can go up to 4 fec blocks
+      constexpr auto MAX_FEC_BLOCKS = 4;

       std::array<std::string_view, MAX_FEC_BLOCKS> fec_blocks;
       decltype(fec_blocks)::iterator
@@ -1357,12 +1357,12 @@ namespace stream {
         auto unaligned_size = payload.size() / MAX_FEC_BLOCKS;
         auto aligned_size = ((unaligned_size + (blocksize - 1)) / blocksize) * blocksize;

-        // Break the data up into 3 blocks, each containing multiple complete video packets.
-        fec_blocks[0] = payload.substr(0, aligned_size);
-        fec_blocks[1] = payload.substr(aligned_size, aligned_size);
-        fec_blocks[2] = payload.substr(aligned_size * 2);
+        // Break the data up into blocks, each containing multiple complete video packets.
+        for (int x = 0; x < MAX_FEC_BLOCKS; ++x) {
+            fec_blocks[x] = payload.substr(aligned_size*x, aligned_size);
+        }

-        lastBlockIndex = 2 << 6;
+        lastBlockIndex = (MAX_FEC_BLOCKS - 1) << 6;
         fec_blocks_end = std::end(fec_blocks);
       }
       else {

ns6089 commented 3 months ago

> Also would it give more headroom if we bump up the FEC blocks as follows?

It should, in theory. I'm not absolutely sure, this code has been there since dark ages. But 4 blocks seems to be the limit that moonlight can handle: https://github.com/moonlight-stream/moonlight-common-c/blob/8599b6042a4ba27749b0f94134dd614b4328a9bc/src/VideoDepacketizer.c#L757-L758
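As a rough illustration of the headroom question, the split arithmetic from the diff above can be isolated into pure functions. This is a sketch under assumptions: the names are hypothetical, the block size of 1170 bytes is the estimate from earlier in the thread, and it assumes the DATA_SHARDS_MAX limit applies per FEC block.

```cpp
#include <cstdint>

// Value taken from the warning text in this thread.
constexpr std::int64_t DATA_SHARDS_MAX = 255;

// Per-block size when a payload is split into fec_blocks pieces, rounded up
// to a whole number of packets (mirrors aligned_size in the diff above).
constexpr std::int64_t aligned_block_size(std::int64_t payload_bytes,
                                          std::int64_t blocksize,
                                          std::int64_t fec_blocks) {
  return ((payload_bytes / fec_blocks + (blocksize - 1)) / blocksize) *
         blocksize;
}

// Data shards in each full FEC block after the split.
constexpr std::int64_t shards_per_block(std::int64_t payload_bytes,
                                        std::int64_t blocksize,
                                        std::int64_t fec_blocks) {
  return aligned_block_size(payload_bytes, blocksize, fec_blocks) / blocksize;
}

// Largest payload that keeps every block at or under DATA_SHARDS_MAX shards,
// assuming the limit is applied per FEC block.
constexpr std::int64_t max_payload(std::int64_t blocksize,
                                   std::int64_t fec_blocks) {
  return blocksize * DATA_SHARDS_MAX * fec_blocks;
}
```

With a 1170-byte block size, three blocks cover up to 895050 bytes and four cover up to 1193400 bytes, so the change would add about a third more headroom if the per-block assumption holds.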

CypherGrue commented 3 months ago

It might be lovely to have something both safe and self-documenting like

rc_buffer_size = min(blocksize * DATA_SHARDS_MAX, bitrate / config.framerate)

In practice, your suggestion to remove NO_RC_BUF_LIMIT seems like the way to go for the 99% of the dozens of us on Linux/AMD/vaapi who are otherwise unable to play Doom.
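The clamp proposed above could look roughly like the following. This is a sketch, not a tested patch: the function name is hypothetical, and it assumes rc_buffer_size is in bits (as the "bufsize" option description quoted earlier says) while blocksize is in bytes, so the network limit is multiplied by 8 before comparing.

```cpp
#include <cstdint>

// Value taken from the warning text in this thread.
constexpr std::int64_t DATA_SHARDS_MAX = 255;

// Hypothetical clamp: never let the single-frame rate-control buffer (bits)
// exceed what DATA_SHARDS_MAX blocks of blocksize_bytes can carry.
constexpr std::int64_t clamped_rc_buffer_bits(std::int64_t bitrate_bps,
                                              std::int64_t fps,
                                              std::int64_t blocksize_bytes) {
  return (blocksize_bytes * DATA_SHARDS_MAX * 8 < bitrate_bps / fps)
           ? blocksize_bytes * DATA_SHARDS_MAX * 8  // network limit, in bits
           : bitrate_bps / fps;                     // single-frame VBV, in bits
}
```

With the 1170-byte estimate, 20 Mbps at 60 FPS is left at 333333 bits, while 150 Mbps at 60 FPS is reduced from 2500000 to 2386800 bits.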

CypherGrue commented 3 months ago

> Why that flag is there I have absolutely no idea.

NO_RC_BUF_LIMIT was added here for Intel GPUs https://github.com/LizardByte/Sunshine/pull/1255 9e23b396

ReenigneArcher commented 3 months ago

Should this issue be re-opened?

ns6089 commented 3 months ago

We're discussing vaapi-specific problem now, while the issue is for amf. If anything this needs a new issue.