Open hgaiser opened 7 months ago
Does that work properly in the case where the host display's refresh rate is higher than the stream frame rate? IIUC, in that case, we could potentially capture and encode more frames per second than the client is expecting because there will often be a frame ready when we expect to be sleeping until the next frame is due.
We are asking NvFBC to sample at the requested framerate, so that isn't an issue:
I think that's a red herring. I had a look at this for #2333 and also thought that was the crucial line but it's rather this one: https://github.com/LizardByte/Sunshine/blob/7fb8c76590f843f28b2061cd0a1543f0710795e3/src/platform/linux/cuda.cpp#L814 I.e. we manually capture at the configured rate. The other line is probably just to put the capture feature in a reasonable state.
Did you test that with the current NOWAIT flag? Because in that case the call to NvFBC indeed does not wait. The value is presumably still important for the internal loop of NvFBC, but not for the FPS that Sunshine maintains.
Which raises an interesting issue. If NvFBC updates internally every 16msec and Sunshine always waits exactly 16.666msec between frames, then every 24 (16 / 0.666) frames it theoretically misses a frame. If that is true, it would be better to only let NvFBC do the sleeping (even though it can only do 16msec and not 16.666msec, so you'd get 62.5Hz).
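As a rough check of that arithmetic, here is a minimal sketch. The function names are made up for illustration and are not part of Sunshine or NvFBC. It assumes the producer (NvFBC) publishes a frame every 16.000 ms and the consumer (Sunshine) waits a fixed 16.667 ms between grabs, and computes after how many grabs the accumulated lag reaches one full producer period.

```rust
/// Number of grabs after which a consumer waiting `consumer_us` between grabs
/// has fallen one full producer period (`producer_us`) behind.
/// Returns None when the consumer is not slower than the producer.
/// (Hypothetical helper, for illustration only.)
fn frames_until_slip(producer_us: u64, consumer_us: u64) -> Option<u64> {
    if consumer_us <= producer_us {
        return None;
    }
    // Lag gained by the consumer on every grab.
    let drift = consumer_us - producer_us;
    // Ceiling division: first grab count at which the lag >= one producer period.
    Some((producer_us + drift - 1) / drift)
}

/// Rate in Hz that corresponds to a period given in microseconds.
fn rate_hz(period_us: u64) -> f64 {
    1_000_000.0 / period_us as f64
}
```

Under these assumptions, `frames_until_slip(16_000, 16_667)` gives `Some(24)` and `rate_hz(16_000)` gives 62.5, which matches the numbers in the comment.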
You’re right I did miss the last (but crucial) part of your initial post, i.e. the part where you point out the “manual” waiting done by Sunshine. Anyway, I only had superficial contact with this detail of the code base since I don’t have nvidia hardware and it came up in the mentioned PR discussion.
I guess it all depends on what exactly NvFBC does with its 16.000 ms sampling rate:
a) It captures at 62.5Hz (i.e. the theoretical next_frame = previous + delay like currently implemented in Sunshine’s logic), which would probably lead to pacing issues due to the rate mismatch. But that sounds more like the NOWAIT approach.
b) It captures a frame, then waits for 16.000ms and then blocks to wait for the next frame which will hopefully come at the 16.667 timepoint (due to a framelimiter). With perfect render timing that should theoretically lead to the best results (precise 60fps, no latency). But in that case I do think that 16ms is too long (rendering will still be subject to jitter) and frames will be missed.
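To make the rate mismatch in option a) concrete, here is a small simulation sketch. It is only a model: it assumes a game rendering on a perfect 16.667 ms grid and a sampler firing on a perfect 16.000 ms grid (next_frame = previous + delay), and `sample_counts` is a hypothetical name, not an NvFBC or Sunshine function.

```rust
/// Counts (new, repeated) frames seen when a source that renders every
/// `render_us` is sampled every `sample_us`, over a window of `duration_us`.
/// Frame n is assumed to become visible at time n * render_us.
fn sample_counts(render_us: u64, sample_us: u64, duration_us: u64) -> (u64, u64) {
    let mut new_frames = 0;
    let mut repeats = 0;
    let mut last_seen: Option<u64> = None;
    let mut t = sample_us; // first sample fires one interval after start
    while t <= duration_us {
        // Index of the most recently rendered frame at time t.
        let latest = t / render_us;
        if last_seen == Some(latest) {
            repeats += 1; // the sampler picked up the same frame again
        } else {
            new_frames += 1;
        }
        last_seen = Some(latest);
        t += sample_us;
    }
    (new_frames, repeats)
}
```

With these idealized timings, `sample_counts(16_667, 16_000, 1_000_000)` returns `(60, 2)`: in one second the sampler fires 62 times but only 60 distinct frames exist, so two samples are repeats.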
Anyway I just wanted to give a heads up, but missed that you correctly saw sunshine’s own waiting.
It seems like your option a) is correct. I tested this in moonshine (similar to Sunshine, but it only has NvFBC + NVENC), where I rely on NvFBC to block until a new frame arrives. According to Moonlight I get roughly 62.5Hz if I stream dynamic content (in this case I just moved the cursor a lot):
I guess Sunshine waits to achieve the accurate 60Hz framerate, but by doing so, it skips a frame every 24 frames.
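The "skips a frame every 24 frames" claim can be sketched as a simulation. This assumes NvFBC publishes a new frame exactly every 16.000 ms and the consumer grabs the latest frame exactly every 16.667 ms; `skipped_at` is a hypothetical helper, not code from Sunshine or moonshine.

```rust
/// The producer publishes frame n at time n * producer_us. The consumer grabs
/// the latest published frame every `consumer_us`. Returns the 1-based grab
/// numbers at which at least one producer frame was never delivered.
fn skipped_at(producer_us: u64, consumer_us: u64, grabs: u64) -> Vec<u64> {
    let mut skips = Vec::new();
    let mut prev = 0; // frame index delivered at t = 0
    for g in 1..=grabs {
        // Index of the latest frame published when grab g happens.
        let latest = g * consumer_us / producer_us;
        if latest > prev + 1 {
            skips.push(g); // jumped over a frame
        }
        prev = latest;
    }
    skips
}
```

In this model, `skipped_at(16_000, 16_667, 60)` returns `[24, 48]`, i.e. one skipped frame roughly every 24 grabs.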
No worries! Appreciate it :).
Out of curiosity (since I'm not familiar with the inner workings of hardware pointers and such): Did you try what happens to the framerate as reported by Moonlight if you stream, let's say, an animated title screen of a game, framecapped at 60.00Hz, but without any mouse input?
/*!
* Default, capturing waits for a new frame or mouse move.
*
* The default behavior of blocking grabs is to wait for a new frame until
* after the call was made. But it's possible that there is a frame already
* ready that the client hasn't seen.
* \see NVFBC_TOCUDA_GRAB_FLAGS_NOWAIT_IF_NEW_FRAME_READY
*/
NVFBC_TOCUDA_GRAB_FLAGS_NOFLAGS = 0,
If taken literally the framerate should then drop to 60fps. Maybe it's "just" the mouse input (e.g. 1000Hz gaming mouse) that messes with the capture timing. Is it possible to hide mouse input from NvFBC?
I wasn't entirely sure what would happen either, but it seems to still stream at 62.5Hz if I don't move the mouse in a game (even if the game I was streaming shows a framerate of a steady 60Hz according to Steam FPS overlay).
Ok, so it indeed seems like NvFBC calls its blocking image capture at a precise 16.00 ms, and most of the time it will find a frame that it has not yet shown, so it immediately returns that one. This goes on until another period of 24 or 25 frames is over and the 0.667 ms difference has accumulated to a whole frame interval.
So the ideal capture routine for frame-capped (e.g. 60fps) content (which is not really under our control...) would be:
Except that it will break down if the game is running uncapped and step 2 will capture too early and too often.
Isn't this then similar to the problem that the Windows side of Sunshine has to deal with when it is using the Desktop Duplication API (which I don't know anything about except the few tidbits I picked up here and there)? I saw some pacing code in there... https://github.com/LizardByte/Sunshine/blob/9d5ee2f57d2c582b5b0f0c7eb67a8b86daffd9a9/src/platform/windows/display_base.cpp#L170-L200 (I did not try to analyze or understand this code in detail.)
Why would the second frame come after 16.7msec? I would expect the internal functioning of NvFBC to always retrieve frames after 16msec intervals, so if you wait for 80%, say exactly 13msec, that the next blocking call would return after another exactly 3msec.
Right, there's also the question what "blocking" actually means: a) Blocking until the timer is up. b) Blocking until a new frame is available. I suppose it means b). Isn't that the point of blocking vs NOWAIT?
If the game is running at 60fps / 16.667ms, then the blocking call executed 13ms after the previous capture would then wait until a new frame has been rendered and is ready to be captured. And those frames should be emitted at an interval of 16.666ms due to the framecap.
I think it all depends on when the timer will trigger next:
It seems like NvFBC is doing the former, while the latter would be what's needed.
Unfortunately I can't test any of this. (I'm on Intel & AMD)
My suspicion is that NvFBC is running an internal loop, separate from the frame generation loop, which polls the latest frame every dwSamplingRateMs milliseconds. I ran the following code with nvfbc-rs:
capturer.start(BufferFormat::Rgb, 60)?;

// In case it needs to warm up or initialize something.
for _ in 0..10 {
    capturer.next_frame(CaptureMethod::Blocking)?;
}

// Time 100 iterations of "sleep 13 ms, then block for the next frame".
let now = std::time::Instant::now();
for _ in 0..100 {
    std::thread::sleep(std::time::Duration::from_millis(13));
    capturer.next_frame(CaptureMethod::Blocking)?;
}
// Average time per iteration, in microseconds.
println!("{}", now.elapsed().as_micros() / 100);
Meaning we set the framerate to 60Hz (which is used to set the sampling rate to 16 msec), sleep 13msec and then wait for a next frame in a blocking manner. Repeat this 100 times and get the average time waited on new frames. I am getting this timing:
16033
Without the 13msec sleep I also get 16016, roughly the same amount.
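For comparison, here is a sketch of what the two readings of "blocking" would predict for this experiment. It is a simplified model with hypothetical names: the blocking call is assumed to return on the next tick of some clock, either a 16.000 ms sampling timer or a 16.667 ms render cadence.

```rust
/// Earliest multiple of `period_us` strictly after `now_us`.
fn next_tick(now_us: u64, period_us: u64) -> u64 {
    (now_us / period_us + 1) * period_us
}

/// Average loop time in microseconds when each iteration sleeps `sleep_us`
/// and then blocks until the next tick of a clock with period `period_us`.
fn avg_loop_us(sleep_us: u64, period_us: u64, iterations: u64) -> u64 {
    let mut now = 0;
    for _ in 0..iterations {
        // Sleep first, then block until the clock ticks again.
        now = next_tick(now + sleep_us, period_us);
    }
    now / iterations
}
```

In this model, `avg_loop_us(13_000, 16_000, 100)` and `avg_loop_us(0, 16_000, 100)` both give 16000, while `avg_loop_us(13_000, 16_667, 100)` gives 16667. The measured 16033 and 16016 are much closer to the timer-driven reading than to the render-driven one.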
Hm, but somehow it must be possible to make NvFBC actually wait for a new frame:
Default, capturing waits for a new frame or mouse move.
* When using blocking calls each captured frame will have
* this flag set to NVFBC_TRUE since the blocking mechanism waits for
* the display server to render a new frame.
NVFBC_BOOL bIsNewFrame;
* The default behavior of blocking grabs is to wait for a new frame until
* after the call was made. But it's possible that there is a frame already
* ready that the client hasn't seen.
* \see NVFBC_TOSYS_GRAB_FLAGS_NOWAIT_IF_NEW_FRAME_READY
At least with NVFBC_TOSYS_GRAB_FLAGS_NOFLAGS the call should always be blocking, no? That should then drag out the interval to the 16.666ms (supposing a framecap on the game as always).
With NVFBC_TOCUDA_GRAB_FLAGS_NOWAIT_IF_NEW_FRAME_READY we could have the same behavior as with the simple NOWAIT? I.e. every time the 16.00ms timer fires it detects that a new frame is already there, it emits it, and 16.00 ms later it again detects that a new frame is immediately ready, etc. until 24-25 frames are up.
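To pin down what the three flags would mean if the header comments are taken literally, here is a toy model. It is an interpretation, not NvFBC's actual behavior: `GrabMode` and `grab` are hypothetical names, frames are assumed to become ready on a fixed grid, and mouse moves and timeouts are ignored.

```rust
/// Simplified stand-ins for the NvFBC grab flags discussed here.
#[derive(Clone, Copy, Debug, PartialEq)]
enum GrabMode {
    NoFlags,               // always wait for a frame newer than the call
    NoWait,                // never wait, return whatever is latest
    NoWaitIfNewFrameReady, // wait only if the latest frame was already seen
}

/// Returns (return_time_us, frame_index) for a grab issued at `now_us`, when
/// frame n becomes ready at n * period_us and `last_seen` is the index of the
/// last frame handed to the caller.
fn grab(mode: GrabMode, now_us: u64, period_us: u64, last_seen: u64) -> (u64, u64) {
    let latest = now_us / period_us; // newest frame that already exists
    let next = latest + 1;           // first frame produced after the call
    match mode {
        GrabMode::NoWait => (now_us, latest),
        GrabMode::NoFlags => (next * period_us, next),
        GrabMode::NoWaitIfNewFrameReady if latest > last_seen => (now_us, latest),
        GrabMode::NoWaitIfNewFrameReady => (next * period_us, next),
    }
}
```

For a call at 20 ms with frames every 16.667 ms: `NoWait` returns frame 1 at once, `NoFlags` blocks until frame 2 at 33.334 ms, and `NoWaitIfNewFrameReady` returns frame 1 at once if the caller last saw frame 0, otherwise it also blocks until frame 2.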
Ah sorry, I had renamed NOFLAGS to Blocking in https://github.com/hgaiser/nvfbc-rs/blob/main/nvfbc/src/cuda.rs#L41 . So I was using NOFLAGS, which still waits approximately 16msec, not 16.666msec.
That's weird. (And I'm out of ideas.)
That's okay, we tried ;)
A mystery for another day :magic_wand:
Not sure if it helps, but this might be the reason why I experience a weird stutter behavior with NvFBC, but it's completely smooth with KMS screen capture.
When testing with VRRTest set to 60 FPS, I can see that the moving bars hitch back and forth momentarily, as if the stream is showing an outdated frame but then immediately returns to the normal up-to-date frame. I have driver version 555.58.02 with explicit sync enabled (kwin 6.1.3)
When I enforce KMS capture without changing any other settings the stream is buttery smooth.
Can you test sunshine with the flag changed? Curious if it helps in your case.
I recompiled Sunshine with the changes outlined below. Unfortunately, the hitching/stuttering was still present, albeit subjectively less intense. It certainly didn't make things worse.
diff --git a/src/platform/linux/cuda.cpp b/src/platform/linux/cuda.cpp
index b5374b18..9d00a4a8 100644
--- a/src/platform/linux/cuda.cpp
+++ b/src/platform/linux/cuda.cpp
@@ -877,7 +877,7 @@ namespace cuda {
NVFBC_TOCUDA_GRAB_FRAME_PARAMS grab {
NVFBC_TOCUDA_GRAB_FRAME_PARAMS_VER,
- NVFBC_TOCUDA_GRAB_FLAGS_NOWAIT,
+ NVFBC_TOCUDA_GRAB_FLAGS_NOWAIT_IF_NEW_FRAME_READY,
&device_ptr,
&info,
0,
@@ -932,7 +932,7 @@ namespace cuda {
NVFBC_TOCUDA_GRAB_FRAME_PARAMS grab {
NVFBC_TOCUDA_GRAB_FRAME_PARAMS_VER,
- NVFBC_TOCUDA_GRAB_FLAGS_NOWAIT,
+ NVFBC_TOCUDA_GRAB_FLAGS_NOWAIT_IF_NEW_FRAME_READY,
&device_ptr,
&info,
(std::uint32_t) timeout.count(),
Ah too bad, then I guess your stuttering is not related to this setting (at least not uniquely).
I don't mean this as advertisement.. but you're welcome to try moonshine, which works with the same components (NvFBC + NVENC) and implements the same protocol, so should be "plug-and-play". Would be interesting to see if you have stuttering there too. If that is the case, it rules out a lot of possible causes.
I did a bit more testing and I believe I isolated a frame timing issue with NvFBC. I created a diagram:
Observed behavior: Every few hundred milliseconds, a frame is displayed twice. A couple hundred milliseconds later, a frame is skipped.
Imagine labelling all even frames A and all odd frames B. Now, we display them at 60 Hz on the host machine. On the host-connected monitor we see the A/B frames flickering back and forth. If we now stream at 30 Hz, we would expect the client machine to see either only A-frames or only B-frames. However, due to some unknown timing issue when NvFBC is in use, the video stream alternates between showing only A-frames and only B-frames about 2-3 times per second. This behavior is also observable at 60 Hz, although it is a bit more difficult to notice.
On the host machine, the bars are moving smoothly. On the client machine, they are hitching back and forth as frames are skipped and repeated.
Desired result: The bars should be moving as smoothly as on the host.
On the host machine, the bottom UFO is flickering as black frames are inserted for every second frame. On the client machine, the bottom UFO is appearing and disappearing as the stream is switching between even and odd frames constantly.
Desired result: The bottom UFO should be either always visible or always hidden.
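The A/B frame idea can be turned into a small check, sketched here under the assumption that each captured frame can be tagged with the index of the host frame it shows (for example via a counter rendered into the test pattern). `PacingReport` and `analyze` are hypothetical names and not part of Sunshine.

```rust
/// How a captured sequence of host frame indices deviates from ideal
/// fixed-stride sampling (stride 1 for 60 Hz from 60 Hz, 2 for 30 Hz from 60 Hz).
#[derive(Debug, Default, PartialEq)]
struct PacingReport {
    short_steps: usize,  // advanced by less than the stride (repeated or late frame)
    long_steps: usize,   // advanced by more than the stride (skipped frame)
    parity_flips: usize, // switched between even (A) and odd (B) host frames
}

fn analyze(captured: &[u64], stride: u64) -> PacingReport {
    let mut report = PacingReport::default();
    for pair in captured.windows(2) {
        // saturating_sub keeps the sketch panic-free if indices go backwards.
        let step = pair[1].saturating_sub(pair[0]);
        if step < stride {
            report.short_steps += 1;
        } else if step > stride {
            report.long_steps += 1;
        }
        if pair[0] % 2 != pair[1] % 2 {
            report.parity_flips += 1;
        }
    }
    report
}
```

A clean 30 Hz stream such as `[0, 2, 4, 6]` with stride 2 yields an all-zero report, whereas `[0, 2, 3, 5, 7, 10, 12]` yields one short step, one long step and two parity flips, which is the pattern described as alternating between A-frames and B-frames.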
The problem has been confirmed on all of the following configurations:
Host: Wayland, Client: Wayland
Host: X11, Client: Wayland
Host: X11 (no compositing), Client: Wayland
Host: X11, Client: X11
Host: X11 (no compositing), Client: X11 (no compositing)
The problem appears regardless of the following NVIDIA settings (on the host):
Sync to VBlank: on/off
Allow Flipping: on/off
Allow G-SYNC/G-SYNC Compatible: on/off
Force Composition Pipeline: on/off
Force Full Composition Pipeline: on/off
The NVFBC_TOCUDA_GRAB_FLAGS_NOWAIT_IF_NEW_FRAME_READY patch above also does not affect the bug in any way. I have tried both and observed the same result.
The problem is absent on Wayland using KMS capture.
The problem seems to be absent on X11 using X11 capture, however a more general frame pacing issue (possibly due to X11 itself) is making it hard to judge.
Operating System: Arch Linux
KDE Plasma Version: 6.1.4
KDE Frameworks Version: 6.5.0
Qt Version: 6.7.2
Kernel Version: 6.10.7-arch1-1 (64-bit)
Graphics Platform: X11
Processors: 24 × AMD Ryzen 9 3900X 12-Core Processor
Memory: 31.3 GiB of RAM
Graphics Processor: NVIDIA GeForce RTX 2080 Ti/PCIe/SSE2
NVIDIA Driver Version: 560.35.03
That's some deep analysis :) I can't really say much about what might cause the issue you're seeing, but it appears unrelated to this issue since changing the flag doesn't seem to make a difference. It might make more sense to open a separate issue?
Did you happen to try moonshine? Since it's using a completely different implementation, I'm curious if you see similar issues there.
It seems this issue hasn't had any activity in the past 90 days. If it's still something you'd like addressed, please let us know by leaving a comment. Otherwise, to help keep our backlog tidy, we'll be closing this issue in 10 days. Thanks!
I think this is still relevant, so posting this so that the kind bot doesn't close it.
Describe the Bug
NvFBC has a few different methods to capture an image. The one used in Sunshine is:
Basically it means that whenever Sunshine requests a new frame, a frame is provided, but that frame can be "old" (max of 1/fps seconds old). Still though, it means at 60fps a frame could be 16msec old.
Changing to NVFBC_TOCUDA_GRAB_FLAGS_NOWAIT_IF_NEW_FRAME_READY would mean that the frame request blocks until a new frame becomes available, but it returns the frame immediately if NvFBC knows it's a new frame. This also means that when the host is serving static content, the FPS drops to 13.33FPS in my tests (not sure why this amount exactly).
And for context (since that flag basically extends the NOFLAGS flag):
As far as I can see, we can simply use NVFBC_TOCUDA_GRAB_FLAGS_NOWAIT_IF_NEW_FRAME_READY instead of NVFBC_TOCUDA_GRAB_FLAGS_NOWAIT. This wait becomes kinda redundant, but it won't hurt either.
Expected Behavior
N/A
Additional Context
N/A
Host Operating System
Linux
Operating System Version
Arch Linux
Architecture
64 bit
Sunshine commit or version
7fb8c76590f843f28b2061cd0a1543f0710795e3
Package
other (self built)
GPU Type
Nvidia
GPU Model
GeForce RTX 3090
GPU Driver/Mesa Version
550.76
Capture Method (Linux Only)
NvFBC
Config
N/A
Apps
N/A
Relevant log output
N/A