Closed. doctorpangloss closed this issue 2 years ago.
Maybe eliminating the copying won't help. The profiler measurement for WebRTC.EncodeFrame with NvEncoder shows durations as high as 7ms, which is really surprising and suggests it is misconfigured.
7ms is within typical encoding duration - https://parsec.app/blog/nvidia-nvenc-outperforms-amd-vce-on-h-264-encoding-latency-in-parsec-co-op-sessions-713b9e1e048a
I don't think the copying, which takes like 0.12ms, has anything to do with it.
@doctorpangloss Thank you for measuring the latency and reporting it. I guess the latency may have increased since PR #650, which separates the encoding load from the rendering thread. Another PR, #728, also affects latency because it adds a wait on the frame timing to keep the framerate.
I guess our multi-threaded encoder implementation has a problem that makes latency worse; I need to check in more detail. One question: I would like to know how you measure the latency.
> One question: I would like to know how you measure the latency.
I put a game object in the scene whose position is set to whatever the mouse's position is, and I move it around. It's very subjective.
I am experimenting with improvements.
What does `UnityVideoTrackSource.OnFrameCaptured` do? Why is it so slow? I see encoding separately from this preamble. Blitting plus `OnFrameCaptured` (which does not include the encoding step) takes as long as 7ms on my 5950X + 3090.
I turned off framerate control.
Have you modified the native code to improve performance? As you know, `UnityVideoTrackSource.OnFrameCaptured` passes frames to the encoder asynchronously to keep the encoding framerate. It may be possible to improve the latency if we change the process to run synchronously.
> Have you modified the native code to improve performance?
Only by reverting the framerate control patch.
> It may be possible to improve the latency if we change the process to run synchronously.
I think so. The current way you put the NvEnc encoding step on the encoder queue seems right to me. Enabling NvEnc async for Windows only would achieve something similar. What I am trying to figure out is how to prevent the waiting that occurs in OnFrameCaptured:
In this screenshot, observe that OnFrameCaptured takes 2.6ms on the render thread. Right at the end of that scope, the Encoder Queue thread starts its NvEnc work. Sometimes OnFrameCaptured takes as long as 7.6ms. All the other work it's doing is fast. It appears that the waiting in OnFrameCaptured is happening on the render thread.
Okay, I see `GpuMemoryBufferPool::CreateFrame` is very slow. I will try `experimental/direct-renderer`.
Why not map the Unity RenderTexture to a CUDA resource directly? Why copy? Can we retain the SRP `cameraColorBuffer` somehow?
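In principle, on D3D11 the RenderTexture's underlying `ID3D11Texture2D` can be registered with CUDA and mapped each frame without a staging copy. A rough sketch of the interop sequence (Windows/D3D11 only; all error handling elided; this is not the package's code, and the texture must be created with interop-compatible flags):

```cpp
// Sketch only: register a D3D11 texture with CUDA once, then map it each
// frame to get a cudaArray that can back the encoder input. Assumes a valid
// ID3D11Texture2D* (e.g. from RenderTexture.GetNativeTexturePtr() on the C#
// side) and an initialized CUDA context; all error checks omitted.
#include <cuda_runtime.h>
#include <cuda_d3d11_interop.h>
#include <d3d11.h>

cudaGraphicsResource* RegisterTexture(ID3D11Texture2D* tex) {
    cudaGraphicsResource* resource = nullptr;
    cudaGraphicsD3D11RegisterResource(&resource, tex, cudaGraphicsRegisterFlagsNone);
    return resource;  // register once, keep for the texture's lifetime
}

cudaArray_t MapForEncode(cudaGraphicsResource* resource, cudaStream_t stream) {
    cudaGraphicsMapResources(1, &resource, stream);
    cudaArray_t array = nullptr;
    cudaGraphicsSubResourceGetMappedArray(&array, resource, 0, 0);
    return array;  // valid until cudaGraphicsUnmapResources is called
}

void UnmapAfterEncode(cudaGraphicsResource* resource, cudaStream_t stream) {
    cudaGraphicsUnmapResources(1, &resource, stream);
}
```

Whether this actually avoids the copy in practice depends on what input formats the encoder accepts and on Unity's texture creation flags; it is only a sketch of the direction the question suggests.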
I see there need to be three fixes:

- `ITexture2D*`: then, they should create the appropriate "texture view" for the hardware/software encoder on demand inside the encoder queue thread. This means `handle()` does the map step in the current implementation, and `ToI420` gets the CPU texture map.
- On Windows, there must be DX11 and DX12 `NVEncoderImpl` support, instead of using the CUDA encoder for all platforms.

I've documented my conclusions here: https://github.com/Unity-Technologies/com.unity.webrtc/issues/803
**Package version**
2.4.0-exp.10

**Environment**

**Steps To Reproduce**

**Current Behavior**
The latency is like 60-150ms.

**Expected Behavior**
The latency should be close to 0ms. This was achievable with 2.4.0-exp.5 (before the encoder refactor).

**Anything else?**
No response