Unity-Technologies / com.unity.webrtc

WebRTC package for Unity

[REQUEST]: Improve input latency for streaming from Unity camera to end users #803

Open doctorpangloss opened 1 year ago

doctorpangloss commented 1 year ago

Is your feature request related to a problem?

This is an approach to improve latency within the current architecture of the plugin.

The goal is to get the frame encoded within 7ms of finishing rendering (https://parsec.app/blog/nvidia-nvenc-outperforms-amd-vce-on-h-264-encoding-latency-in-parsec-co-op-sessions-713b9e1e048a), which is pretty close to NvFBC / Moonlight.

Right now, due to the architecture of the plugin, the time between rendering finishing and encoding the frame is about

Describe the solution you'd like

Describe alternatives you've considered

No response

Additional context

No response

doctorpangloss commented 1 year ago

Some other architectural ideas:

karasusan commented 1 year ago

@doctorpangloss Thanks for your suggestions; we have added tasks to investigate the improvements you proposed.

doctorpangloss commented 1 year ago

Based on some further research:

  1. In URP and HDRP, use the same event / pass that the Unity Recorder does to call a new render event, "TextureReady", with a GPU fence.
  2. The video frame scheduler awaits this fence before encoding the frame.

In principle, the video frame scheduler is already 90% of the way there for "framerate synchronization mode." The hard part for me is how to get the DX12 fence out of the command buffer's fence structure on the Unity side.
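As a rough illustration of the Unity side (a minimal sketch assuming the built-in pipeline; TextureReadySignal and streamingTexture are hypothetical names, and handing the fence to the native encoder is exactly the part that does not exist yet):

using UnityEngine;
using UnityEngine.Rendering;

// Hypothetical sketch: signal a GPU fence once the rendered frame has been
// blitted into the texture the encoder reads from.
public class TextureReadySignal : MonoBehaviour
{
    [SerializeField] RenderTexture streamingTexture; // destination the encoder consumes
    private CommandBuffer m_Buffer;

    void OnEnable() => m_Buffer = new CommandBuffer { name = "TextureReady" };
    void OnDisable() => m_Buffer.Release();

    void LateUpdate()
    {
        m_Buffer.Clear();
        m_Buffer.Blit(BuiltinRenderTextureType.CameraTarget, streamingTexture);
        // Signalled on the GPU once the commands recorded above complete.
        GraphicsFence fence = m_Buffer.CreateAsyncGraphicsFence();
        Graphics.ExecuteCommandBuffer(m_Buffer);
        // A fence-aware scheduler in the plugin would wait on `fence` before
        // encoding, instead of copying synchronously on the render thread.
    }
}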

karasusan commented 1 year ago

@doctorpangloss Sorry, but I couldn't find the event "TextureReady" in the Graphics repository. Where did you find it?

Joao-RI commented 1 year ago

Hi

Is there a timeline for fixes for these issues? I'm using the UnityRenderStreaming package, and in my own tests on an extremely fast local network, streaming video from a PC to a Quest 2 at 1920x1080 / 60fps shows ~120ms of latency.

Compared to other applications that stream from PC to Quest 2 (e.g. Virtual Desktop) under the same circumstances, that is an increase of ~80ms.

I've tried all video codecs and the results were very similar.

Thanks.

karasusan commented 1 year ago

@Joao-RI We already have an issue for this: #838. The video codec might be the main reason for the latency.

Joao-RI commented 1 year ago

@karasusan Thanks for the reply.

We checked again with UnityRenderStreaming 3.1.0-exp.6, between PC and Quest 2 and between PC and PC, and the results are similar: around 90ms of latency.

There seems to have been an improvement between versions, but the results across platforms using H264 were identical. This tells me that either the decoder is also not using hardware acceleration on PC, or there is something else eating up time, as suggested by @doctorpangloss.

VP9 and AV1 performed slightly better on Quest 2 with static screens (we didn't test these on PC), but the framerate felt unstable once there was a lot going on.

kannan-xiao4 commented 1 year ago

@Joao-RI As you say, Quest 2 should be able to use hardware acceleration, or latency could be improved as suggested by @doctorpangloss. But we cannot say when that will be resolved.

doctorpangloss commented 1 year ago

I'm going to take another look at this issue. For my purposes, focusing on just DX11 and DX12 support is ideal. Some background:

What does the future hold? I don't know. The Mac Studio is the highest performance converged commodity platform today. It's a better architecture for hosted streaming. It will take years for converged NVIDIA, AMD & Intel high performance APUs to reach the datacenter in a way that is compatible with graphics and/or Windows. Too much emphasis on ML. It really depends on what you are excited about.

karasusan commented 1 year ago

@doctorpangloss Thanks for sharing your opinion.

The Unity editor on Linux can't build any projects I've tried correctly.

I am curious about this line. What do you mean?

doctorpangloss commented 1 year ago

The Unity editor on Linux can't build any projects I've tried correctly.

I am curious about this line. What do you mean?

We have experimented with built players running in Linux and Windows containers. A Windows or Linux standalone player built by the Linux headless editor always has flaws in any large or complex project we tried. For example, the pivot point of an animated FBX character would sit in the center of the animated person when built by the Linux headless editor, versus the correct position when built by the Windows headless editor. Or various textures would be blank with (Linux headless editor, Windows standalone target) but correct with (Windows headless editor, Windows standalone target).

doctorpangloss commented 1 year ago

It might be sufficient to create a FenceScheduler instead of a VideoFrameScheduler, and pass a fence and value from Unity in a custom pass / render feature when the frame has been blitted to the render texture (i.e. rendering has finished).
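A sketch of that hand-off, assuming the plugin exposed a render-thread callback (GetFenceSchedulerCallback and FenceData are hypothetical names, not existing plugin API; IssuePluginEventAndData is the stock Unity mechanism for passing a pointer to a native plugin on the render thread):

using System;
using System.Runtime.InteropServices;
using UnityEngine.Rendering;

// Hypothetical payload a native FenceScheduler would read on the render thread.
[StructLayout(LayoutKind.Sequential)]
struct FenceData
{
    public IntPtr fence;    // native fence handle, e.g. ID3D12Fence*
    public ulong waitValue; // the value the encoder should wait for
}

static class FenceSchedulerBridge
{
    // Hypothetical native entry point; the plugin does not expose this today.
    [DllImport("webrtc")]
    static extern IntPtr GetFenceSchedulerCallback();

    // Append the hand-off to a command buffer that runs after the blit,
    // so the callback fires once the rendering work has been submitted.
    public static void Schedule(CommandBuffer cmd, IntPtr fenceDataPtr)
    {
        cmd.IssuePluginEventAndData(GetFenceSchedulerCallback(), 0, fenceDataPtr);
    }
}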

doctorpangloss commented 1 year ago

Simply bypassing the video frame scheduler eliminates a significant amount of latency. What I do not comprehend is why this does not work in Standalone players. Do you have any insight as to what is different between the editor and standalone when it comes to render events?

karasusan commented 1 year ago

What I do not comprehend is why this does not work in Standalone players.

My understanding is that we need to skip the video frame scheduler, which keeps the streaming framerate, to improve latency. Didn't that resolve the performance issue on the standalone player?

doctorpangloss commented 1 year ago

Didn't that resolve the performance issue on the standalone player?

It does; I discovered my mistake. In my environment, the editor doesn't use a TURN relay, but standalone does :) Once I realized my error, I could see that significant latency improvements occurred.

Additionally, in SRP pipelines you should use RenderPipelineManager.endCameraRendering and the SRP context to schedule the blit and plugin event, because WaitForEndOfFrame() in the editor waits on the editor player loop and target framerate, so latency appears much higher, though not because of the plugin architecture. This is why I saw no improvement when I tried to remove the video frame scheduler earlier this year.
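A sketch of that SRP hook (SrpEncodeHook is a hypothetical name; the Encode(CommandBuffer) call is the custom entry point from the builtin snippet below, not a public package API):

using UnityEngine;
using UnityEngine.Rendering;

public class SrpEncodeHook : MonoBehaviour
{
    private CommandBuffer m_Buffer;

    void OnEnable()
    {
        m_Buffer = new CommandBuffer { name = "WebRTC Encode" };
        RenderPipelineManager.endCameraRendering += OnEndCameraRendering;
    }

    void OnDisable()
    {
        RenderPipelineManager.endCameraRendering -= OnEndCameraRendering;
        m_Buffer.Release();
    }

    void OnEndCameraRendering(ScriptableRenderContext context, Camera camera)
    {
        m_Buffer.Clear();
        // Record the blit and plugin event here, e.g. the custom
        // ((VideoTrackSource)m_source).Encode(m_Buffer) described below.
        context.ExecuteCommandBuffer(m_Buffer);
        context.Submit();
    }
}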

On builtin, use this in a late update:

private CommandBuffer buffer = new();
...
// Encode is the same as WebRTC.Encode: it writes the blits and the plugin
// event into the command buffer but does not execute it.
((VideoTrackSource)m_source).Encode(buffer);
// Re-register the buffer each frame so it runs after rendering finishes.
camera.RemoveCommandBuffer(CameraEvent.AfterEverything, buffer);
camera.AddCommandBuffer(CameraEvent.AfterEverything, buffer);

the video frame scheduler, which keeps the streaming framerate

Users (or the plugin) should configure Application.targetFrameRate.
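For example (QualitySettings.vSyncCount must be 0, otherwise Application.targetFrameRate is ignored):

// Let the application framerate drive the streaming framerate.
QualitySettings.vSyncCount = 0;
Application.targetFrameRate = 144;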

Also, I have not had luck munging the SDP to set a max framerate. It always seems to be 60; I would like 144.
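A sketch of that kind of munging, using the standard a=framerate SDP attribute (RFC 8866); the helper name is illustrative, and whether libwebrtc's encoder honors the attribute is exactly the problem described:

using System.Linq;
using Unity.WebRTC;

static class SdpMunging
{
    // Append a framerate attribute to the video media section before the
    // description is applied with SetLocalDescription.
    public static RTCSessionDescription WithMaxFramerate(RTCSessionDescription desc, int fps)
    {
        var lines = desc.sdp.Split(new[] { "\r\n" }, System.StringSplitOptions.None).ToList();
        int video = lines.FindIndex(l => l.StartsWith("m=video"));
        if (video >= 0)
        {
            int insertAt = lines.FindIndex(video + 1, l => l.StartsWith("m="));
            if (insertAt < 0) insertAt = lines.Count;
            while (insertAt > video + 1 && lines[insertAt - 1].Length == 0)
                insertAt--; // keep the attribute inside the section, before trailing blanks
            lines.Insert(insertAt, $"a=framerate:{fps}");
        }
        desc.sdp = string.Join("\r\n", lines);
        return desc;
    }
}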

The webrtc frame adapter already drops frames that are coming in too frequently. This seems acceptable to me.

Ideally you would blit directly into the input resource for NvEnc. We discussed this elsewhere.

I can't figure out what to do with the private pointer in the GraphicsFence struct from Unity. It is a null pointer on the render thread. Maybe you can look at the Unity source and figure it out.
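For anyone digging into the same thing: in the UnityCsReference sources the struct appears to carry an internal IntPtr field named m_Ptr, which can at least be inspected via reflection (a sketch; the field name comes from reading Unity's C# reference source and may differ between versions):

using System;
using System.Reflection;
using UnityEngine.Rendering;

static class GraphicsFenceInspector
{
    // Field name taken from UnityCsReference; treat this as a debugging aid, not an API.
    private static readonly FieldInfo s_Ptr = typeof(GraphicsFence)
        .GetField("m_Ptr", BindingFlags.NonPublic | BindingFlags.Instance);

    public static IntPtr InternalPointer(GraphicsFence fence)
    {
        // Reading an instance field off a struct boxes it first.
        return (IntPtr)s_Ptr.GetValue(fence);
    }
}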

Copying synchronously sometimes takes as long as 6ms. That would probably be closer to 0.02ms if the blit happened in Unity with a fence. I started working on this, but it exceeds my C++ abilities.

karasusan commented 1 year ago

Framerate synchronization mode (WRS-354): https://github.com/Unity-Technologies/com.unity.webrtc/pull/950

doctorpangloss commented 1 year ago

In the following conditions:

I observe 2-7ms of NvEnc work and about 1ms of overhead: 0.25ms on the render thread for copying and 0.75ms in CopyResourceNativeV. So it's pretty great!

ViGeng commented 3 months ago

Hello, @karasusan

I am running the E2ELatency sample locally (0ms RTT) and found that the displayed average latency falls in the 25~60ms range. (BTW, I also raised an issue about the latency calculation method in https://github.com/Unity-Technologies/com.unity.webrtc/issues/1025.)

This was unexpected, since WebRTC is designed for "real time". So I did a simple test in this DataChannel sample: I logged timestamps before sending and on receiving each message, and it shows about 0~1ms of latency, which seems reasonable. I cannot understand why video and messages have such a large latency gap.
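The check was roughly of this shape (a sketch using the package's RTCDataChannel API; both peers run in the same process here, so one clock serves both ends):

using System.Diagnostics;
using Unity.WebRTC;

class DataChannelLatencyProbe
{
    // One clock for both peers; valid because both ends run in one process.
    private static readonly Stopwatch s_Clock = Stopwatch.StartNew();

    public static void StampAndSend(RTCDataChannel local)
    {
        local.Send(s_Clock.ElapsedMilliseconds.ToString());
    }

    public static void Listen(RTCDataChannel remote)
    {
        remote.OnMessage = bytes =>
        {
            long sentAt = long.Parse(System.Text.Encoding.UTF8.GetString(bytes));
            UnityEngine.Debug.Log($"DataChannel latency: {s_Clock.ElapsedMilliseconds - sentAt} ms");
        };
    }
}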

I wonder whether these values are typical on your side? If so, could anyone suggest potential solutions for time-sensitive interactive applications?

Thanks in advance!