Unity-Technologies / com.unity.webrtc

WebRTC package for Unity
Other
738 stars 185 forks source link

fix: Vulkan CPU texture read performance, improve Unity pipeline state integration #967

Closed aet closed 9 months ago

aet commented 10 months ago

With the use of IUnityGraphicsVulkan::CommandRecordingState out of the way, here's another part I've been using in my branch for few years now. The plugin architecture has changed few times over the years, so I never had time to try pitching it in. 😄

aet commented 10 months ago

@karasusan this might indirectly help with the android thing

karasusan commented 10 months ago

@aet I haven't tried VK_MEMORY_PROPERTY_HOST_CACHED_BIT bit. Does it improve the performance? I'm profiling it.

karasusan commented 10 months ago

@aet I am currently investigating the performance of the package before checking this PR.

We have found that there is a performance issue with Linux Standalone in revision 84718a6. Compared to the latest release of 3.0.0-pre.6, performance is degraded under certain conditions. I will investigate which change is causing this.

This problem does not occur when vSync Count is set to Don't Sync, but occurs when Every V Blank is specified. This problem does not occur in Unity Editor, but only in Linux Standalone. The same problem has been confirmed for Android.

Please let us know if you are experiencing the same problem in your environment.

aet commented 10 months ago

@karasusan it's been a while since I've used Vulkan with software encoder path, but back then I was testing against software vp8/vp9 and h264 encoders and was wondering why I couldn't reach 60fps 1080p easily even with h264 on a Linux machine with plenty of cores.

It turned out reading the texture with the Vulkan implementation was really slow at VulkanGraphicsDevice::ConvertRGBToI420 and adding VK_MEMORY_PROPERTY_HOST_CACHED_BIT when it's available helped. But it's hard to make apples to apples comparison with the main branch, I've been using a custom frame scheduling and queuing for so long now.. they were tuned for performance at the time with sacrificing latency or memory use a bit. Little by little trying to get changes contributed forward that make sense.

karasusan commented 9 months ago

@aet OK, if I can be of any help to test, please let me know.

aet commented 9 months ago

@karasusan hands-on quick test in android could be useful :) There's a lot of generic things outside of Vulkan that could be altered to avoid dropping frames, but it doesn't hurt to have shorter CPU read times. I'm trying to refresh my old debug notes from 2+ years ago, but it does seem like the CPU read for me took 22ms at 1080p and 10ms for smaller resolutions.. after fix, less than 1ms each. So it was hard to reach 60fps at all.

This was around M85 and there was that custom RGB to YUV conversion code that was slowing down the process a bit, but got converted to use libyuv directly later on. But I think I was measuring the texture read and rgb conversion separately.

karasusan commented 9 months ago

@aet Right, first I will measure the effect of the VK_MEMORY_PROPERTY_HOST_CACHED_BIT flag in my environment.

aet commented 9 months ago

@karasusan I got over my laziness to measure my own code after a long break 😄

On LinuxStandalone, for testing sake I'll disable h264 nvenc entirely and fallback to h264 software encoder that I've enabled for my libwebrtc build.

With m_hasHostCachedMemory = false, the average time it takes to read 1080p texture with vkMapMemory/vkUnmapMemory/webrtc::I420Buffer::Create and libyuv::ARGBToI420 calls is around 40ms.

With m_hasHostCachedMemory = true, it's 0.75ms on the average for me.

Tested with Intel i7-9750H / Geforce RTX 2060.

karasusan commented 9 months ago

@aet I checked this fix on Android device, and it can determine this makes dramatically performance improvement. In special, the result of software encoding like VP8 is noticeable. I would like to incorporate it to the next release.

At the same time, however, I found the video image disturbance when setting vSync enable on Android. This issue might be occurred since #966. I am keeping the investigation to fix the issue.

aet commented 9 months ago

@karasusan I wasn't sure what was the main motivation getting CommandRecordingState into use in PR 966 after PR 888 closed? Not that I don't want it to happen, just wasn't sure if there was more reasons. :)

I am able to reproduce the same WaitSync regression from the original PR 888 when vsync is enabled. There's not much hooks to work with from the plugin towards Unity rendering state, so synchronizing against frame counts is always kind of hacky compared to custom CommandBuffer and VkFence. On Linux, running 60fps app with vsync gives me about 20fps stream and WaitSync was either done at 18ms or timing out at 30ms.

Since Unity Vulkan graphics device updates the safe frame counter once a frame, after presenting, we'll need to add some C# side PlayerLoop extras that get called after Unity's built-in PostLateUpdate.PresentAfterDraw. Something that triggers UpdateState on plugin side basically.

I have a small prototype of this, that fixes the framerate back to 60fps with vsync and LinuxStandalone. It's an add-on to PR 966 that doesn't conflict with this PR. I'll try to get it out for testing tomorrow, it's likely also improving the reason why WaitSync was timing out and causing crash on Android in PR 966 that had to be worked around.

I'm not sure what you mean by image disturbance, but if it's not related to this PR, I'm ok moving forward to merge. I'll add the vsync changesets to another PR, unless you prefer adding them here

aet commented 9 months ago

@karasusan please do me a favor and instead of reverting all of #966, let's just restore the part that really matters, texture specific cmdbuf/fence synchronization. I added a new changeset into this PR to allow using either cmdbuf/fence or CRS for synchonization, so it's easier to debug and compare this going forward once there's more time to look into the vsync issue.

BrianHarrisonUnity commented 9 months ago

Thanks for the quick updates!

I'll give it a test when I get a chance, but I think this looks good.

karasusan commented 9 months ago

@aet I tested it on Android and Linux. It looks good to me. Thanks you for contribution.