Closed aet closed 9 months ago
@karasusan this might indirectly help with the android thing
@aet I haven't tried VK_MEMORY_PROPERTY_HOST_CACHED_BIT bit. Does it improve the performance? I'm profiling it.
@aet I am currently investigating the performance of the package before checking this PR.
We have found that there is a performance issue with Linux Standalone in revision 84718a6. Compared to the latest release of 3.0.0-pre.6, performance is degraded under certain conditions. I will investigate which change is causing this.
This problem does not occur when vSync Count is set to Don't Sync, but occurs when Every V Blank is specified. This problem does not occur in Unity Editor, but only in Linux Standalone. The same problem has been confirmed for Android.
Please let us know if you are experiencing the same problem in your environment.
@karasusan it's been a while since I've used Vulkan with software encoder path, but back then I was testing against software vp8/vp9 and h264 encoders and was wondering why I couldn't reach 60fps 1080p easily even with h264 on a Linux machine with plenty of cores.
It turned out reading the texture with the Vulkan implementation was really slow at VulkanGraphicsDevice::ConvertRGBToI420 and adding VK_MEMORY_PROPERTY_HOST_CACHED_BIT when it's available helped. But it's hard to make apples to apples comparison with the main branch, I've been using a custom frame scheduling and queuing for so long now.. they were tuned for performance at the time with sacrificing latency or memory use a bit. Little by little trying to get changes contributed forward that make sense.
@aet OK, if I can be of any help to test, please let me know.
@karasusan hands-on quick test in android could be useful :) There's a lot of generic things outside of Vulkan that could be altered to avoid dropping frames, but it doesn't hurt to have shorter CPU read times. I'm trying to refresh my old debug notes from 2+ years ago, but it does seem like the CPU read for me took 22ms at 1080p and 10ms for smaller resolutions.. after fix, less than 1ms each. So it was hard to reach 60fps at all.
This was around M85 and there was that custom RGB to YUV conversion code that was slowing down the process a bit, but got converted to use libyuv directly later on. But I think I was measuring the texture read and rgb conversion separately.
@aet Right, first I will measure the effect of the VK_MEMORY_PROPERTY_HOST_CACHED_BIT flag in my environment.
@karasusan I got over my laziness to measure my own code after a long break 😄
On LinuxStandalone, for testing sake I'll disable h264 nvenc entirely and fallback to h264 software encoder that I've enabled for my libwebrtc build.
With m_hasHostCachedMemory = false, the average time it takes to read 1080p texture with vkMapMemory/vkUnmapMemory/webrtc::I420Buffer::Create and libyuv::ARGBToI420 calls is around 40ms.
With m_hasHostCachedMemory = true, it's 0.75ms on the average for me.
Tested with Intel i7-9750H / Geforce RTX 2060.
@aet I checked this fix on Android device, and it can determine this makes dramatically performance improvement. In special, the result of software encoding like VP8 is noticeable. I would like to incorporate it to the next release.
At the same time, however, I found the video image disturbance when setting vSync enable on Android. This issue might be occurred since #966. I am keeping the investigation to fix the issue.
@karasusan I wasn't sure what was the main motivation getting CommandRecordingState into use in PR 966 after PR 888 closed? Not that I don't want it to happen, just wasn't sure if there was more reasons. :)
I am able to reproduce the same WaitSync regression from the original PR 888 when vsync is enabled. There's not much hooks to work with from the plugin towards Unity rendering state, so synchronizing against frame counts is always kind of hacky compared to custom CommandBuffer and VkFence. On Linux, running 60fps app with vsync gives me about 20fps stream and WaitSync was either done at 18ms or timing out at 30ms.
Since Unity Vulkan graphics device updates the safe frame counter once a frame, after presenting, we'll need to add some C# side PlayerLoop extras that get called after Unity's built-in PostLateUpdate.PresentAfterDraw. Something that triggers UpdateState on plugin side basically.
I have a small prototype of this, that fixes the framerate back to 60fps with vsync and LinuxStandalone. It's an add-on to PR 966 that doesn't conflict with this PR. I'll try to get it out for testing tomorrow, it's likely also improving the reason why WaitSync was timing out and causing crash on Android in PR 966 that had to be worked around.
I'm not sure what you mean by image disturbance, but if it's not related to this PR, I'm ok moving forward to merge. I'll add the vsync changesets to another PR, unless you prefer adding them here
@karasusan please do me a favor and instead of reverting all of #966, let's just restore the part that really matters, texture specific cmdbuf/fence synchronization. I added a new changeset into this PR to allow using either cmdbuf/fence or CRS for synchonization, so it's easier to debug and compare this going forward once there's more time to look into the vsync issue.
Thanks for the quick updates!
I'll give it a test when I get a chance, but I think this looks good.
@aet I tested it on Android and Linux. It looks good to me. Thanks you for contribution.
With the use of IUnityGraphicsVulkan::CommandRecordingState out of the way, here's another part I've been using in my branch for few years now. The plugin architecture has changed few times over the years, so I never had time to try pitching it in. 😄
VulkanTexture2D::GetNativeTexturePtrV returns the real native texture handle like other targets
DoImageLayoutTransition tweaks to avoid touching layout and to remove most if not all Vulkan API validation issues
Query if VK_MEMORY_PROPERTY_HOST_CACHED_BIT available, improves CPU encoder performance. It was tested against 1080p60 at the time, but most of my testing goes against nvenc
Prep work for supporting staging textures (cpu writes instead of cpu reads preference), though it's missing VkBuffer related additions. This will be used for improving video rendering performance
Add GetTexturePtr helper. I need a way to figure out from a native texture handle if it's maintained by Unity or the plugin. For now it's Vulkan only, but I'm using it with other targets when adding native texture uploads/staging texture support.