RenderKit / oidn

Intel® Open Image Denoise library
https://www.openimagedenoise.org/
Apache License 2.0

OIDN_DEVICE_TYPE_CPU unsupported #164

Closed: Nielsbishere closed this issue 1 year ago

Nielsbishere commented 1 year ago

It seems like this device type is not supported, even though it would be very useful. Was this support removed in OIDN2, or is it only available when no GPUs are present? On an AMD system I want to add a CPU fallback because of the current limitations with HIP.

As another use case: when the device is almost out of VRAM, I found that OIDN2 reduces performance by a factor of 5, because the GPU is actively swapping memory between GPU and CPU. I checked that OIDN1 takes much less time in the same situation because it doesn't have to allocate more on the GPU.

I'd also like to request a constant or a device property that reports how much VRAM OIDN needs, unless this is a negligible amount. We could then check whether we're likely to hit the VRAM limit and force the CPU fallback.

atafra commented 1 year ago

What makes you think that OIDN_DEVICE_TYPE_CPU support has been removed? It works in OIDN2 just like in previous versions.

Regarding the memory usage: it's the other way around, the application should specify how much memory it wants OIDN to use. It's the maxMemoryMB filter parameter.
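
For reference, a minimal sketch of how that hint could be set with the 2.x C API (the surrounding filter setup is only illustrative; image buffers and error handling are omitted):

```c
#include <OpenImageDenoise/oidn.h>

// Sketch: create a device and cap the filter's internal memory usage
// via the maxMemoryMB hint. Image setup is omitted for brevity.
void denoiseWithMemoryLimit(void)
{
  OIDNDevice device = oidnNewDevice(OIDN_DEVICE_TYPE_DEFAULT);
  oidnCommitDevice(device);

  OIDNFilter filter = oidnNewFilter(device, "RT"); // generic ray tracing filter

  // Ask OIDN to keep its memory usage below roughly this amount,
  // possibly at the cost of performance. It is a hint, not a hard limit.
  oidnSetFilterInt(filter, "maxMemoryMB", 512);

  // ... set the color/albedo/normal/output images here, then:
  oidnCommitFilter(filter);
  oidnExecuteFilter(filter);

  oidnReleaseFilter(filter);
  oidnReleaseDevice(device);
}
```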

Nielsbishere commented 1 year ago

Because I queried an NVIDIA and an AMD machine for device types and no CPU device was reported; oidnNewDevice with the CPU type also fails. The AMD machine reported two HIP devices and the NVIDIA machine only one CUDA device. I guess that if maxMemoryMB is reached, filter creation will fail, a CPU device will have to be created, and the entire routine will have to be redone, but with a CPU device instead.
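
Roughly what I'm doing, simplified (assuming the 2.x physical device helpers such as oidnGetNumPhysicalDevices; this is an illustrative sketch rather than my exact code):

```c
#include <OpenImageDenoise/oidn.h>
#include <stdio.h>

// Sketch: list the physical devices OIDN detects, then try to create a CPU device.
int main(void)
{
  int numDevices = oidnGetNumPhysicalDevices();
  for (int i = 0; i < numDevices; ++i)
  {
    const char* name = oidnGetPhysicalDeviceString(i, "name");
    int type = oidnGetPhysicalDeviceInt(i, "type"); // OIDNDeviceType value
    printf("Device %d: %s (type %d)\n", i, name, type);
  }

  // Explicitly request a CPU device; on a broken installation (e.g. a missing
  // or mismatched DLL) this fails and the error can be queried afterwards.
  OIDNDevice cpuDevice = oidnNewDevice(OIDN_DEVICE_TYPE_CPU);
  if (cpuDevice)
    oidnCommitDevice(cpuDevice);

  const char* errorMessage = NULL;
  if (oidnGetDeviceError(cpuDevice, &errorMessage) != OIDN_ERROR_NONE)
    printf("CPU device creation failed: %s\n", errorMessage);
  else
    printf("CPU device created successfully\n");

  if (cpuDevice)
    oidnReleaseDevice(cpuDevice);
  return 0;
}
```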

atafra commented 1 year ago

This is very strange, something is likely wrong with your setup. Are you using the official binaries or did you build it yourself? Are you sure that all OIDN DLLs are available in the same directory?

No, maxMemoryMB works differently, please look it up in the docs. Basically, it tries to limit the memory usage to the specified amount, sacrificing performance, but it's just a hint. Filter creation won't fail if the actual usage ends up higher, which can only happen if you request a very low amount anyway.

Nielsbishere commented 1 year ago

I got the includes, libs and DLLs from the release; I didn't build it myself (I used to, but not after the 2.0 update due to the added complexity of the build process). Tested on a 3080 Ti and a 6900 XT by querying the device type and device count using the helper functions.

I don't know if that's sufficient to ensure a good user experience. Is there no way to query how much memory a filter will use? I'd rather destroy the filters completely and go back to CPU mode if we're even close to the maximum resources, as doing otherwise creates huge overhead.

atafra commented 1 year ago

What CPU are you using? Is it possible that you disabled CPU support by accident, by setting the OIDN_DEVICE_CPU environment variable to zero? To make debugging easier, could you please extract the official binaries package to a clean location and run the following from the bin directory: oidnBenchmark -ld? This will print the detected devices.

Regarding the memory usage query: currently there is no way to query how much memory a filter will use, but as I said before, you can request how much memory OIDN should use. Why isn't limiting the memory usage an option in your case? You can pick whatever memory usage you are comfortable with, depending on the amount of free memory. If the amount of free memory is very low, say less than 512 MB, it's not even worth trying to use OIDN on the GPU. You wouldn't need to destroy the filter because you wouldn't even create it if there isn't enough memory. I think the user experience would be quite good this way. OIDN limits the memory usage to a somewhat arbitrary default value anyway; both OIDN and the application could perform better if this were set manually instead.

Querying the exact memory usage before initializing and allocating anything would be quite tricky. The problem is that OIDN may (and will) use backends which manage their own memory too, and OIDN has no way to query upfront how much memory a backend will use. Because of this, it's not really possible to provide general guarantees about memory usage across all backends, especially before doing any allocations.

Nielsbishere commented 1 year ago

I found some time to try it:

Device 0
  Name: AMD Radeon RX 6900 XT
  Type: HIP
  UUID: 58580000000000000000000000000000
  PCI : 0000:03:00.0

Device 1
  Name: AMD Radeon RX 6900 XT
  Type: HIP
  UUID: 58580000000000000000000000000000
  PCI : 0000:03:00.0

Device 2
  Name: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
  Type: CPU

Weirdly enough, this one does show it. Is it possible that the pre-built lib/DLL has that define enabled? Or would it be affected too? I tried explicitly creating a device with ID 2 and that failed, and as I mentioned before, it didn't even show up for me :/

I think the major problem with setting that in OIDN is that I have no idea of the general ballpark of how much memory it needs per pixel (I don't need exact numbers, just a generous estimate). If I had an estimate, we could keep track of it. If that 500 MB is an estimate for 1080p, I'd have to check for 4K, because my guess is that it wouldn't be sufficient there, and the limit might still be hit if I set it. We have 3 filters and can render at 4K, by the way.

atafra commented 1 year ago

If the CPU does show up in this list, it must work. Try running oidnBenchmark -d cpu -v 1 to confirm.

Are you sure that you have all the OIDN DLLs and all their dependencies (e.g. TBB) from the bin directory where they should be? I really can't think of anything else. We do extensive testing on OIDN and this is the first time such an issue has ever been reported.

I don't think that setting the maximum memory usage is a major issue. OIDN does not have linearly increasing memory requirements per pixel because the memory usage is limited by default; it would use far too much memory if we let it use as much as it "needs". The memory usage does not increase linearly with the number of filters either: almost all required memory is shared between filters, and each filter requires only a couple of additional megabytes.

If you set maxMemoryMB to 1024, it will use up to 1 GB at 1080p, 4K, or any other resolution, so you don't need to recheck the memory usage for different resolutions. You could simply pick an interval that you're willing to dedicate to denoising, e.g. 128-1536 MB, and then set the actual limit at runtime depending on the amount of free memory. Even limiting the memory usage to as low as 128 MB would most likely be much faster than running on the CPU. This would be a much more performant approach than making a binary decision between using OIDN at some arbitrary memory usage and not using it at all.
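
A rough sketch of that approach (getFreeGPUMemoryMB is a placeholder for whatever VRAM query your application already uses, e.g. via HIP or CUDA, and the thresholds are only examples):

```c
#include <OpenImageDenoise/oidn.h>

// Placeholder: the application queries free VRAM itself, e.g. through
// hipMemGetInfo or cudaMemGetInfo. This is not part of OIDN.
extern int getFreeGPUMemoryMB(void);

// Sketch: fall back to the CPU only when there is almost no free VRAM,
// otherwise cap OIDN's memory usage based on what is currently available.
OIDNDevice createDenoiseDevice(int* outMaxMemoryMB)
{
  const int minMB = 128, maxMB = 1536, safetyMB = 128;
  const int freeMB = getFreeGPUMemoryMB();

  if (freeMB < minMB + safetyMB)
  {
    *outMaxMemoryMB = 0; // convention of this sketch: 0 = don't set maxMemoryMB
    return oidnNewDevice(OIDN_DEVICE_TYPE_CPU);
  }

  // Clamp the budget to the interval we're willing to dedicate to denoising.
  int budgetMB = freeMB - safetyMB;
  if (budgetMB > maxMB) budgetMB = maxMB;

  *outMaxMemoryMB = budgetMB; // later: oidnSetFilterInt(filter, "maxMemoryMB", budgetMB)
  return oidnNewDevice(OIDN_DEVICE_TYPE_DEFAULT); // OIDN picks what it considers the best device
}
```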

Nielsbishere commented 1 year ago

Thank you for your time, it all makes sense. I should have the required DLLs, but it's possible I missed something, so I'll double-check ASAP to be sure. I think what you're saying about the 128 MB is true, except when the GPU is already swapping a lot (so even the 128 MB isn't available); then the CPU should of course be used.

atafra commented 1 year ago

There should be only four reasons why the CPU device is not detected: the CPU itself is unsupported, CPU support was disabled with the OIDN_DEVICE_CPU environment variable, one of the OIDN DLLs is missing, or one of their dependencies (e.g. TBB) is missing.

Yes, if there is less than e.g. 128 MB of free memory (plus some extra amount for safety), the CPU should be used instead. If you have more memory available, you should request up to about 1.5-2 GB for optimal performance. Performance with 128 MB is not great, but still pretty good compared to most CPUs. You can easily experiment with the performance impact of different memory limits with oidnBenchmark -maxMemoryMB somevalue.

Nielsbishere commented 1 year ago

Thank you for your help and time; it seems a DLL got copied incorrectly. After fixing this, it now finds the CPU device fine. The memory approach should be sufficient as well.

atafra commented 1 year ago

You're welcome, I'm happy to hear it!