RenderKit / oidn

Intel® Open Image Denoise library
https://www.openimagedenoise.org/
Apache License 2.0
1.78k stars 164 forks source link

Improve performance on Apple silicon for larger images #110

Closed Developer-Ecosystem-Engineering closed 2 years ago

Developer-Ecosystem-Engineering commented 3 years ago

When using large images (4k+), performance is increased by ~50% when re using memory on M1.

atafra commented 3 years ago

For what image sizes did you measure a ~50% speedup? For 5-10K images I measured speedups up to ~35% on an M1, which is also great but lower than expected. Thanks!

Developer-Ecosystem-Engineering commented 3 years ago

The built in oidnBenchmark reports a change from 7500ms to 5100ms (47% improvement)

atafra commented 3 years ago

For which test/resolution did you get these numbers (e.g. RTLightmap.hdr.4096x4096)?

Developer-Ecosystem-Engineering commented 3 years ago

Hi Attila,

We've been performing more tests, there is a bit of variability and we see a larger difference on higher vs lower memory configs (8GB vs 16GB). Latest brings us closer to what you see (25%-35%)

./oidnBenchmark --maxmem 6000 -n 3 RT.hdr_alb_nrm.3840x2160 ... 6723.03 msec/image RT.ldr_alb_nrm.3840x2160 ... 5509.69 msec/image

vs ./oidnBenchmark --maxmem 2000 -n 3 RT.hdr_alb_nrm.3840x2160 ... 5354.92 msec/image RT.ldr_alb_nrm.3840x2160 ... 5417.03 msec/image

Would you prefer a change anchored to available memory size?

atafra commented 3 years ago

Thanks for the details! These numbers are much closer to my measurements but it seems that the variation between machines is quite high:

./oidnBenchmark --maxmem 6000 -n 3 -r RT.ldr_alb_nrm.3840x2160 RT.ldr_alb_nrm.3840x2160 ... 5048.53 msec/image

./oidnBenchmark --maxmem 2000 -n 3 -r RT.ldr_alb_nrm.3840x2160 RT.ldr_alb_nrm.3840x2160 ... 5169.98 msec/image

So on my machine limiting the memory usage to 2000 MB is actually a bit worse.

I recently discovered some inefficiency in the HDR code path which introduces some bias into these performance results, so I recommend testing in LDR mode for now. This means that the actual performance gap is even smaller.

I think a global memory limit for M1 is sufficient but 2000 MB seems to hurt performance a bit in some cases. ~4000 MB may be a safer choice and is reasonable on 8 GB systems as well. I will run some more benchmarks for various resolutions with the performance fix to make sure that the new memory limit works well. In any case, this change will be included in the upcoming release.

atafra commented 2 years ago

I'm closing this because since v1.4.0 the default memory usage is only 3 GB regardless of the architecture (due to some optimizations), and thus performance has increased on M1.