NVIDIAGameWorks / RTXGI-DDGI

RTX Global Illumination (RTXGI)
https://developer.nvidia.com/rtxgi

Some bugs #55

Closed: ivalylo closed this 2 years ago

ivalylo commented 3 years ago

The plugin currently crashes if you try to increase the probe resolution above a certain limit (like 32x32x32).

DDGIVolumeComponent.cpp:843

proxy->ProbesIrradiance in this case is NULL (since one can't create a texture with a resolution > 16K), so you must add a pointer check... I would also like to get some sort of message (at least in the log) when the volume fails to work for some reason.
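For illustration, a guard like this would avoid the crash and leave a trace in the log (a minimal sketch; the surrounding control flow and the LogTemp category are assumptions, not the plugin's actual code):

    // Hypothetical guard near DDGIVolumeComponent.cpp:843: if texture creation
    // failed (e.g. a dimension exceeded the 16K limit), skip the update and
    // report it instead of dereferencing a NULL pointer.
    if (proxy->ProbesIrradiance == nullptr)
    {
        UE_LOG(LogTemp, Warning,
            TEXT("DDGIVolume: irradiance texture could not be created; the volume will not update."));
        return;
    }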

Next, the related big texture (strangely called DDGIDebugOutputDesc, although it's actually the radiance buffer) is not accounted for by "r.RTXGI.MemoryUsed", even though it's the biggest one.
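For reference, the buffer's size is easy to estimate, so folding it into the statistic should be straightforward (a sketch; the one-texel-per-ray-per-probe layout and the 16-byte texel are assumptions, and the function is illustrative, not the plugin's accounting code):

    #include <cstdint>

    // Hypothetical size estimate for the radiance buffer: one texel per ray
    // per probe, assumed to be 16 bytes (four 32-bit float channels).
    // Adding this to the total would make "r.RTXGI.MemoryUsed" reflect the
    // biggest texture as well.
    uint64_t RadianceBufferBytes(uint64_t probesX, uint64_t probesY,
                                 uint64_t probesZ, uint64_t raysPerProbe)
    {
        const uint64_t bytesPerTexel = 16;
        return probesX * probesY * probesZ * raysPerProbe * bytesPerTexel;
    }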

Another bug I've noticed concerns the 10-bit irradiance mode. It doesn't really work... If you turn a light on and then off again, you don't get back the initial result; the scene ends up brighter.

Here is the starting position:

[image: correct]

This is after turning a point light on and then off again (and giving it time to update):

[image: wrong]

Changing the Probe History Weight also causes the brightness to increase indefinitely. Using 10-bit is tempting, but currently it looks unusable for dynamic scenes.

ivalylo commented 3 years ago

The bug with the 10-bit colors comes from the use of PF_A2B10G10R10 for the irradiance texture. The texture needs an alpha channel to store a weight, and 2 bits are not enough for it. I fixed this by using PF_A16B16G16R16, which is still half the size of the 32-bit mode.

ivalylo commented 3 years ago

Hm, the alpha channel is not really used in the radiance textures. I guess PF_A16B16G16R16 worked out because of its higher precision.

ivalylo commented 3 years ago

Ok, after a few days, I rewrote half of the plugin. It was too slow for me. And with these GPU prices (if you can find a GPU at all)... There is a more clever way to accumulate and update samples than the hysteresis. Here is a video:

https://www.linkedin.com/posts/ivaylo-ivanov-44458648_playing-around-with-rtxgi-in-unreal-engine-activity-6857019051068137472-rRww

About the original issue with the light on/off: maybe it was resolved when I removed one multiplier for the 10-bit mode:

#if !RTXGI_DDGI_DEBUG_FORMAT_IRRADIANCE
    irradiance *= 1.0989f;                      // Adjust for energy loss due to reduced precision in the R10G10B10A2 irradiance texture format
#endif

PF_A16B16G16R16 is actually UNORM in Unreal. I ended up using PF_FloatRGBA.
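In other words, the format choice ends up something like this (a sketch; bUse32BitIrradiance and the selection site are made up for illustration):

    // PF_A2B10G10R10:  10-bit UNORM color + 2-bit alpha (too coarse here).
    // PF_A16B16G16R16: 16 bits per channel, but UNORM in Unreal.
    // PF_FloatRGBA:    16-bit float per channel, half the size of full float.
    const EPixelFormat IrradianceFormat =
        bUse32BitIrradiance ? PF_A32B32G32R32F : PF_FloatRGBA;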

Atrix256 commented 3 years ago

What sort of changes made the biggest impact to performance for you?

ivalylo commented 3 years ago

Without a beefy RTX card, the only way is to reduce the number of rays per frame... The original algorithm, however, expects at least a thousand rays per probe per frame to reach good quality, which is of course impossible on older cards. You must lower the sample count, which forces you to increase the hysteresis to hide the flickering and splotches (since the samples are rotated each frame). For me at least, this is a red flag that the algorithm is not working optimally.

One should take the full, final number of samples, since this improves quality and removes any flickering; the rotation and hysteresis thing is quite gimmicky IMO. I just distribute those samples over N frames. My GPU can manage at most 288 samples, which, multiplied by 4 frames, gets me above 1K. Then the process repeats using double buffering of the probes. Since one sample cycle takes just 4 frames, it's quite fast, it actually resolves faster than the hysteresis, and you don't need to guess a good value. And since I have double buffering, I can detect when the irradiance converges and stop updating the given probes.
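A minimal CPU-side sketch of that accumulation scheme (all names are hypothetical; the real version runs in compute shaders):

    #include <cmath>

    // Hypothetical model of the scheme: accumulate a full sample set over a
    // few frames, then publish it via double buffering.
    struct Probe
    {
        float accum[3] = {0, 0, 0}; // running sum for the current cycle
        float front[3] = {0, 0, 0}; // last published result, used for shading
        int   frameInCycle = 0;
        bool  asleep = false;       // converged probes stop updating
    };

    constexpr int RaysPerFrame   = 288; // what my GPU can afford per probe
    constexpr int FramesPerCycle = 4;   // 288 * 4 = 1152 rays per full cycle

    // raySum is the sum of this frame's 288 ray colors for the probe.
    void UpdateProbe(Probe& p, const float raySum[3])
    {
        if (p.asleep)
            return;
        for (int c = 0; c < 3; ++c)
            p.accum[c] += raySum[c];
        if (++p.frameInCycle < FramesPerCycle)
            return;

        // Cycle complete: average the full sample set and swap buffers.
        float diff = 0.0f;
        for (int c = 0; c < 3; ++c)
        {
            const float next = p.accum[c] / float(RaysPerFrame * FramesPerCycle);
            diff += std::fabs(next - p.front[c]);
            p.front[c] = next;
            p.accum[c] = 0.0f;
        }
        p.frameInCycle = 0;
        p.asleep = diff < 0.01f; // converged: sleep until a change wakes it up
    }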

Detecting lighting changes and re-enabling the probes is a bit more involved. I use the probe classification rays to compute another "average" irradiance, which is always computed at the start of a pass. It's based on only 32 rays, so it's very fast, although not that precise, and can sometimes miss something. The whole system relies on how the secondary bounces work (by sampling the other probes), so if something changes, you will detect it in the other probes recursively, until the solution is progressively resolved. Since it's progressive, it tends not to awaken too many or unrelated probes at the same time. Of course, this kind of "clever" algorithm suffers from too many random factors, so the frame rate can vary a lot (it can't get worse than sampling all probes, though). But my idea is to use dynamic changes sparingly, and if a given effect causes too big an FPS drop, it can be reworked.
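Sketched on top of the Probe struct from the previous snippet, the wake-up test could look like this (the threshold and names are again made up):

    // Hypothetical wake-up test: a cheap 32-ray average irradiance is computed
    // at the start of every pass and compared against the published result.
    bool ShouldWakeProbe(const Probe& p, const float cheapAvg[3])
    {
        if (!p.asleep)
            return false;
        float diff = 0.0f;
        for (int c = 0; c < 3; ++c)
            diff += std::fabs(cheapAvg[c] - p.front[c]);
        // 32 rays are noisy, so this threshold is far looser than the
        // convergence test; changes also propagate probe-to-probe through the
        // secondary bounces, so a missed probe is picked up on later passes.
        return diff > 0.1f;
    }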

I also needed convergence detection in order to know when to save the irradiance cache. My idea is to precompute it, load it at runtime, and update it only when something changes. I used zstd for compression; a 32x32x16 grid compresses to slightly more than 1 MB.
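The save path can be as simple as this (a sketch using zstd's one-shot API; error handling trimmed, function name made up):

    #include <zstd.h>
    #include <vector>

    // Compress the converged irradiance cache for saving to disk; a 32x32x16
    // grid comes out at slightly more than 1 MB.
    std::vector<char> CompressIrradianceCache(const void* texels, size_t numBytes)
    {
        std::vector<char> out(ZSTD_compressBound(numBytes));
        const size_t written =
            ZSTD_compress(out.data(), out.size(), texels, numBytes, /*level*/ 19);
        if (ZSTD_isError(written))
            return {};
        out.resize(written);
        return out;
    }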

pmorley-nv commented 3 years ago

> proxy->ProbesIrradiance in this case is NULL (since one can't create a texture with a resolution > 16K)

The API limits each dimension to 16K, not the product of all 3 dimensions. Changing the code to check that each dimension of the probe count does not exceed 16K seems to work without crashing. DDGIVolumeComponent.cpp:1051
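Something along these lines (a sketch of the described fix; the names are illustrative, not the actual diff):

    // Hypothetical per-dimension guard near DDGIVolumeComponent.cpp:1051: the
    // 16K limit applies to each texture dimension, not to the product of the
    // three probe counts.
    constexpr int32 MaxTextureDim = 16384;
    const bool bProbeCountsValid =
        ProbeCounts.X <= MaxTextureDim &&
        ProbeCounts.Y <= MaxTextureDim &&
        ProbeCounts.Z <= MaxTextureDim;
    if (!bProbeCountsValid)
    {
        // Bail out (and ideally log) instead of letting texture creation fail
        // and return NULL further down.
        return;
    }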

tiont commented 2 years ago

Hi @ivalylo. The latest version of the binary RTXGI UE4 plugin should fix this issue. We recommend you try that version and report back if you are still seeing the issue. Thanks.