NVIDIA / Q2RTX

NVIDIA’s implementation of RTX ray-tracing in Quake II

Support for multiple samples per frame #239

Open crozone opened 2 years ago

crozone commented 2 years ago

Feature request

Support rendering multiple samples per pixel, per frame. Have the sample count configurable in the settings menu.

Apologies if this feature already exists as a convar or as a constant in the codebase, but a scan through the client manual and the code suggests that at most one primary ray is currently traced per pixel per frame. pt_accumulation_rendering is close, but it only applies to photo mode, not live rendering.

Why

Running with the denoiser disabled at 1080p looks pretty rough, but the framerate will easily hit > 200fps on an RTX 3080, even with refraction depth set to 8. This seems to indicate that there is visual quality left "on the table" and that the number of samples taken per pixel (1?) is conservative for the latest RTX GPUs.

Rendering the game with no denoiser at 3840x2160 using NVIDIA DSR factor x4 and scaling that back down to 1080p (with 33% smoothing) yields a very nice looking ~60fps image. The noise is reduced to film-grain levels, rendering the game aesthetically pleasing even with the denoiser completely disabled.

Therefore, it might be useful to be able to take more than 1 sample (maybe 2 or 4) per pixel, per frame, to increase visual quality. This would effectively be like the denoiser's accumulator but performed at a single moment in time.
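To illustrate why a few extra samples help, here is a minimal sketch that assumes each sample is an independent noisy estimate of the same pixel value. Under that assumption, the noise of the averaged result falls roughly as 1/sqrt(N), so 4 samples per pixel (comparable to the 4x DSR case above) should about halve it. The functions `noisy_sample`, `pixel_estimate`, and `noise_level` are hypothetical stand-ins, not Q2RTX code, and the Gaussian noise model is an assumption made for illustration.

```python
import random
import statistics

def noisy_sample(rng, true_value=0.5, sigma=0.2):
    """Hypothetical stand-in for one path-traced sample of a pixel."""
    return rng.gauss(true_value, sigma)

def pixel_estimate(rng, spp):
    """Average `spp` independent samples for one pixel in one frame."""
    return sum(noisy_sample(rng) for _ in range(spp)) / spp

def noise_level(spp, trials=20000, seed=1):
    """Standard deviation of the per-pixel estimate over many trials."""
    rng = random.Random(seed)
    estimates = [pixel_estimate(rng, spp) for _ in range(trials)]
    return statistics.pstdev(estimates)

if __name__ == "__main__":
    base = noise_level(1)
    for spp in (1, 2, 4, 8):
        # The ratio should be close to sqrt(spp) under the independence assumption.
        print(spp, round(base / noise_level(spp), 2))
```

Real path-tracer noise is not Gaussian and samples within a frame may be correlated through the blue-noise sequence, so the actual gain could differ from this idealized figure.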

This would also assist the denoiser at lower resolutions where it appears to exhibit more temporal artifacts due to excessively noisy input (https://github.com/NVIDIA/Q2RTX/issues/233).

How

A few thoughts on possible implementations:

  1. Render internally at a multiple of the screen resolution, then downsample to the screen resolution. Similar to DSR, but performed internally before the denoiser gets to the buffers. Has the advantage that it also provides some brute-force anti-aliasing for edges. Disadvantage is that factors have to be squares, and it uses much more VRAM.

  2. Render the frame n times and use the average of the result. Has the advantage that any arbitrary number of samples can be taken, and a rolling average uses a fixed amount of VRAM. Could even be adjusted dynamically on-the-fly in order to hit a target framerate. I'm pretty sure the accumulation system from the photo mode could be used directly.

  3. The same as 2, but only render stage 1 of the path tracer once. Then copy/alias the output of stage 1, render stages 2->4 multiple times, and average the results. No idea if this is actually possible with how the path tracer is implemented, but it avoids re-running the primary rays, since they're always the same and, unlike stages 2->4, don't derive anything from blue noise. Primary rays only seem to be about 10% of the render time, so it might not be worth the complexity anyway.
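The fixed-memory averaging mentioned in option 2 can be sketched as an incremental mean, which keeps a single accumulator no matter how many samples are taken. This is only an illustration of the arithmetic for one pixel value; `rolling_mean` is a hypothetical helper and says nothing about how the photo-mode accumulator is actually implemented.

```python
def rolling_mean(samples):
    """Incrementally average an iterable of per-frame sample values.

    Only the running mean and a counter are stored, so memory use is
    constant regardless of the number of samples.
    """
    mean = 0.0
    count = 0
    for sample in samples:
        count += 1
        # Move the mean toward the new sample by 1/count of the difference.
        mean += (sample - mean) / count
    return mean

if __name__ == "__main__":
    # Four hypothetical noisy samples of the same pixel.
    print(rolling_mean([1.0, 2.0, 3.0, 4.0]))  # 2.5
```

Because the sample count is just a loop bound here, it could in principle be changed from frame to frame, which is what makes the dynamic adjustment described in option 2 plausible.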

Is any of this a good idea?

Calinou commented 2 years ago

I think solution 1 (spatial supersampling) makes the most sense, and is easier to implement. This could also be plugged in with dynamic resolution to use a resolution scale above 100% only when the frametime budget allows. Regarding VRAM usage, I'd argue that on GPUs where supersampling is viable, you'll always have 8 GB of VRAM or more.

Edit: This is already supported in 1.5.0. You can set the static resolution scale up to 200% (4× SSAA) in the dynamic resolution submenu of the video settings, or increase the maximum dynamic resolution scale up to 150% (2.25× SSAA) in the same menu.

Solution 2 (temporal supersampling) is likely harder to adapt for dynamic resolution/consistent FPS lock use cases – you can't choose to accumulate a fractional amount of frames at a given time :slightly_smiling_face: On the other hand, spatial supersampling makes any scale factor valid. However, a scale of 200% per-axis looks the sharpest as it's downscaled by an integer factor (followed by 150%, which isn't integer but still looks fairly sharp).
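The relationship between a per-axis resolution scale and the effective supersampling factor is just the square of the scale. A small sketch of that arithmetic follows; the helper names are hypothetical and not taken from Q2RTX.

```python
def ssaa_factor(scale_percent):
    """Samples per output pixel for a uniform per-axis scale given in percent."""
    scale = scale_percent / 100
    return scale * scale

def downscale_is_integer(scale_percent):
    """True when the image is reduced by a whole-number factor per axis."""
    return scale_percent % 100 == 0

if __name__ == "__main__":
    for scale in (100, 150, 200):
        print(scale, ssaa_factor(scale), downscale_is_integer(scale))
    # 100 1.0 True
    # 150 2.25 False
    # 200 4.0 True
```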

> Disadvantage is that factors have to be squares, and it uses much more VRAM.

Supporting anamorphic resolution scale (different X/Y resolution scales) is likely possible, but not trivial. When paired with dynamic resolution, this allows for less noticeable resolution changes by decreasing the X axis first, then the Y axis.

For low-end setups, you may also find that decreasing the X resolution scale to 50% and keeping the Y resolution scale at 100% may look better compared to decreasing the resolution scale to 71% (roughly 1/sqrt(2)).
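To make the comparison concrete, the sketch below computes the rendered size for both settings, assuming a 1920x1080 output (the resolution is an assumption for illustration, and `render_size` and `pixel_count` are hypothetical helpers). Both options render about half of the native pixel count, so the cost should be similar while the anamorphic option keeps full vertical detail.

```python
def render_size(width, height, x_percent, y_percent):
    """Internal render size for independent X/Y resolution scales in percent."""
    return round(width * x_percent / 100), round(height * y_percent / 100)

def pixel_count(size):
    """Total number of pixels for a (width, height) pair."""
    w, h = size
    return w * h

if __name__ == "__main__":
    native = pixel_count((1920, 1080))
    anamorphic = render_size(1920, 1080, 50, 100)  # (960, 1080)
    uniform = render_size(1920, 1080, 71, 71)      # (1363, 767)
    for name, size in (("anamorphic", anamorphic), ("uniform", uniform)):
        print(name, size, round(pixel_count(size) / native, 3))
```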

teddybee commented 2 years ago

I think the request is about samples per pixel (SPP), which I am also missing in a path-tracing tech demo. It could be set in some earlier version, but I can't find it in the console commands now. The game is very noisy without the denoiser; with spp=4 or 8 it could look very good.