NVlabs / nvdiffrast

Nvdiffrast - Modular Primitives for High-Performance Differentiable Rendering

rasterize resolution #128

Closed chky1997 closed 10 months ago

chky1997 commented 11 months ago

Hi, thank you for your work! I noticed that the CUDA rasterizer does not support output resolutions greater than 2048×2048. Do you have any plans to raise the resolution limit, maybe to 4096×4096? Or is there any way for me to use the rasterizer to process images at higher resolutions? Thank you!

s-laine commented 11 months ago

There is no simple way to increase the resolution, unfortunately. There is a tradeoff between subpixel accuracy and viewport size, and the internal calculations would either overflow or lose too much precision beyond 2048×2048.

If you can afford some performance degradation, you can rasterize the image in tiles. For example, you can rasterize a 4096×4096 image in four 2048×2048 tiles by scaling the vertex buffer x and y by 2.0 and then offsetting x and y by ±w depending on which quadrant you want to render.
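To make the scale-and-offset step concrete, here is a minimal sketch in plain Python (no nvdiffrast needed) that applies the transform to a single clip-space point and prints the resulting normalized device coordinates. The helper name `to_tile_ndc` and the sample values are made up for illustration; only the formula x' = 2x ± w, y' = 2y ± w comes from the description above.

```python
def to_tile_ndc(x, y, w, sign_x, sign_y):
    """Apply x' = 2x + sign_x*w, y' = 2y + sign_y*w and return (x'/w, y'/w).

    Hypothetical helper for illustration; z and w are left unchanged by the
    tiling transform, so they are not needed here.
    """
    xt = 2.0 * x + sign_x * w
    yt = 2.0 * y + sign_y * w
    return xt / w, yt / w


w = 2.0  # arbitrary positive w for the example

# Offsets of +w select the quadrant where x/w and y/w lie in [-1, 0]:
# its corners and center map onto the full [-1, 1] viewport of the tile.
print(to_tile_ndc(-1.0 * w, -1.0 * w, w, +1, +1))  # (-1.0, -1.0)
print(to_tile_ndc(-0.5 * w, -0.5 * w, w, +1, +1))  # (0.0, 0.0)
print(to_tile_ndc(0.0 * w, 0.0 * w, w, +1, +1))    # (1.0, 1.0)

# An offset of -w in x selects the half where x/w lies in [0, 1] instead.
print(to_tile_ndc(0.5 * w, -0.5 * w, w, -1, +1))   # (0.0, 0.0)
```

Since w itself is untouched, the perspective divide still behaves as before; the transform only changes which part of the image plane lands inside the tile's viewport.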

After you combine the rasterizer outputs into a single large image, you should be able to use the rest of the ops (interpolation, antialias, etc.) as-is with the original vertex buffer.

s-laine commented 10 months ago

Here's an example implementation using 2×2 tiles that can be used in place of dr.rasterize(). It can rasterize outputs up to 4096×4096, with the constraint that both dimensions must be divisible by 16. If higher resolutions are needed, the idea can be easily extended to use even more tiles.

This is not super efficient, but should be fine for rendering pipelines where the rasterization op is a small part of the overall cost.

import torch
import nvdiffrast.torch as dr

def rasterize_tiled_2x2(glctx, pos, tri, resolution, ranges=None):
    # Each tile covers half of the output along each axis.
    tile_resolution = [x // 2 for x in resolution]
    # Scale clip-space x and y by 2 so that one quadrant fills the tile viewport.
    pscl = pos * torch.tensor((2, 2, 1, 1), dtype=pos.dtype, device=pos.device)
    # Offset vectors (w, 0, 0, 0) and (0, w, 0, 0); their signs select the quadrant.
    px = torch.nn.functional.pad(pos[..., 3:], (0, 3))
    py = torch.nn.functional.pad(pos[..., 3:], (1, 2))
    # Rasterize the four quadrants separately.
    out_0, db_0 = dr.rasterize(glctx, pscl + px + py, tri, tile_resolution, ranges)
    out_1, db_1 = dr.rasterize(glctx, pscl - px + py, tri, tile_resolution, ranges)
    out_2, db_2 = dr.rasterize(glctx, pscl + px - py, tri, tile_resolution, ranges)
    out_3, db_3 = dr.rasterize(glctx, pscl - px - py, tri, tile_resolution, ranges)
    # Stitch the tiles: dim=2 is image width, dim=1 is image height.
    out = torch.cat([torch.cat([out_0, out_1], dim=2), torch.cat([out_2, out_3], dim=2)], dim=1)
    db = torch.cat([torch.cat([db_0, db_1], dim=2), torch.cat([db_2, db_3], dim=2)], dim=1)
    return out, db
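For anyone who needs more than 2×2 tiles, the following is an untested sketch of how the same idea might generalize. The names `tile_offsets` and `rasterize_tiled` and the `tiles` parameter are hypothetical and not part of nvdiffrast; the only library call used is dr.rasterize() with the same arguments as above. The per-axis offsets follow from mapping tile i of n onto the [-1, 1] viewport, which gives x' = n·x + (n − 1 − 2i)·w and reduces to ±w for n = 2. Each tile presumably still has to respect the rasterizer's own limits (at most 2048 per side, and a divisibility constraint inferred from the "divisible by 16" note above).

```python
def tile_offsets(n):
    """Clip-space offsets, in units of w, for n tiles along one axis.

    Tile i covers NDC [-1 + 2i/n, -1 + 2(i+1)/n]; scaling by n and adding
    (n - 1 - 2i) * w maps that interval onto [-1, 1].
    """
    return [n - 1 - 2 * i for i in range(n)]


def rasterize_tiled(glctx, pos, tri, resolution, tiles=(2, 2), ranges=None):
    """Hypothetical (rows, cols) generalization of rasterize_tiled_2x2."""
    # Imported here so that tile_offsets() can be used without a GPU setup.
    import torch
    import nvdiffrast.torch as dr

    ny, nx = tiles  # number of tiles along height and width
    tile_resolution = [resolution[0] // ny, resolution[1] // nx]
    scale = torch.tensor((nx, ny, 1, 1), dtype=pos.dtype, device=pos.device)
    w = pos[..., 3:]
    zero = torch.zeros_like(w)

    rows_out, rows_db = [], []
    for oy in tile_offsets(ny):
        outs, dbs = [], []
        for ox in tile_offsets(nx):
            # x' = nx*x + ox*w, y' = ny*y + oy*w; z and w stay unchanged.
            offset = torch.cat([ox * w, oy * w, zero, zero], dim=-1)
            out, db = dr.rasterize(glctx, pos * scale + offset, tri,
                                   tile_resolution, ranges)
            outs.append(out)
            dbs.append(db)
        # Concatenate one row of tiles along the width axis.
        rows_out.append(torch.cat(outs, dim=2))
        rows_db.append(torch.cat(dbs, dim=2))
    # Stack the rows along the height axis.
    return torch.cat(rows_out, dim=1), torch.cat(rows_db, dim=1)


print(tile_offsets(2))  # [1, -1], the same signs as in the 2x2 version
print(tile_offsets(4))  # [3, 1, -1, -3]
```

With tiles=(2, 2) this should issue the same four rasterization calls, in the same order, as the 2×2 function. The cost grows with the number of tiles, since every tile processes the full triangle list.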

Closing the issue now. If there is a need for a more efficient implementation, or the tiling implementation is inapplicable for some other reason, please open a new issue.