cwchenwang / NeRF-SR

NeRF-SR: High-Quality Neural Radiance Fields using Supersampling
https://cwchenwang.github.io/NeRF-SR

Question about Supersampling #8

XLR-man opened this issue 1 year ago

XLR-man commented 1 year ago

The supersampling method described in the paper is to divide a pixel into sub-pixels for training. I'm a little confused about this method and how it's written in the code.

First, the downscale variable in the code is supposed to downsample the image by the specified factor. For example, if the input image resolution is 504×378 and downscale=2, the downsampled image has a resolution of 252×189. My question is why downsample it at all: is the idea to train with the 252×189 images as the low-resolution input and the 504×378 images as the ground truth (GT)?

At the same time, I do not know whether my understanding of the following code is correct

self.all_rays = torch.cat(self.all_rays, 0) #(61*h/X*w/X,X*X,8)
self.all_rgbs = torch.cat(self.all_rgbs, 0) #(61*h/X*w/X,3)
self.all_rgbs_ori = torch.cat(self.all_rgbs_ori, 0)#(61*h/X*w/X,X*X,3)

It looks like the 252×189 images serve as the low-resolution input and the 504×378 images as the GT. At the same time, each 504×378 image is divided into multiple s×s patches, which should correspond to the supersampling method.

In general, I do not understand how each pixel of the input low-resolution image is divided into s×s sub-pixels: are the s×s patches taken on the input low-resolution image or on the GT image? And how are the divided sub-pixel rays and their corresponding colors obtained? Could you point to the specific code? Thank you very much for your contribution. My understanding may well be wrong, so I hope you can clear up my doubts.

Best regards!

cwchenwang commented 1 year ago

The GT image is 252x189 in this case, and the sub-pixel rays are generated at the 504x378 resolution. The 504x378 images are not used to supervise the training, i.e. self.all_rgbs_ori is not used.
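To make the grouping concrete, here is a minimal sketch of how the s*s sub-pixel rays cast at 504x378 end up grouped under each 252x189 pixel. This is not the repository's code: the pinhole convention, the focal value, and the function name grouped_subpixel_rays are assumptions made purely for illustration.

import torch

def grouped_subpixel_rays(H_hr, W_hr, focal, s):
    # Pixel-center grid at the super-sampled resolution (e.g. 378 x 504 for s = 2)
    j, i = torch.meshgrid(
        torch.arange(H_hr, dtype=torch.float32),
        torch.arange(W_hr, dtype=torch.float32),
        indexing="ij",
    )
    # Camera-frame ray directions under a simple pinhole model (convention assumed)
    dirs = torch.stack(
        [(i + 0.5 - W_hr / 2) / focal,
         -(j + 0.5 - H_hr / 2) / focal,
         -torch.ones_like(i)],
        dim=-1,
    )  # (H_hr, W_hr, 3)
    H_lr, W_lr = H_hr // s, W_hr // s
    # Regroup so the s*s sub-pixel rays of each low-res pixel sit together:
    # (H_lr, s, W_lr, s, 3) -> (H_lr*W_lr, s*s, 3)
    dirs = dirs.view(H_lr, s, W_lr, s, 3).permute(0, 2, 1, 3, 4)
    return dirs.reshape(H_lr * W_lr, s * s, 3)

dirs = grouped_subpixel_rays(378, 504, focal=500.0, s=2)  # focal is a made-up value
print(dirs.shape)  # (189*252, 4, 3) = (47628, 4, 3)

Each row then holds the 2x2 sub-pixel ray directions that fall inside one 252x189 pixel, which matches the (…, X*X, 8) shape of self.all_rays once ray origins and near/far bounds are (presumably) appended.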

XLR-man commented 1 year ago

You said the GT images are at 252×189 resolution, so C(r) in the loss function is self.all_rgbs? Is the loss then formed by averaging the colors rendered from self.all_rays over the second dimension and subtracting self.all_rgbs?

So how do you go from rendering a low-resolution image to rendering a high-resolution one? In the example above, when rendering, are the rays of the 252×189 image, i.e. self.all_rays, unrolled along the second dimension into a 504×378 image?

Also, does downscale refer to downsampling the 504×378 image and then upscaling the 252×189 one?

cwchenwang commented 1 year ago

C(r) in the loss function is self.all_rgbs. Remember that self.all_rays and self.all_rgbs differ in shape: the former is (252x189, 2x2, 8) and the latter is (252x189, 3). You can render a 504x378 image by rendering self.all_rays and unrolling the result.
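In loss terms, this corresponds to something like the following sketch; render_rays is only a stand-in for the actual NeRF renderer, and none of the names are the repository's:

import torch
import torch.nn.functional as F

def supersampling_loss(rays, rgbs_lr, render_rays):
    # rays:    (B, s*s, 8)  sub-pixel rays grouped per low-res pixel (self.all_rays)
    # rgbs_lr: (B, 3)       low-res GT colors (self.all_rgbs), i.e. C(r)
    B, ss, _ = rays.shape
    pred = render_rays(rays.reshape(B * ss, -1))  # (B*s*s, 3) rendered sub-pixel colors
    pred = pred.view(B, ss, 3).mean(dim=1)        # average the s*s colors per low-res pixel
    return F.mse_loss(pred, rgbs_lr)

def render_hr_image(rays, render_rays, H_lr, W_lr, s):
    # "Unrolling": assumes rays covers one full image (B == H_lr * W_lr, row-major)
    B, ss, _ = rays.shape
    pred = render_rays(rays.reshape(B * ss, -1)).view(H_lr, W_lr, s, s, 3)
    return pred.permute(0, 2, 1, 3, 4).reshape(H_lr * s, W_lr * s, 3)

The averaging in supersampling_loss is what lets a single 252x189 GT pixel supervise its 2x2 sub-pixel rays at once; at test time, render_hr_image skips the averaging and lays the sub-pixel colors back out as a 504x378 frame.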

XLR-man commented 1 year ago

Ok, I get the idea. So does this mean that if we only have low-resolution images, e.g. 252×189 but not 504×378, then we can't train?

cwchenwang commented 1 year ago

No, we only need low-resolution images to train. I start from the 504x378 images because that way I can directly sample 504x378 ray directions, but we can also sample the ray directions without them.
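Concretely, sampling the sub-pixel ray directions only needs the target resolution and the camera intrinsics, not an actual 504x378 image. A sketch with made-up values, reusing the hypothetical grouped_subpixel_rays from above:

s = 2
H_lr, W_lr, focal_lr = 189, 252, 250.0  # focal_lr is a made-up example value
H_hr, W_hr = H_lr * s, W_lr * s         # 378 x 504
focal_hr = focal_lr * s                 # the intrinsics simply scale by s

# No image pixels are involved here, only H, W and the focal length:
dirs = grouped_subpixel_rays(H_hr, W_hr, focal_hr, s)  # (189*252, 4, 3)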

XLR-man commented 1 year ago

So why would you downsample a 504×378 resolution image to 252×189 resolution?

You mean that we can train with only the 252×189 images, but self.all_rays built from a 252×189 image alone should not be able to take the shape (252×189, 2×2, 8), right?

In the code, you need the 504×378 image to generate self.all_rays for the 252×189 low-resolution image. If the 504×378 image is not available, how can self.all_rays be obtained?

cwchenwang commented 1 year ago

You can directly use an empty image to get the 504x378 rays. I recommend you run the code and see how it works :)
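In other words, the 504x378 frame can be a pure placeholder whose pixel values are never read; a sketch under the same assumptions as above:

import torch

dummy_hr = torch.zeros(378, 504, 3)  # stands in for the missing 504x378 image
H_hr, W_hr = dummy_hr.shape[:2]
dirs = grouped_subpixel_rays(H_hr, W_hr, focal=500.0, s=2)  # only the shape is used
# Supervision still comes from the real 252x189 pixels (self.all_rgbs), not from dummy_hr.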