facebookresearch / pifuhd

High-Resolution 3D Human Digitization from A Single Image.

Issues about the inputs of HGPIFuMRNet #51

Closed lu-jincheng closed 4 years ago

lu-jincheng commented 4 years ago

When reading the code in HGPIFuMRNet.py, I am not clear about the meaning of images_local ([B1, B2, C, H, W]) and images_global ([B1, C, H, W]). What is the difference between images_local and images_global, and what do B1 and B2 mean? And are points_nml and labels_nml taken from the .obj file or calculated? @shunsukesaito Thank you!

shunsukesaito commented 4 years ago

What is the difference between images_local and images_global and what does B1, B2 mean?

images_global is the input to the coarse PIFu module, and images_local is the input to the fine PIFu module. Regarding B1/B2: during evaluation, B2 is always 1. During training, images_local is not the whole image (1024 x 1024) but local crops (512 x 512), and B2 lets us take multiple crops from the same image.
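To make the tensor layout concrete, here is a minimal NumPy sketch of the shapes described above. It is not the repository's code: the values of B1 and C, reading B1 as the number of images in the batch, and the 512 x 512 size used for images_global are assumptions for illustration only.

```python
import numpy as np

B1, C = 2, 3             # assumed: B1 images in the batch, RGB channels
FULL, CROP = 1024, 512   # full image size and training crop size from the reply
GLOBAL_SIZE = 512        # assumption: the thread does not state this size

# Input to the coarse module: one image per batch entry.
images_global = np.zeros((B1, C, GLOBAL_SIZE, GLOBAL_SIZE), dtype=np.float32)

# Evaluation: the whole image goes to the fine module, so B2 is 1.
images_local_eval = np.zeros((B1, 1, C, FULL, FULL), dtype=np.float32)

# Training: B2 crops of 512 x 512 are taken from each image.
B2 = 2
images_local_train = np.zeros((B1, B2, C, CROP, CROP), dtype=np.float32)

# Folding B1 and B2 into one axis yields an ordinary 4-D image batch.
flat = images_local_train.reshape(B1 * B2, C, CROP, CROP)
print(images_global.shape, images_local_eval.shape, flat.shape)
```

Folding the two batch axes together is one way a 2-D convolutional network could consume the crops; whether the repository does exactly this is not stated in the thread.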

And for points_nml and labels_nml, are they from .obj file or calculated?

For the ground truth, yes, we calculate them from the original input file. These labels, however, are computed from the pix2pixHD inference of surface normals in image space. Please refer to the paper for more details.

lu-jincheng commented 4 years ago

Thank you for your kind reply! I have another question. During training, are the local crops for images_local sampled randomly (and so possibly overlapping), or is the whole 1024x1024 image simply split into four 512x512 images? And if the crops are random, what is a proper value for B2?

shunsukesaito commented 4 years ago

The cropping window is selected completely at random. Note that cropping is applied only during training; the whole image (1024x1024) is fed into the network during inference. I tried various values, but reconstruction accuracy is insensitive to the choice of B2. I ended up using B2 = 2, but it can be any number that fits into your GPU memory.
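As a rough illustration of this random window selection, the sketch below draws B2 crop windows uniformly from one image. The helper name sample_crops and the uniform sampling of the top-left corner are assumptions, not the repository's implementation.

```python
import numpy as np

def sample_crops(image, num_crops, crop_size, rng):
    """Hypothetical helper: draw random square crops from a [C, H, W] image.

    Returns the crops as [num_crops, C, crop_size, crop_size] together
    with the (top, left) offset of each window.
    """
    _, h, w = image.shape
    crops, offsets = [], []
    for _ in range(num_crops):
        # Windows are drawn independently, so they may overlap.
        top = int(rng.integers(0, h - crop_size + 1))
        left = int(rng.integers(0, w - crop_size + 1))
        crops.append(image[:, top:top + crop_size, left:left + crop_size])
        offsets.append((top, left))
    return np.stack(crops), offsets

rng = np.random.default_rng(0)
image = rng.random((3, 1024, 1024), dtype=np.float32)
crops, offsets = sample_crops(image, num_crops=2, crop_size=512, rng=rng)  # B2 = 2
print(crops.shape)  # (2, 3, 512, 512)
```

Returning the offsets alongside the crops makes it possible to apply the same windows to any other image-space input later.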

lu-jincheng commented 4 years ago

When I try to train pifuhd, the code at HGPIFuMRNet.py line 102 upsamples nmls to loadSizeBig (which I think is 1024 x 1024) and concatenates nmls with images_local (which I think is 512 x 512), so the sizes do not match. Am I doing something wrong?

Another question: since pix2pixHD is contained in the coarse PIFu module, when I train a fine PIFu module on top of the pretrained coarse module, do I still need to input points_nml and labels_nml, or can I just set them to None?

shunsukesaito commented 4 years ago

The input normal map and images_local should use the same crop to achieve pixel-aligned reconstruction. So make sure your input normal map is cropped in the same way as images_local.
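A minimal sketch of that advice, assuming a 3-channel normal map that has already been upsampled to the full 1024 x 1024 resolution: the same window is applied to both inputs before they are concatenated along the channel axis. The helper crop_pair and the fixed offsets are hypothetical.

```python
import numpy as np

def crop_pair(image, normal, top, left, size):
    """Hypothetical helper: cut the same window from two [C, H, W] arrays."""
    window = (slice(None), slice(top, top + size), slice(left, left + size))
    return image[window], normal[window]

rng = np.random.default_rng(0)
image = rng.random((3, 1024, 1024), dtype=np.float32)
# Stand-in for the normal map upsampled to the full image resolution.
normal = rng.random((3, 1024, 1024), dtype=np.float32)

top, left = 128, 256  # in training these would come from the random crop
img_crop, nml_crop = crop_pair(image, normal, top, left, 512)

# Both crops are 512 x 512, so channel-wise concatenation now works.
fine_input = np.concatenate([img_crop, nml_crop], axis=0)
print(fine_input.shape)  # (6, 512, 512)
```

Because both arrays are indexed with the same window, each pixel in the cropped normal map still lines up with the corresponding pixel in the cropped image.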