facebookresearch / pifuhd

High-Resolution 3D Human Digitization from A Single Image.

How were the ground truth normal maps computed for training the pix2pix network? #197

Open janehwu opened 1 year ago

janehwu commented 1 year ago

I understand the normal maps are computed in camera space, but could you please elaborate on the exact transformation from world-space normals to camera space? For example, how are the x, y, z axes defined in camera space?
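For reference, here is the transformation I have been assuming (a minimal numpy sketch; the rotation R and the axis conventions are my guesses, not taken from your code):

```python
import numpy as np

# Hypothetical world-to-camera rotation; the actual convention used to
# generate the training normal maps is exactly what I'm asking about.
R = np.eye(3)

def world_to_camera_normals(normals_world, R):
    """Rotate unit normals (N, 3) from world space into camera space.

    Normals transform with the rotation part of the extrinsics only
    (no translation), since they are directions rather than points.
    """
    n_cam = normals_world @ R.T
    # Re-normalize to guard against numerical drift.
    return n_cam / np.linalg.norm(n_cam, axis=-1, keepdims=True)

def normals_to_rgb(n_cam):
    """Map camera-space normals in [-1, 1] to RGB in [0, 1] for an image."""
    return 0.5 * (n_cam + 1.0)
```

Is this the right idea, and if so, what are the signs of the camera-space axes (e.g. does z point toward or away from the viewer)?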

I've been looking at the code below, but when I pass a mesh predicted by the demo code to _render_normal, I get a blank image, which suggests these are not the right transformations: https://github.com/facebookresearch/pifuhd/blob/e47c4d918aaedd5f5608192b130bda150b1fb0ab/lib/evaluator.py#L83
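One guess is that the blank render just means the mesh falls outside the renderer's orthographic view volume. This is the kind of normalization I tried before rendering (a sketch assuming the renderer expects the mesh roughly inside [-1, 1]^3; 'result_mesh.obj' is a placeholder path):

```python
import trimesh

# Load the mesh produced by the demo (placeholder filename).
mesh = trimesh.load('result_mesh.obj')

# Center the mesh at the origin and scale its longest side to fit [-1, 1],
# on the assumption that _render_normal uses an orthographic unit volume.
center = mesh.bounds.mean(axis=0)
scale = 2.0 / (mesh.bounds[1] - mesh.bounds[0]).max()
mesh.apply_translation(-center)
mesh.apply_scale(scale)
```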

In the demo code, it looks like the normals are predicted directly by the network, so I'm having trouble deciphering what coordinate system they are in: https://github.com/facebookresearch/pifuhd/blob/e47c4d918aaedd5f5608192b130bda150b1fb0ab/apps/recon.py#L118
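Concretely, this is how I've been decoding the predicted normal maps back into vectors (a sketch assuming the usual [0, 1] image encoding; whether that encoding, and the sign of z, match your training data is the open question):

```python
import numpy as np

def rgb_to_normals(normal_img):
    """Decode a normal map image (H, W, 3) with values in [0, 1]
    back into unit normal vectors, assuming n_rgb = 0.5 * (n + 1)."""
    n = 2.0 * normal_img - 1.0
    norm = np.linalg.norm(n, axis=-1, keepdims=True)
    return n / np.clip(norm, 1e-8, None)
```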

In summary, I'm trying to use the normal maps predicted by your pretrained pix2pix network, but to do so I need to know how the ground-truth normal maps used to train it were computed.

Thank you!