EPFL-VILAB / omnidata

A Scalable Pipeline for Making Steerable Multi-Task Mid-Level Vision Datasets from 3D Scans [ICCV 2021]

Different transformation of input for Normal and Depth prediction #49

Closed · YuxuanSnow closed this 1 year ago

YuxuanSnow commented 1 year ago

Hi!

Thank you for the amazing work. I have a small question about the different transformation operations applied to the input image for the two modality predictions.

For depth prediction you normalize the image (see https://github.com/EPFL-VILAB/omnidata/blob/main/omnidata_tools/torch/demo.py#L92), but for normal prediction you don't (see https://github.com/EPFL-VILAB/omnidata/blob/main/omnidata_tools/torch/demo.py#L74).
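For reference, here is roughly what the two pipelines look like (a sketch using standard torchvision transforms; `image_size` is a placeholder, so treat this as an illustration rather than a copy of demo.py):

```python
from torchvision import transforms

image_size = 384  # placeholder; demo.py may use a different size

# Depth input: rescaled to [-1, 1] via an extra Normalize step
trans_depth = transforms.Compose([
    transforms.Resize(image_size),
    transforms.CenterCrop(image_size),
    transforms.ToTensor(),                    # pixels in [0, 1]
    transforms.Normalize(mean=0.5, std=0.5),  # (x - 0.5) / 0.5 -> [-1, 1]
])

# Normal input: left in [0, 1] (no Normalize step)
trans_normal = transforms.Compose([
    transforms.Resize(image_size),
    transforms.CenterCrop(image_size),
    transforms.ToTensor(),                    # pixels in [0, 1]
])
```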

Can you briefly explain this choice?

Thanks!

alexsax commented 1 year ago

Hi! Good question. The normalization is just linear rescaling to [-1,1] vs [0,1]. IIRC the choice doesn’t matter too much as long as you pick a convention for the network and stick to it.

Probably we chose the depth scaling because we were comparing to MiDaS and it made the code a bit simpler. And we chose the normal scaling because that simplified the comparisons to cross-task consistency.
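To make the rescaling concrete, a minimal illustration of the two conventions (purely for exposition, not code from the repo):

```python
import torch

x = torch.rand(3, 384, 384)  # RGB tensor in [0, 1], as produced by ToTensor()

x_depth_convention  = x * 2.0 - 1.0  # [0, 1] -> [-1, 1], same as Normalize(0.5, 0.5)
x_normal_convention = x              # left in [0, 1]

# The two conventions differ only by an affine map, which the network's
# first layer can absorb; what matters is using one convention
# consistently at train and test time.
```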

YuxuanSnow commented 1 year ago

Thanks for the reply! So, if I understood correctly, the normalization choice isn't driven by any intrinsic difference between depth and surface normals :)