EnVision-Research / Lotus

Official Implementation of LOTUS: Diffusion-based Visual Foundation Model for High-quality Dense Prediction
https://lotus3d.github.io
Apache License 2.0

What is the difference between the 'generation' and the 'regression' models? #18

Closed: rafaelspring closed this issue 1 month ago

rafaelspring commented 1 month ago

Which one would I use for regular inference (image in, depth or normals out)?

haodong2000 commented 1 month ago

Hi @rafaelspring , thanks for your interest!

For the regular inference you describe, please use the regression mode.

The difference is simple. In generation mode, the RGB image serves as a condition, and the dense annotations are generated from Gaussian noise in a single step (which enables distribution modelling). In regression mode, we remove the noise input and directly predict the dense labels from the RGB image.

In short, in generation mode the inputs are the RGB image and Gaussian noise; in regression mode the only input is the RGB image.
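To make the input difference concrete, here is a minimal toy sketch in plain Python. It is not the Lotus code or API: `toy_denoiser`, `predict_generation`, `predict_regression`, the list-of-channels "latent" representation, and concatenating noise along the channel axis are all hypothetical stand-ins chosen for illustration. The only point it demonstrates is that the generation mode consumes RGB plus freshly sampled Gaussian noise (so its output depends on the noise sample), while the regression mode consumes only the RGB input (so it is deterministic).

```python
import random
from typing import List

# Hypothetical toy "latent": [channels][pixels]; stands in for an encoded image.
Latent = List[List[float]]


def toy_denoiser(inputs: Latent, out_channels: int = 4) -> Latent:
    """Hypothetical one-step predictor standing in for the real network.

    Each output channel is the mean over all input channels plus a fixed
    per-channel offset, so it accepts any number of input channels.
    """
    n_pixels = len(inputs[0])
    return [
        [sum(ch[p] for ch in inputs) / len(inputs) + 0.1 * c for p in range(n_pixels)]
        for c in range(out_channels)
    ]


def predict_generation(rgb_latent: Latent, seed: int) -> Latent:
    """Generation mode (toy): RGB condition + Gaussian noise, one step."""
    rng = random.Random(seed)
    noise = [[rng.gauss(0.0, 1.0) for _ in ch] for ch in rgb_latent]
    # Assumption: condition and noise are concatenated along the channel axis.
    return toy_denoiser(rgb_latent + noise)


def predict_regression(rgb_latent: Latent) -> Latent:
    """Regression mode (toy): no noise input, only the RGB condition."""
    return toy_denoiser(rgb_latent)


if __name__ == "__main__":
    rgb = [[0.5, -0.2, 0.1] for _ in range(4)]  # 4 channels x 3 pixels
    print(predict_generation(rgb, seed=0) == predict_generation(rgb, seed=1))  # noise-dependent
    print(predict_regression(rgb) == predict_regression(rgb))  # deterministic
```

Because the regression path has no random input, repeated calls on the same image give identical predictions, which is what you usually want for plain image-in, depth/normals-out inference.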

Best,