LOTUS prediction type

Thanks for your attention!

Using a different objective function during fine-tuning is legitimate; it is the same practice as fine-tuning other pre-trained vision backbones, such as Transformers and ResNets, on a new task. The pre-trained SD (Stable Diffusion) model provides powerful visual priors that improve zero-shot generalization on downstream tasks. Because the pre-training task (image generation) differs from the fine-tuning task (dense prediction), it is often necessary to adopt a more suitable objective function. As stated in our paper, "the original settings for image generation are no longer the optimal solution for downstream dense prediction tasks." Investigating which objective function is best suited to dense prediction is one of our key contributions. We analyze this in Section 4, where x0-prediction outperforms both epsilon-prediction and v-prediction, as shown in Figures 6 and 11.
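For readers less familiar with the three prediction types, here is a minimal sketch of how the regression target differs between them. It uses the standard diffusion definitions, not the Lotus training code: the function names, the type labels `"x0"`, `"epsilon"`, and `"v"`, and the toy values are all assumptions made for illustration.

```python
import math

def add_noise(x0, eps, alpha_bar):
    """Forward process: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + s * e for x, e in zip(x0, eps)]

def training_target(x0, eps, alpha_bar, prediction_type):
    """Return what the network is asked to regress for each prediction type."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    if prediction_type == "x0":        # predict the clean signal directly
        return list(x0)
    if prediction_type == "epsilon":   # predict the added noise
        return list(eps)
    if prediction_type == "v":         # v = sqrt(alpha_bar) * eps - sqrt(1 - alpha_bar) * x0
        return [a * e - s * x for x, e in zip(x0, eps)]
    raise ValueError(f"unknown prediction type: {prediction_type}")

def recover_x0(x_t, output, alpha_bar, prediction_type):
    """Convert a (perfect) network output of any type back to the clean signal."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    if prediction_type == "x0":
        return list(output)
    if prediction_type == "epsilon":   # x0 = (x_t - sqrt(1 - alpha_bar) * eps) / sqrt(alpha_bar)
        return [(xt - s * o) / a for xt, o in zip(x_t, output)]
    if prediction_type == "v":         # x0 = sqrt(alpha_bar) * x_t - sqrt(1 - alpha_bar) * v
        return [a * xt - s * o for xt, o in zip(x_t, output)]
    raise ValueError(f"unknown prediction type: {prediction_type}")

# Toy example: a 3-value "annotation" (e.g. depth), fixed noise, one noise level.
x0 = [0.2, -0.5, 1.0]
eps = [0.3, 1.1, -0.7]
alpha_bar = 0.64
x_t = add_noise(x0, eps, alpha_bar)
targets = {t: training_target(x0, eps, alpha_bar, t) for t in ("x0", "epsilon", "v")}
```

All three targets carry the same information given x_t, so switching between them changes only what the network regresses, not what the pre-trained weights can represent. With x0-prediction the loss is computed directly on the dense annotation itself.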

Best,

EnVision-Research / Lotus

LOTUS prediction type #15