You can actually observe some interesting behaviors when using text prompts. Sometimes the text correlates with regions that are unseen in your input image. But since we do not train on any text, this is an artifact of prior preservation, and it may or may not work in practice.
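For anyone who wants to experiment with this, here is a minimal sketch. It assumes the released custom pipeline (`sudo-ai/zero123plus-pipeline`) forwards a `prompt` argument to the text encoder; please verify against the `__call__` signature of the pipeline version you have installed, and the `input.png` path and prompt string below are just placeholders.

```python
import torch
from PIL import Image
from diffusers import DiffusionPipeline, EulerAncestralDiscreteScheduler

# Load the released Zero123++ weights with the custom pipeline.
pipeline = DiffusionPipeline.from_pretrained(
    "sudo-ai/zero123plus-v1.1",
    custom_pipeline="sudo-ai/zero123plus-pipeline",
    torch_dtype=torch.float16,
)
pipeline.scheduler = EulerAncestralDiscreteScheduler.from_config(
    pipeline.scheduler.config, timestep_spacing="trailing"
)
pipeline.to("cuda")

cond = Image.open("input.png")  # placeholder path to your conditioning image

# Default behavior: an empty prompt, i.e. the text condition T from the paper.
baseline = pipeline(cond, num_inference_steps=75).images[0]

# Experimental: a non-empty prompt (assumes the pipeline accepts `prompt=`).
# The model was never trained with text conditions, so any effect comes from
# the preserved Stable Diffusion prior and may or may not be useful.
with_text = pipeline(
    cond, prompt="a red metallic surface", num_inference_steps=75
).images[0]

baseline.save("baseline.png")
with_text.save("with_text.png")
```

Comparing the two outputs side by side is the easiest way to see whether the text has any effect on the unseen regions for a given input.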
In Section 2.4, "Global Condition: FlexDiffuse", the paper says:

> In the released Zero123++ models, we do not impose any text conditions, so T is obtained by encoding an empty prompt.
Does this imply that I cannot use a non-empty text prompt with the current release of Zero123++?