Open Bachstelze opened 1 year ago
Wouldn't you have to retrain the text-to-image models, since the text representations are different?
Yes, you could experiment with fine-tuning once you swap it out. I'm 1000% sure FLAN-T5 would result in higher-fidelity output, better composition, and way better spatial awareness. I think the "tango model" kind of validates this.
Can we use FLAN-T5 as a language model? Those FLAN models can represent English and other languages significantly better in our tests. "If you already know T5, FLAN-T5 is just better at everything. For the same number of parameters, these models have been fine-tuned on more than 1000 additional tasks covering also more languages."
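To make the retraining concern in this thread concrete, here is a rough sketch of what swapping the text encoder involves at the shape level. Nothing in it comes from a real codebase: `make_projection`, `project`, and the toy widths 8 and 6 are hypothetical stand-ins for the new encoder's hidden size and the width the image model's conditioning layers expect. The projection starts out random, which is why some fine-tuning would be needed after the swap even once the shapes line up.

```python
import random


def make_projection(d_in, d_out, seed=0):
    """Return a random d_in x d_out weight matrix (hypothetical adapter).

    The weights are untrained, so the projected embeddings carry no useful
    meaning for the image model until the adapter (and possibly the image
    model itself) is fine-tuned.
    """
    rng = random.Random(seed)
    scale = d_in ** -0.5  # keep output magnitudes roughly comparable to inputs
    return [[rng.gauss(0.0, scale) for _ in range(d_out)] for _ in range(d_in)]


def project(tokens, weight):
    """Map each token embedding from width d_in to width d_out."""
    d_in = len(weight)
    for tok in tokens:
        if len(tok) != d_in:
            raise ValueError(
                f"expected embeddings of width {d_in}, got {len(tok)}"
            )
    columns = list(zip(*weight))  # d_out columns, each of length d_in
    return [
        [sum(x * w for x, w in zip(tok, col)) for col in columns]
        for tok in tokens
    ]


# Toy stand-in for the new encoder's output: 3 tokens of width 8.
new_encoder_output = [[1.0] * 8 for _ in range(3)]

# Assumed width expected by the image model's conditioning layers: 6.
adapter = make_projection(d_in=8, d_out=6)
conditioning = project(new_encoder_output, adapter)

print(len(conditioning), len(conditioning[0]))  # 3 6
```

Matching the width only solves the shape mismatch. The image model was trained against the old encoder's embedding space, so the adapter alone is not expected to give good results without training.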