I noticed that the hyperparameter cond_stage_trainable is set to true in the config file.
The cond_stage_model should be the CLIP text encoder, so why is it necessary to fine-tune the parameters of this module? Did the original paper fine-tune the T5 encoder? Thanks!