Open vinevix opened 1 year ago
You could train the model on any dataset as long as it contains the image-text pair. Maybe you need to find the average length of all captions on Flickr30k. And then you need to adjust the steps of diffusion according to the average length.
You could train the model on any dataset as long as it contains the image-text pair. Maybe you need to find the average length of all captions on Flickr30k. And then you need to adjust the steps of diffusion according to the average length.