TheoCoombes / ClipCap

Using pretrained encoder and language models to generate captions from multimedia inputs.

train and release models #7

Open rom1504 opened 2 years ago

rom1504 commented 2 years ago

To begin with, train on COCO and make the resulting model loadable via clipcap.load_pretrained("coco_global_vit_b_32").
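
One way the name lookup behind such a call could work is sketched below. This is only an illustration of the request, not existing ClipCap code: `PretrainedSpec`, `resolve_pretrained`, `available_pretrained`, and the checkpoint URL are all hypothetical placeholders, and actual weight downloading and loading are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PretrainedSpec:
    """Hypothetical description of one released checkpoint."""
    name: str
    checkpoint_url: str  # placeholder only; no real checkpoint is hosted here


# Hypothetical registry keyed by the model name proposed in this issue.
_PRETRAINED = {
    "coco_global_vit_b_32": PretrainedSpec(
        name="coco_global_vit_b_32",
        checkpoint_url="https://example.invalid/coco_global_vit_b_32.pt",
    ),
}


def available_pretrained():
    """Return the registered model names in sorted order."""
    return sorted(_PRETRAINED)


def resolve_pretrained(name):
    """Look up a registered checkpoint, failing loudly on unknown names."""
    try:
        return _PRETRAINED[name]
    except KeyError:
        raise ValueError(
            f"Unknown pretrained model {name!r}; "
            f"available: {available_pretrained()}"
        ) from None
```

A load_pretrained function could then call resolve_pretrained(name), fetch the checkpoint, and build the model from it; the error message lists the known names so that a typo is easy to spot.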