TheoCoombes / ClipCap

Using pretrained encoder and language models to generate captions from multimedia inputs.

Experimental: Optionally use all ViT features of CLIP #2

Closed andreaskoepf closed 2 years ago

andreaskoepf commented 2 years ago

train.py got new parameters: use_all_vit_features (default True) and pos_embeddings (default False). create_dataset.py got a new parameter: use_all_vit_features (default True).
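The idea behind the flag can be sketched roughly as follows: instead of conditioning the caption decoder on only CLIP's pooled image embedding, `use_all_vit_features` keeps the per-token features from the ViT's final layer, giving the decoder a longer visual prefix to attend over. This is a minimal illustrative sketch only; the array names and the ViT-B/32-style dimensions (224px input, 32px patches, width 768) are assumptions, not code from this repo.

```python
import numpy as np

# Assumed ViT-B/32-style shapes: 7*7 = 49 patch tokens + 1 CLS token, width 768.
# (Illustrative stand-in for real CLIP features, not the repo's actual tensors.)
num_tokens, width = 50, 768
vit_tokens = np.random.randn(num_tokens, width)  # final-layer token features

# Pooled-embedding path: only the CLS token feeds the caption prefix.
pooled = vit_tokens[0]        # shape (768,)

# use_all_vit_features=True path: keep every token as the visual prefix.
all_features = vit_tokens     # shape (50, 768)

print(pooled.shape, all_features.shape)
```

The trade-off is prefix length versus information: the pooled vector is compact, while the full token grid preserves spatial detail at the cost of a longer sequence for the decoder.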

TheoCoombes commented 2 years ago

Thanks! :)