TheoCoombes / ClipCap

Using pretrained encoder and language models to generate captions from multimedia inputs.
95 stars 13 forks

Release a pretrained model and add inference example #4

Open rom1504 opened 2 years ago