TheoCoombes / ClipCap

Using pretrained encoder and language models to generate captions from multimedia inputs.

minimal usage instruction #6

Open · rom1504 opened this issue 2 years ago

rom1504 commented 2 years ago

Install with pip install clipcap

Use it with:

import clipcap
from PIL import Image

model = clipcap.load_pretrained()

# Caption an image...
text = clipcap.generate(Image.open("some_image.jpg"))

# ...or caption a precomputed CLIP embedding.
text = clipcap.generate(my_clip_embedding)

(This is just an example; please tune the API as you like.)
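
Since the original snippet hinted at captioning an image fetched from a URL, here is a rough sketch of how such a call could look end to end. The clipcap.load_pretrained / clipcap.generate names are only the API proposed in this comment, not an existing interface, and the URL is a placeholder; only the requests / Pillow parts are standard.

import io

import requests
from PIL import Image

import clipcap  # hypothetical package exposing the API proposed above

# PIL cannot open a URL directly, so fetch the bytes first.
url = "https://example.com/some_image.jpg"  # placeholder URL
response = requests.get(url, timeout=10)
response.raise_for_status()
image = Image.open(io.BytesIO(response.content))

model = clipcap.load_pretrained()  # proposed API: load encoder + language model
text = clipcap.generate(image)     # proposed API: return a caption string
print(text)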

rom1504 commented 2 years ago

https://github.com/UKPLab/EasyNMT#usage good example
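
The relevant part of that EasyNMT README is, roughly, a "construct a model object, then call a single method" pattern; usage instructions for clipcap could be kept just as short.

from easynmt import EasyNMT

# Construct a model once, then call a single method on it.
model = EasyNMT('opus-mt')
print(model.translate('Dies ist ein Satz in Deutsch.', target_lang='en'))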

uu95 commented 1 year ago

Hello @rom1504, do you have a pretrained model that I can use to perform audio captioning with this example?