UARK-AICV / VLCAP

[ICIP 2022] VLCap: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning
https://ieeexplore.ieee.org/document/9897766

How is the matrix Wi pre-trained by CLIP to obtain the image embedding? #11

Closed (gdg452 closed 1 year ago)

gdg452 commented 1 year ago

I'm sorry, I don't quite understand this part. Could you elaborate? Thank you!

gdg452 commented 1 year ago

Sorry, I misunderstood.