-
### Project Name
VidSage
### Description
# VidSage: Video Insights using Graph RAG
https://www.youtube.com/watch?v=IUSCWtB9jWk
VidSage focuses on processing video data, storing it in Azur…
-
We evaluated VideoClip on the video-text retrieval task using the COIN dataset, but the performance is much lower than the reported VideoQA performance (26%
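For reference, here is a minimal sketch of how recall@K and median rank are typically computed from a text-video similarity matrix; this is a generic implementation, not the repository's evaluation code, and the variable names are assumptions:

```python
import numpy as np

def retrieval_metrics(sim_matrix: np.ndarray) -> dict:
    """Text-to-video retrieval metrics from a (num_texts, num_videos) similarity
    matrix where the ground-truth pairing lies on the diagonal."""
    ranks = []
    for i, row in enumerate(sim_matrix):
        order = np.argsort(-row)                  # indices sorted by descending similarity
        ranks.append(int(np.where(order == i)[0][0]))  # 0-based rank of the true video
    ranks = np.asarray(ranks)
    return {
        "R@1": float(np.mean(ranks < 1) * 100),
        "R@5": float(np.mean(ranks < 5) * 100),
        "R@10": float(np.mean(ranks < 10) * 100),
        "MedR": float(np.median(ranks) + 1),      # 1-based median rank
    }

# Example: sim_matrix = text_embeddings @ video_embeddings.T (both L2-normalised)
```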
-
|       | link |
|-------|------|
| paper | [CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval](https://arxiv.org/pdf/2104.08860v2.pdf) |
| code  | [papers with code](https://paperswithcode.com…
-
In [CMIN_moment_retrieval/dataloaders/clip_loader.py line 66](https://github.com/ChenyunWu/CMIN_moment_retrieval/blob/df44a230a0cd83d9ab3e282601da60cbca56a102/dataloaders/clip_loader.py#L66)
`if lab…
-
Could be fun to have a tokenizer like "take all video frames, apply CLIP, map each frame to one of the 2^17 clusters (what I have in the clip-retrieval index), apply BPE, return the sequence" (rough sketch below).
Inspired by https://arx…
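A rough, self-contained sketch of that tokenizer idea, with random arrays standing in for real CLIP frame features and a small centroid set standing in for the 2^17 clusters (nothing here comes from an existing index or codebase):

```python
import numpy as np

def frames_to_cluster_ids(frame_embeddings: np.ndarray,
                          centroids: np.ndarray) -> np.ndarray:
    """Assign every frame embedding to its nearest centroid by cosine similarity."""
    f = frame_embeddings / np.linalg.norm(frame_embeddings, axis=1, keepdims=True)
    c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    return np.argmax(f @ c.T, axis=1)            # one cluster id per frame

def cluster_ids_to_symbols(ids: np.ndarray) -> str:
    """Turn cluster ids into a whitespace-separated 'sentence' so an off-the-shelf
    BPE model (e.g. sentencepiece) could be trained on and applied to it."""
    return " ".join(f"c{int(i)}" for i in ids)

# Toy usage: random data in place of CLIP features; 1024 centroids stand in
# for the full 2^17-cluster index.
rng = np.random.default_rng(0)
frames = rng.normal(size=(32, 512)).astype(np.float32)       # 32 sampled frames
centroids = rng.normal(size=(1024, 512)).astype(np.float32)
ids = frames_to_cluster_ids(frames, centroids)
print(cluster_ids_to_symbols(ids)[:80], "...")
```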
-
I am a student in Toronto learning about multimodal models and multimodal retrieval.
Can embeddings be extracted from your models?
I would like to compare retrieval results from your model to CLIP.…
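For the CLIP baseline side of that comparison, here is roughly how I would pull embeddings with the Hugging Face `transformers` CLIP implementation (the image path and prompts are just placeholders, and this says nothing about your model's API):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("frame.jpg")                  # placeholder: a sampled video frame
texts = ["a person cooking", "a soccer match"]   # placeholder queries

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                      attention_mask=inputs["attention_mask"])

# L2-normalise before computing cosine similarities for retrieval
img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
print(img_emb @ txt_emb.T)                       # similarity of the frame to each text
```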
-
Hey - I am unable to reproduce the reported zero-shot results. So far I have tried MSRVTT and MSVD; I would appreciate it if you could kindly take a look.
Here is what I got after running these 2 script…
-
Are the weights of the original CLIP layers always kept frozen during the whole training process?
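In case it helps, a quick way to check this empirically in a PyTorch checkout (the `model.clip` attribute below is only a guess at where the backbone lives, not this repository's actual structure):

```python
import torch

def report_frozen(module: torch.nn.Module, prefix: str = "") -> None:
    """Print how many parameters are trainable vs. frozen (requires_grad)."""
    frozen, trainable = 0, 0
    for _, p in module.named_parameters():
        if p.requires_grad:
            trainable += p.numel()
        else:
            frozen += p.numel()
    print(f"{prefix} trainable={trainable:,} frozen={frozen:,}")

# Hypothetical usage, with `model.clip` standing in for the wrapped CLIP backbone:
# report_frozen(model.clip, prefix="CLIP backbone:")

# Explicitly freezing the backbone would look like:
# for p in model.clip.parameters():
#     p.requires_grad = False
```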
-
Hello, I have run the training and embedding extraction, and I'm wondering how I can see examples of the text that the model retrieved.
The embeddings and h5 files seem to be mostly numeric -- How do …
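In case it is useful to others, here is a generic sketch of how retrieved captions could be inspected from such files; the h5 dataset names and the captions file are assumptions, not necessarily what the extraction script writes:

```python
import h5py
import numpy as np

# Assumed layout: substitute whatever names the extraction step actually produced.
with h5py.File("embeddings.h5", "r") as f:
    video_emb = np.asarray(f["video_embeddings"])   # (num_videos, dim)
    text_emb = np.asarray(f["text_embeddings"])     # (num_texts, dim)

with open("captions.txt") as f:
    captions = [line.strip() for line in f]         # one caption per text embedding

# Normalise and rank captions for one query video
video_emb /= np.linalg.norm(video_emb, axis=1, keepdims=True)
text_emb /= np.linalg.norm(text_emb, axis=1, keepdims=True)

query = 0                                           # index of the video to inspect
scores = text_emb @ video_emb[query]
for rank, idx in enumerate(np.argsort(-scores)[:5], start=1):
    print(f"{rank}. ({scores[idx]:.3f}) {captions[idx]}")
```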
-
First of all congrats on the paper and thanks for providing the code!
In the paper, under 'Zero-shot language-based multi-modal joint retrieval', you mention that integrating/combining multiple embeddin…
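In case the question is about a weighted fusion of per-modality similarities, a generic sketch of that pattern might look like the following (the modality names and weights are assumptions, not the paper's actual formulation):

```python
import numpy as np

def fuse_similarities(sims: dict, weights: dict) -> np.ndarray:
    """Weighted sum of per-modality similarity matrices, all shaped
    (num_queries, num_items); weights are renormalised to sum to 1."""
    total = sum(weights.values())
    fused = np.zeros_like(next(iter(sims.values())))
    for name, s in sims.items():
        fused += (weights[name] / total) * s
    return fused

# Hypothetical usage with similarity matrices computed separately per modality:
# fused = fuse_similarities(
#     {"video": text_emb @ video_emb.T, "audio": text_emb @ audio_emb.T},
#     {"video": 0.7, "audio": 0.3},
# )
```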