-
Is it possible to use it in video retrieval?
-
I am a student in Toronto learning about multimodal models and multimodal retrieval.
Can embeddings be extracted from your models?
I would like to compare retrieval results from your model to CLIP.…
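In case it helps with the CLIP comparison: assuming the model exposes separate text and video embeddings, text-to-video retrieval reduces to ranking by cosine similarity over L2-normalized vectors. A minimal sketch with NumPy (the function name and toy vectors are made up for illustration):

```python
import numpy as np

def cosine_retrieve(text_emb, video_embs):
    """Rank video embeddings by cosine similarity to one text embedding,
    best match first."""
    t = text_emb / np.linalg.norm(text_emb)
    v = video_embs / np.linalg.norm(video_embs, axis=1, keepdims=True)
    sims = v @ t                 # cosine similarity per video
    return np.argsort(-sims)     # indices sorted by descending similarity

# Toy example: two 2-D "video" embeddings and a query leaning toward video 0.
videos = np.array([[1.0, 0.0], [0.0, 1.0]])
query = np.array([0.9, 0.1])
ranking = cosine_retrieve(query, videos)
```

With real models you would replace the toy arrays with the two encoders' outputs; Recall@K then follows from where the ground-truth video's index lands in each query's ranking.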
-
When I wrap the text-video retrieval function in a Python script, GPU memory grows each time the function is called (per search), and the memory is only released when the script is killed…
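For what it's worth, this symptom usually means something retains a reference to each search's intermediate results; in PyTorch that is typically tensors accumulated in a Python list, or the autograd graph when inference is not wrapped in `torch.no_grad()`. A framework-free sketch of the retention pattern, with hypothetical class names:

```python
class LeakyRetriever:
    """Keeps every query's full score list alive -> memory grows per search."""
    def __init__(self):
        self.history = []

    def search(self, scores):
        self.history.append(scores)          # reference retained forever
        return max(range(len(scores)), key=scores.__getitem__)


class FlatRetriever:
    """Returns only the best index; nothing from the call outlives it."""
    def search(self, scores):
        return max(range(len(scores)), key=scores.__getitem__)


leaky, flat = LeakyRetriever(), FlatRetriever()
for _ in range(3):
    leaky.search([0.1, 0.9, 0.4])
    best = flat.search([0.1, 0.9, 0.4])
# leaky.history now holds 3 score lists; flat holds nothing between calls.
```

In the PyTorch case, the analogous fixes are running retrieval under `torch.no_grad()` and converting any scores you keep with `.cpu()` or `.item()` so GPU tensors are not pinned by lingering Python references.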
-
We are unable to reproduce your experimental results on the MSRVTT QA dataset; we only reach an accuracy of around 31. May I ask the reason, or could you provide some checkpoints?
-
### Problem
Currently, to customise context-retrieval settings, the user has to open a cluttered modal for each chat, which makes chatting cumbersome.
Video shows current …
-
Hi,
We found that the video-text joint loss in pretraining is calculated from the masked video and text. Why not use the original video and text, as in retrieval fine-tuning?
https://github.com/microsoft/UniVL/blo…
-
|      | link |
|------|------|
|paper| [CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval](https://arxiv.org/pdf/2104.08860v2.pdf) |
|code| [papers with code](https://paperswithcode.com…
-
Azure OpenAI is a service on the Azure platform for generative AI.
Here we can perform search.
It has REST APIs, so we can
Dense Captions: for every item detected in the image, it can genera…
-
### Search before asking
- [X] I have searched the YOLOv8 [issues](https://github.com/ultralytics/ultralytics/issues) and found no similar feature requests.
### Description
Dear @glenn-jocher and …
-
Thanks for your extraordinary work on video-text retrieval with T2VLAD.
I have a small request about this work: could you share the other dataloaders and the configs for MSR-VTT at the 1k-A split, MSVD an…