-
Is there a plan to incorporate image embeddings alongside OCR- and metadata-based retrieval? Using the CLIP model from Candle to generate image embeddings could provide clearer context and improve…
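The suggestion above amounts to hybrid scoring: blend an image-embedding similarity with an OCR/metadata keyword match. Below is a minimal sketch of that idea in plain Python; the function names (`hybrid_score`, `keyword_score`), the `alpha` weight, and the toy vectors are all illustrative assumptions, not the project's actual API or Candle's.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def keyword_score(query_terms, ocr_text):
    # Fraction of query terms that appear in the OCR/metadata text.
    text = ocr_text.lower()
    terms = [t.lower() for t in query_terms]
    return sum(t in text for t in terms) / len(terms)

def hybrid_score(query_emb, image_emb, query_terms, ocr_text, alpha=0.7):
    # Weighted blend of embedding similarity and keyword overlap.
    # alpha is a tunable assumption, not a value from this project.
    return (alpha * cosine(query_emb, image_emb)
            + (1 - alpha) * keyword_score(query_terms, ocr_text))
```

In a real pipeline the embeddings would come from a CLIP image/text encoder and the keyword score from the stored OCR text; ranking candidates by `hybrid_score` then gives the combined retrieval order.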
-
Does this project support training and inference of multi-modal retrieval models such as Phi-3-vision? I'd like to reproduce the experiments in the paper https://arxiv.org/abs/2406.11251 based on thi…
-
First of all congrats on the paper and thanks for providing the code!
In the paper, under 'Zero-shot language-based multi-modal joint retrieval', you mention that integrating/combining multiple embeddin…
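Since the question concerns combining multiple embedding models, here is a minimal late-fusion sketch: each model scores the same candidate set, scores are min-max normalized per model, then averaged with optional weights. The function names and weights are illustrative assumptions and not the paper's method.

```python
def minmax(scores):
    # Rescale one model's scores over the candidate set to [0, 1].
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def fuse(score_lists, weights=None):
    # Late fusion: normalize each model's scores independently,
    # then take a weighted average per candidate.
    weights = weights or [1.0] * len(score_lists)
    norm = [minmax(s) for s in score_lists]
    total = sum(weights)
    return [sum(w * m[i] for w, m in zip(weights, norm)) / total
            for i in range(len(norm[0]))]
```

Normalizing before averaging matters because different embedding models produce similarity scores on different scales; without it, the model with the largest raw range dominates the fused ranking.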
-
## Week of 21st Oct:
- [ ] Video search PoC
- [ ] Discuss case study with Anything LLM
## Week of 14th Oct:
- [x] Finetune all llamahub datasets
- [ ] Compare with Nudge-M and Nudge-N
- [x] Publish …
-
### Problem Statement
## Objective
To scale Jan and enable the addition of new features without modifying the core app codebase. This roadmap aims to strengthen Jan's core framework, making it mor…
-
## Title: Leveraging a Retrieval-Augmented Approach for Multimodal Emotion Recognition with Missing Modalities
## Link: https://arxiv.org/abs/2410.02804
## Abstract:
Multimodal emotion recognition achieves high performance by exploiting complete multimodal information and robust joint multimodal representations. In practice, however, the ideal situation in which every modality is fully available is rare, and some…
-
Hello,
When I follow the quickstart steps, I am confused about how to obtain the final results for multi-modal dialogue retrieval on PhotoChat in PaCE. I think that the evaluation script computes sc…
-
I am currently working on a project that involves finetuning Visualized BGE. I have been able to successfully use the pretrained model, but now I would like to further finetune it for my specific use …
-
### Model description
Align Before Fuse (ALBEF) is a vision-language (VL) model that showed competitive results in numerous VL tasks such as image-text retrieval, visual question answering, visual …
-
### Feature Description
Does it support multi-modal RAG queries?
RAG stands for Retrieval-Augmented Generation.
For example, if I drop in a picture, will it find similar
### Reason
_No response_
### Val…
0xDTE updated 11 months ago