CsabaConsulting / InspectorGadgetApp

Open Multi-Modal Personal Assistant
MIT License

RAG: Reranking to improve results #39

Open MrCsabaToth opened 2 months ago

MrCsabaToth commented 2 months ago

Right now the vector DB is working (#7) and the ANN distance thresholds are configurable (#35), but for proper RAG it'd be great to have re-ranking. Using Gemini for re-ranking could mean many API calls. Maybe we could run the Gemma 2B model (FP16, int4, instruction-tuned) locally with MediaPipe or something similar? That's not a dedicated re-ranker model, though. And how would we do that with Flutter in a platform-independent way?
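To make the idea concrete, here is a minimal Python sketch of the two-stage flow: filter the ANN hits by the distance threshold, then re-rank the survivors. All names here (`Hit`, `rerank`, `overlap_score`) are hypothetical and not from the app's code. `score_fn` stands in for whatever ends up scoring a (query, passage) pair; it is called once per candidate, which is where the call count would add up with a remote model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hit:
    text: str
    distance: float  # ANN distance from the vector DB; smaller means closer


def rerank(
    query: str,
    hits: List[Hit],
    score_fn: Callable[[str, str], float],
    max_distance: float,
    top_k: int = 3,
) -> List[Hit]:
    """Drop hits beyond the distance threshold, then order the rest by score_fn."""
    candidates = [h for h in hits if h.distance <= max_distance]
    # One score_fn call per surviving candidate.
    scored = [(score_fn(query, h.text), h) for h in candidates]
    # Sort on the score only, highest first; ties keep their ANN order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [h for _, h in scored[:top_k]]


def overlap_score(query: str, passage: str) -> float:
    """Toy stand-in scorer: fraction of query words that appear in the passage."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0
```

Swapping `overlap_score` for a real re-ranker would only change the function passed as `score_fn`; the threshold filter keeps the number of scoring calls bounded.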

MrCsabaToth commented 2 months ago

There's a MediaPipe GenAI Flutter package by Google: https://pub.dev/packages/mediapipe_genai. Unfortunately it's only at v0.0.1.

MrCsabaToth commented 2 months ago

An open reranker model that performs well (besides the closed-source Cohere reranker / embedding): mxbai https://huggingface.co/mixedbread-ai/mxbai-rerank-large-v1. See the reference post: https://www.rungalileo.io/blog/mastering-rag-how-to-select-a-reranking-model

Takeaways from "A Thorough Comparison of Cross-Encoders and LLMs for Reranking SPLADE":

  1. Cross-Encoders vs. LLMs: Effective cross-encoders, when paired with strong retrievers, have shown the ability to outperform most LLMs in reranking tasks, except for GPT-4 on some datasets. Notably, cross-encoders offer this improved performance while being more efficient, making them an attractive option for reranking tasks.
  2. LLM-based Rerankers: Zero-shot LLM-based rerankers, including those based on OpenAI and open models, exhibit competitive effectiveness, with some even matching the performance of GPT-3.5 Turbo. However, the inefficiency and high cost associated with these models currently limit their practical use in retrieval systems, despite their promising performance.

MrCsabaToth commented 1 month ago

Potential reranking code on Vertex AI: https://cloud.google.com/generative-ai-app-builder/docs/ranking#rank_or_rerank_a_set_of_records_according_to_a_query

We'll potentially need a cloud function for this.
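A rough sketch of what such a cloud function might build and parse, in Python. The endpoint path, the `semantic-ranker-512@latest` model name, and the `records` / `topN` / `score` fields reflect my reading of the linked ranking docs and should be verified against them before use. `build_rank_request` and `order_by_score` are hypothetical helpers. Authentication and the actual HTTP POST are left out, so nothing here touches the network.

```python
import json
from typing import Dict, List

# Assumed endpoint shape; check the linked Vertex AI ranking docs.
RANK_URL_TEMPLATE = (
    "https://discoveryengine.googleapis.com/v1/projects/{project}"
    "/locations/global/rankingConfigs/default_ranking_config:rank"
)


def build_rank_request(
    project: str, query: str, passages: List[str], top_n: int = 5
) -> Dict[str, str]:
    """Build the URL and JSON body for one rank call covering all passages."""
    body = {
        "model": "semantic-ranker-512@latest",  # assumed model name
        "query": query,
        "topN": top_n,
        # Use the list index as the record id so results map back to passages.
        "records": [
            {"id": str(i), "content": text} for i, text in enumerate(passages)
        ],
    }
    return {"url": RANK_URL_TEMPLATE.format(project=project), "body": json.dumps(body)}


def order_by_score(response: Dict, passages: List[str]) -> List[str]:
    """Return passages ordered by the score field of the response, best first."""
    records = sorted(
        response.get("records", []),
        key=lambda r: r.get("score", 0.0),
        reverse=True,
    )
    return [passages[int(r["id"])] for r in records]
```

The point of this shape is that all vector DB candidates go out in a single request, so the app makes one call to the cloud function per query instead of one model call per candidate.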