castorini / ragnarok

Retrieval-Augmented Generation battle!
Apache License 2.0

Feat/elo #5

Closed: xpbowler closed this 3 months ago

xpbowler commented 3 months ago

MR: Elo Leaderboard feature

Uses an implementation of the Elo rating system: https://en.wikipedia.org/wiki/Elo_rating_system
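For reference, the standard Elo update can be sketched as below. This is a minimal illustration, not the code from this PR: the K-factor of 32 is an assumed conventional value, and the PR may use different constants or starting ratings.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the Elo model (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one battle.

    score_a is 1.0 if A wins, 0.0 if B wins, and 0.5 for a tie.
    k = 32.0 is an assumed K-factor, not necessarily the PR's value.
    """
    e_a = expected_score(rating_a, rating_b)
    e_b = 1.0 - e_a
    new_a = rating_a + k * (score_a - e_a)
    new_b = rating_b + k * ((1.0 - score_a) - e_b)
    return new_a, new_b


print(update_elo(1000.0, 1000.0, 1.0))  # → (1016.0, 984.0)
```

Two entries that both start at 1000 move to 1016 and 984 after the first one wins. The two adjustments are equal and opposite, so the total rating across both entries is conserved.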

We have 3 leaderboards:

  1. LLM: the generation LLMs (e.g., GPT-4o, command-r)
  2. Retrieve: the Retrieve pipeline - retriever model + reranker model (e.g., BM25+rank_zephyr)
  3. RAG: the whole RAG pipeline - retriever model + reranker model + LLM (e.g., BM25+rank_zephyr+GPT-4o)
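To illustrate how one battle could feed the three leaderboards, the sketch below derives each side's entry name on every leaderboard from its components and applies an Elo update per leaderboard. Everything here is hypothetical: the `Pipeline` class, the `+`-joined key format (borrowed from the examples above), the starting rating of 1000, and K = 32. The `eligible` argument is only a placeholder for the PR's eligibility conditions, which this sketch does not implement.

```python
from collections import defaultdict
from dataclasses import dataclass

INITIAL_RATING = 1000.0  # assumption: starting rating for unseen entries
K_FACTOR = 32.0          # assumption: Elo update step size


@dataclass(frozen=True)
class Pipeline:
    """Hypothetical description of one side of a battle."""
    retriever: str
    reranker: str
    llm: str


def leaderboard_keys(p: Pipeline) -> dict:
    """Map a pipeline to its entry name on each of the three leaderboards."""
    return {
        "LLM": p.llm,
        "Retrieve": f"{p.retriever}+{p.reranker}",
        "RAG": f"{p.retriever}+{p.reranker}+{p.llm}",
    }


def expected_score(ra: float, rb: float) -> float:
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))


def record_battle(boards, a: Pipeline, b: Pipeline, score_a: float, eligible) -> None:
    """Apply one Elo update on every leaderboard named in `eligible`.

    `eligible` is a placeholder: the caller decides which leaderboards a
    battle may update (the PR's actual conditions are not modeled here).
    """
    keys_a, keys_b = leaderboard_keys(a), leaderboard_keys(b)
    for name in eligible:
        ka, kb = keys_a[name], keys_b[name]
        if ka == kb:
            continue  # same entry on both sides: nothing to compare
        board = boards[name]
        ra, rb = board[ka], board[kb]
        ea = expected_score(ra, rb)
        board[ka] = ra + K_FACTOR * (score_a - ea)
        board[kb] = rb + K_FACTOR * ((1.0 - score_a) - (1.0 - ea))


# One rating table per leaderboard; unseen entries start at INITIAL_RATING.
boards = defaultdict(lambda: defaultdict(lambda: INITIAL_RATING))

a = Pipeline("BM25", "rank_zephyr", "GPT-4o")
b = Pipeline("BM25", "rank_zephyr", "command-r")
record_battle(boards, a, b, score_a=1.0, eligible={"LLM", "RAG"})
print(boards["LLM"]["GPT-4o"])  # → 1016.0
```

Entries that are identical on both sides (here, the shared `BM25+rank_zephyr` retrieval pipeline) are skipped, since an entry cannot gain or lose rating against itself.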

A battle is eligible to update a leaderboard only when it meets the conditions below: