Closed: Willmish closed this 3 months ago
Update the documentation to state the random-guess baseline for each benchmark. For example, if a benchmark is multiple-choice with 3 options, an LLM scoring ~33% is performing at chance level, so that score says nothing about the model's ability.
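A minimal sketch of how such a baseline could be computed and reported, assuming uniform random guessing over the answer options. The benchmark names and option counts in `BENCHMARK_CHOICES` are hypothetical placeholders, not real benchmark metadata. `normalized_score` is one common way to rescale accuracy so that chance maps to 0 and a perfect score maps to 1. `mixed_chance_accuracy` covers benchmarks where questions have different numbers of options.

```python
def chance_accuracy(num_choices: int) -> float:
    """Expected accuracy of uniform random guessing on an MCQ with `num_choices` options."""
    if num_choices < 2:
        raise ValueError("a multiple-choice question needs at least 2 options")
    return 1.0 / num_choices


def mixed_chance_accuracy(choices_per_question: list[int]) -> float:
    """Expected random-guess accuracy when questions have different option counts."""
    if not choices_per_question:
        raise ValueError("need at least one question")
    return sum(chance_accuracy(k) for k in choices_per_question) / len(choices_per_question)


def normalized_score(accuracy: float, num_choices: int) -> float:
    """Rescale accuracy so chance level maps to 0.0 and a perfect score maps to 1.0."""
    baseline = chance_accuracy(num_choices)
    return (accuracy - baseline) / (1.0 - baseline)


# Hypothetical benchmark table: name -> number of answer options per question.
BENCHMARK_CHOICES = {"example_3way_mcq": 3, "example_4way_mcq": 4, "example_yes_no": 2}

if __name__ == "__main__":
    for name, k in BENCHMARK_CHOICES.items():
        print(f"{name}: random-guess baseline = {chance_accuracy(k):.1%}")
```

Running it prints a 33.3% baseline for the 3-option case, 25.0% for 4 options and 50.0% for yes/no. Reporting the normalized score alongside raw accuracy makes a result like "33% on a 3-way MCQ" visibly equal to 0.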