chigkim / Ollama-MMLU-Pro

Apache License 2.0

Ollama-MMLU-Pro

This is a modified version of run_gpt4o.py from TIGER-AI-Lab/MMLU-Pro that lets you run the MMLU-Pro benchmark via the OpenAI Chat Completions API. It has been tested with Ollama and Llama.cpp, but it should also work with LM Studio, KoboldCpp, Oobabooga with the OpenAI extension, and similar servers.
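Because the script talks to a generic OpenAI-compatible endpoint, any server exposing that API can be benchmarked. As a rough illustration (the URL and model name below are assumptions, not values from this repo; Ollama serves an OpenAI-compatible API under /v1 by default), a single benchmark question becomes a standard Chat Completions request:

```python
import json

# Illustrative request shape only; actually sending it requires a
# running server (e.g. Ollama) listening at this endpoint.
url = "http://localhost:11434/v1/chat/completions"
payload = {
    "model": "llama3",  # hypothetical model name
    "messages": [
        {"role": "user",
         "content": "Answer with the letter of the correct option."},
    ],
    "temperature": 0.0,  # deterministic answers for scoring
}
print(json.dumps(payload, indent=2))
```

Any backend that accepts this request format should work without changes to the script.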


I kept the testing and scoring method exactly the same as the original script, adding only a few features to simplify running the test and displaying the results. To see the exact changes, compare the mmlu-pro branch against main with git diff:

git diff mmlu-pro..main -- run_openai.py

Usage

Edit config.toml to match your setup.
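As a minimal sketch, a config for a local Ollama server might look like the following. The key names here are assumptions for illustration, not the script's actual schema; check the config.toml shipped with the repo for the real keys:

```toml
# Hypothetical example values; adjust to your server and model.
url = "http://localhost:11434/v1"
api_key = "ollama"
model = "llama3"
```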

pip install -r requirements.txt
python run_openai.py

You can also override settings from the configuration file with command-line flags such as --model, --category, etc. For example, if you specify --model phi3, all settings are loaded from the configuration file except the model. See python run_openai.py -h for more info.

Additional Notes