Our framework supports a range of Large Language Models, including the GPT series hosted on Azure and various open-source LLMs. To integrate these models into our experimental setup, users must define the necessary API keys and model deployment IP addresses as environment variables:
```bash
# For GPT-series LLMs hosted on Azure
export OPENAI_KEY="YOUR_OPENAI_KEY"
export OPENAI_IP="YOUR_OPENAI_IP"

# For open-source LLMs
export LLM_IP="YOUR_OPENSOURCED_LLM_IP"
```
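As a quick sanity check, a minimal sketch using standard shell parameter expansion can make a script fail fast when a required variable is unset (the variable names are the ones above; adjust to the variables your run needs):

```bash
# Abort with an error message if a required variable is missing
: "${OPENAI_KEY:?OPENAI_KEY is not set}"
: "${OPENAI_IP:?OPENAI_IP is not set}"
# Only needed when running open-source LLMs:
# : "${LLM_IP:?LLM_IP is not set}"
```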
Before running the HumanEval experiments, install the required dependencies. Installation instructions are available at the following link: human-eval.
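Assuming the link points to OpenAI's human-eval repository (the standard HumanEval execution harness), installation typically follows its README:

```bash
# Clone the harness and install it in editable mode
git clone https://github.com/openai/human-eval
pip install -e human-eval
```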
The datasets used in our experiments are located in the `./dataset` directory:

```
./dataset/chess_dataset
./dataset/mmlu_dataset
./dataset/math_dataset
./dataset/gsm_dataset
```
To execute the experiments, navigate to the `./script` directory and use the provided shell script (see the example invocation after this list):

```bash
sh run.sh {AGENT_NUM} {MODEL} {QTYPE}
```

where:

- `{AGENT_NUM}` is the number of LLM agents to instantiate.
- `{MODEL}` specifies the LLM to use; both the OpenAI GPT series and open-source LLMs are supported.
- `{QTYPE}` denotes the type of questions to be processed: MATH, GSM, MMLU, Chess, or HumanEval.
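For instance, a hypothetical invocation might look like the following; the argument values are placeholders, and the exact model identifier and question-type spelling accepted by `run.sh` may differ:

```bash
# Run 3 agents with a GPT-series model on the GSM questions
cd ./script
sh run.sh 3 gpt-3.5-turbo GSM
```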