HornHehhf/SocREval

SocREval

Use a virtual environment tool (e.g., miniconda) to install packages and run experiments:

python==3.7.10
pip install -r requirements.txt

The code is organized as follows:

Data processing
- roscoe_data_processing.py (processes the human-judged datasets in ROSCOE for our experiments)
GPT-4 for reference-free reasoning evaluation
- gpt4_evaluation_gsm8k.py (GPT-4 on GSM8K)
- gpt4_evaluation_esnli.py (GPT-4 on e-SNLI)
- gpt4_evaluation_drop.py (GPT-4 on DROP)
- gpt4_evaluation_cosmos.py (GPT-4 on Cosmos QA)
SocREval for reference-free reasoning evaluation
- SocREval_gsm8k.py (SocREval on GSM8K)
- SocREval_esnli.py (SocREval on e-SNLI)
- SocREval_drop.py (SocREval on DROP)
- SocREval_cosmos.py (SocREval on Cosmos QA)

Replace /path/to/working/dir with the path to your own working directory.
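To confirm that no placeholder was missed, a small check along the following lines may help. It is only a sketch and assumes that the placeholder appears literally as the string /path/to/working/dir in the top-level .py files; the function name files_with_placeholder is hypothetical and not part of this repository.

```python
from pathlib import Path

# Assumption: the scripts contain this exact placeholder string.
PLACEHOLDER = "/path/to/working/dir"


def files_with_placeholder(repo_dir, placeholder=PLACEHOLDER):
    """Return the names of top-level .py files that still contain the placeholder."""
    hits = []
    for path in sorted(Path(repo_dir).glob("*.py")):
        # Read each script as text and look for the unedited placeholder.
        if placeholder in path.read_text(encoding="utf-8"):
            hits.append(path.name)
    return hits
```

Calling files_with_placeholder(".") from the repository root lists the scripts that still need editing; an empty list means every placeholder has been replaced.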

Export your own OpenAI API key before running any experiment that uses the OpenAI API, i.e., export OPENAI_API_KEY=$YOUR_OPENAI_API_KEY
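A missing key otherwise tends to surface only when the first API request is made. As a minimal sketch (the helper require_api_key is hypothetical and not part of this repository), the variable can be checked up front:

```python
import os


def require_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            "{0} is not set; run: export {0}=<your key>".format(name)
        )
    return key
```

Calling require_api_key() at the top of a run fails immediately with a readable message when the variable is unset or empty.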

Follow the instructions in the ROSCOE code repository:

- Run download_annotated.sh to obtain the human-judged datasets, including "roscoe/raw", "roscoe/generated", and "roscoe/annotated", and put them all under /path/to/working/dir/
- Run restore_annotated.py to restore the annotated files and put them under /path/to/working/dir/roscoe/restore_annotated
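Before processing the data, it can be useful to verify that the directories named above are in place. The following is a sketch under the assumption that all four directories sit directly under the working directory, as described; the helper missing_roscoe_dirs is hypothetical.

```python
from pathlib import Path

# Directories named in the steps above, relative to the working directory.
EXPECTED_SUBDIRS = (
    "roscoe/raw",
    "roscoe/generated",
    "roscoe/annotated",
    "roscoe/restore_annotated",
)


def missing_roscoe_dirs(working_dir):
    """Return the expected ROSCOE subdirectories that do not exist yet."""
    root = Path(working_dir)
    return [sub for sub in EXPECTED_SUBDIRS if not (root / sub).is_dir()]
```

An empty result from missing_roscoe_dirs("/path/to/working/dir") indicates that the layout matches the steps above.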

Process the data for our experiments:

python roscoe_data_processing.py

To reproduce the experiments for GPT-4 evaluation:

python gpt4_evaluation_gsm8k.py
python gpt4_evaluation_esnli.py
python gpt4_evaluation_drop.py
python gpt4_evaluation_cosmos.py

To reproduce the experiments for SocREval:

python SocREval_gsm8k.py
python SocREval_esnli.py
python SocREval_drop.py
python SocREval_cosmos.py
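
The four runs in each group can also be launched in sequence from one place. The sketch below assumes it is executed from the repository root so the script names resolve; the helpers script_names and run_all are hypothetical and only wrap the commands listed above.

```python
import subprocess
import sys

# Dataset suffixes used by the script names listed above.
DATASETS = ("gsm8k", "esnli", "drop", "cosmos")


def script_names(prefix):
    """Build the script file names for a prefix such as 'SocREval' or 'gpt4_evaluation'."""
    return ["{}_{}.py".format(prefix, name) for name in DATASETS]


def run_all(prefix, runner=subprocess.run):
    """Run each script in order with the current interpreter, stopping on the first failure."""
    for script in script_names(prefix):
        # check=True raises CalledProcessError if a script exits with a non-zero status.
        runner([sys.executable, script], check=True)
```

For example, run_all("gpt4_evaluation") followed by run_all("SocREval") executes all eight scripts in the order given above.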