A set of experiments for evaluating hallucination in the Pythia suite across multiple checkpoints of a single model size.
This version of the code uses the MIND framework to extract the activations of Pythia models for hallucination explainability on the Wikipedia dataset.
To install the required packages, create a Python 3.10 environment (3.10.11 preferred) and run
pip install -r requirements.txt
Run
python ./MIND/generate_data.py --model_type <1b, 2.4b etc.> --model_family pythia --step_num <1 to 143 for the checkpoints>
to generate the data for the models of the Pythia suite. The script is meant to be run for one model size across multiple checkpoints. The generated data is saved in the ./MIND/auto-labeled/output/ directory.
Run
python ./MIND/utils/gather_results.py --model_family <> --model_type <>
to gather the results from the generated data. The results are saved to ./data/diff_results.csv, the file that lists the Wikipedia examples on which the models hallucinate in an oscillatory manner.
Run
python ./MIND/generate_hd_chunk.py
which automatically looks at the directories of the generated results (3 checkpoints for now) and extracts the activations for the hallucination examples. The activations are compiled into one JSON file for each data point and saved in the ./data/hidden_layer_activations.json files.
See the generate_data.sh and post_processing.sh scripts for how the scripts are run consecutively.
Use the generate_data.sh and post_processing.sh scripts to run the experiments on the HPC.
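
Since the data-generation script is meant to be run once per checkpoint for a fixed model size, the loop can be sketched in Python as below. This is only an illustration, not the content of generate_data.sh: the helper names build_generate_commands and run_all are hypothetical, and the checkpoint values 1, 72 and 143 are arbitrary placeholders chosen within the 1 to 143 range. Only the script path and the three flags come from the command shown above.

```python
import subprocess


def build_generate_commands(model_type, step_nums, model_family="pythia"):
    """Build one generate_data.py command (as an argument list) per checkpoint.

    Hypothetical helper: only the script path and flag names are taken
    from the README; everything else is an assumption for illustration.
    """
    return [
        [
            "python", "./MIND/generate_data.py",
            "--model_type", model_type,
            "--model_family", model_family,
            "--step_num", str(step),
        ]
        for step in step_nums
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the loop as soon as one checkpoint fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder checkpoints (assumed values, not taken from the repository).
    run_all(build_generate_commands("1b", [1, 72, 143]))
```

By default the sketch only prints the commands; passing dry_run=False to run_all would execute them one after another from the repository root.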