THU-KEG / DICE

DICE: Detecting In-distribution Data Contamination with LLM's Internal State
https://arxiv.org/pdf/2406.04197
MIT License

DICE

Data and code for the paper "DICE: Detecting In-distribution Data Contamination with LLM's Internal State".

Installation

git clone https://github.com/THU-KEG/DICE.git
cd DICE

Reproducing Results

Step 1: Fine-tune the contaminated model

Our code for fine-tuning contaminated models is stored in the OOD_test/scripts folder.

paraphrase benchmark

python scripts/rewrite.py --dataset_name gsm8k

The paraphrased dataset we used in the paper is available in the OOD_test/scripts/data folder.
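As a rough sketch of what the rewrite step produces: questions are paraphrased while answers are kept intact, so the contaminated fine-tuning set differs superficially from the original benchmark. The `paraphrase` stub below is a hypothetical stand-in for the LLM-based rewriter in scripts/rewrite.py.

```python
import json

def paraphrase(question: str) -> str:
    # Hypothetical stand-in: the real scripts/rewrite.py queries an LLM
    # to rephrase the question.
    return "Rephrased: " + question

# A GSM8K-style record: natural-language question plus worked answer.
records = [{"question": "Natalia sold 48 clips. How many did she sell in total?",
            "answer": "48 / 2 = 24 ..."}]

# Paraphrase every question, leaving the answers untouched.
rewritten = [{"question": paraphrase(r["question"]), "answer": r["answer"]}
             for r in records]
print(json.dumps(rewritten[0], indent=2))
```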

fine-tune

cd OOD_test
CUDA_VISIBLE_DEVICES=0 python scripts/contaminated_finetune.py \
--model_name microsoft/phi-2 \
--generative_batch_size 32 \
--dataset_name gsm8k \
--train_dataset_name gsm8k \
--epochs 1
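Conceptually, in-distribution contamination here means mixing (possibly paraphrased) benchmark items into the fine-tuning set, so the model sees test questions during training. A minimal sketch of that construction, on hypothetical data rather than the repo's exact pipeline:

```python
import random

random.seed(0)

# Hypothetical fine-tuning corpus and benchmark test items.
train_set = [{"text": f"ordinary training example {i}"} for i in range(8)]
benchmark = [{"text": f"benchmark question {i}"} for i in range(4)]

# Contaminate: inject the benchmark items into the training data and
# shuffle, so they are interleaved with ordinary examples.
contaminated_train = train_set + benchmark
random.shuffle(contaminated_train)
print(len(contaminated_train))  # → 12
```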

fine-tune scripts

You can also use the following script to directly reproduce the contaminated models from the main experiment in our paper.

CUDA_VISIBLE_DEVICES=0 bash scripts/contaminated_finetune.sh

Step 2: OOD Performance of contaminated models

Similar to the fine-tuning process above, you can use the following script to test OOD performance.

The parameter settings are the same as above; note that --dataset_name is the OOD dataset to be tested, while --train_dataset_name is the contaminated dataset.

cd OOD_test
CUDA_VISIBLE_DEVICES=0 python OOD_generate_inf.py \
--model_name microsoft/phi-2 \
--generative_batch_size 32 \
--dataset_name math \
--train_dataset_name gsm8k \
--epochs 1
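A common way to score GSM8K/MATH-style generations is to compare the last number in the model's output against the gold answer. The sketch below illustrates that scoring idea; it is not the repo's exact evaluation code.

```python
import re

def last_number(text: str):
    # Pull the final numeric token out of a generated solution;
    # GSM8K-style answers typically end with the result.
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(matches[-1]) if matches else None

preds = ["Step 1: 6*7 = 42. The answer is 42.",
         "I am not sure."]
golds = [42.0, 13.0]

correct = sum(last_number(p) == g for p, g in zip(preds, golds))
accuracy = correct / len(golds)
print(f"OOD accuracy: {accuracy:.2f}")  # prints "OOD accuracy: 0.50"
```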

Step 3: Locate contaminated layer

The code for this part is stored in the Locate folder.

CUDA_VISIBLE_DEVICES=0 python DICE_locate.py \
--edited_model=meta-llama/Llama-2-7b-hf \
--hparams_dir=../hparams/DICE/llama-7b 
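Conceptually, locating the contaminated layer means finding the layer whose internal states best separate contaminated from clean inputs. A toy numpy sketch of that idea on synthetic hidden states (not the paper's exact localization procedure):

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, n_samples, hidden = 6, 200, 16

# Synthetic per-layer hidden states for clean and contaminated inputs.
clean = rng.normal(0.0, 1.0, (n_layers, n_samples, hidden))
contaminated = rng.normal(0.0, 1.0, (n_layers, n_samples, hidden))
contaminated[3] += 0.8  # pretend layer 3 carries the contamination signal

# Score each layer by the distance between the class means; the layer
# with the largest separation is treated as the contaminated layer.
separation = np.linalg.norm(
    clean.mean(axis=1) - contaminated.mean(axis=1), axis=1)
best_layer = int(np.argmax(separation))
print(best_layer)  # → 3
```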

Step 4: Train and test DICE detector

The code for this part is stored in the contamination_classifier folder.

make data (hidden states of the contaminated layer)

You can use the following script to get the data.

cd contamination_classifier
CUDA_VISIBLE_DEVICES=0 python data_maker.py \
--edited_model=meta-llama/Llama-2-7b-hf \
--hparams_dir=../hparams/DICE/llama-7b \
--test_dataset=GSM8K_seen \
--is_contaminated=True \
--model_type=vanilla \
--contaminated_type=open

You can also use the following script to directly reproduce the test data of the main experiment in our paper.

CUDA_VISIBLE_DEVICES=0 bash scripts/make_test_data.sh

train and test DICE detector

Use train_test.py to train and test a DICE detector.
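In essence the detector is a small classifier over the hidden states collected in the previous step. A self-contained numpy sketch of that train/test loop on synthetic data (the repo's train_test.py implements the actual detector):

```python
import numpy as np

rng = np.random.default_rng(1)
hidden = 16

# Synthetic hidden states: label 1 = contaminated, 0 = clean.
X = rng.normal(0, 1, (400, hidden))
y = (rng.random(400) < 0.5).astype(float)
X[y == 1] += 0.7  # contaminated states are shifted

# Logistic-regression detector trained with plain gradient descent.
w, b = np.zeros(hidden), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.5 * (X.T @ (p - y) / len(y))
    b -= 0.5 * (p - y).mean()

pred = (1.0 / (1.0 + np.exp(-(X @ w + b)))) > 0.5
print(f"train accuracy: {(pred == y).mean():.2f}")
```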

You can simply use the following script to directly reproduce test results of the main experiment in our paper.

CUDA_VISIBLE_DEVICES=0 bash scripts/Test_DICE.sh

other experiments

The contamination_classifier folder also contains the code for the other experiments in the paper. For example, the performance_vs_score subfolder stores the code for the experiment testing the relationship between contamination probability and model performance, and draw_OOD.py draws the detection distribution on the OOD dataset.

Acknowledgements

Our implementation is based on the repository of the paper "Evading Data Contamination Detection for Language Models is (too) Easy" by Jasper Dekoninck, Mark Niklas Müller, Maximilian Baader, Marc Fischer, and Martin Vechev. The original repository can be found here. Their LICENSE file can be found in the OOD_test folder as well. We have made some modifications to the code to adapt it to our needs.

We wish to express our appreciation to the pioneers in the field of evasive data contamination; our work was developed to address the attack presented in that line of research.