
Model Editing Can Hurt General Abilities of Large Language Models

This repository releases the source code for the paper Model Editing Can Hurt General Abilities of Large Language Models.

Overview

In response to the challenge of hallucinations in LLM outputs caused by false or outdated knowledge, model editing has received considerable attention for its low resource consumption. Previous studies have proposed many effective methods and achieved good editing performance. However, these model editing methods often overlook potential side effects on the general abilities of LLMs.

This paper analyzes these side effects by evaluating four popular editing methods on four LLMs across eight representative task categories.

Datasets

The datasets are included in data/. There are three folders:

The whole data directory is as follows:

data/
    |__ edited-data 
        |__ zsre.json
    |__ task-data
        |__ test-dialogue
        |__ test-ClosedDomainQA.jsonl
        |__ test-NER.txt
        |__ test-NLI.tsv
        |__ test-OpenDomainQA.jsonl
        |__ test-reasoning.jsonl
        |__ test-SentimentAnalysis.tsv
        |__ test-summarization.json
    |__ training-data
        |__ zsre_mend_train.json
        |__ zsre_mend_eval.json

You can download these datasets from [Google Drive].
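After downloading, you can sanity-check the files with a minimal sketch like the one below. It assumes the paths shown in the tree above, that zsre.json is a JSON list of records, and that the .jsonl task files contain one JSON object per line; the record fields themselves are not assumed.

import json

# Load the editing data (assumed to be a JSON list of records).
with open("data/edited-data/zsre.json", "r", encoding="utf-8") as f:
    edits = json.load(f)
print(f"{len(edits)} editing records; example keys: {list(edits[0].keys())}")

# The .jsonl task files are assumed to hold one JSON object per line.
with open("data/task-data/test-reasoning.jsonl", "r", encoding="utf-8") as f:
    samples = [json.loads(line) for line in f if line.strip()]
print(f"{len(samples)} reasoning samples; example keys: {list(samples[0].keys())}")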

Prepare the environment

Requirements

Note: Please use Python 3.9+. To get started, simply install conda and run:

git clone https://github.com/JasonForJoy/Model-Editing-Hurt.git
conda create -n EditHurt python=3.9.7
...
pip install -r requirements.txt
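Optionally, you can verify the environment with a small script like the sketch below. It is a hypothetical helper, not part of the repository, and it assumes torch and transformers are among the packages listed in requirements.txt.

import importlib
import sys

# Require Python 3.9+ as noted above.
assert sys.version_info >= (3, 9), "Python 3.9+ is required"

# torch and transformers are assumed to be listed in requirements.txt.
for pkg in ("torch", "transformers"):
    try:
        module = importlib.import_module(pkg)
        print(f"{pkg} {getattr(module, '__version__', 'unknown')} OK")
    except ImportError as exc:
        print(f"{pkg} is missing: {exc}")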

Models

All models are placed in hugging_cache/<model_name> (model_name = gpt2-xl, gpt-j-6B, or llama-7b).

These paths can be changed in hparams/<method_name>/.
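If you prefer to download checkpoints programmatically, the sketch below uses huggingface_hub to place them in hugging_cache/<model_name>. The Hugging Face repo ids are assumptions; adjust them to the exact checkpoints you use.

from huggingface_hub import snapshot_download

# Map the local folder names used by this repository to Hugging Face repo ids.
# The repo ids are assumptions; adjust them to the checkpoints you actually use.
models = {
    "gpt2-xl": "gpt2-xl",
    "gpt-j-6B": "EleutherAI/gpt-j-6b",
}
for local_name, repo_id in models.items():
    snapshot_download(repo_id=repo_id, local_dir=f"hugging_cache/{local_name}")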

Evaluation

Eight downstream tasks are used for evaluation, each with its own metric: ClosedDomainQA, dialogue, NER, NLI, OpenDomainQA, reasoning, SentimentAnalysis, and summarization.

GPT-2 XL (1.5B), LLaMA-1 (7B), LLaMA-2 (7B), and LLaMA-2 (13B) are used for editing.

Running the evaluation

If you want to evaluate the performance of the pre-edit model on various downstream tasks (e.g. evaluating task NER), run:

python test-task.py task
python test-task.py NER

task: The name of the task you want to evaluate. You can choose from: ClosedDomainQA, dialogue, NER, NLI, OpenDomainQA, reasoning, SentimentAnalysis, summarization.
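To run the pre-edit evaluation over all eight tasks in one go, a minimal sketch (assuming test-task.py is invoked from the repository root) is:

import subprocess

# The eight downstream tasks accepted by test-task.py.
TASKS = [
    "ClosedDomainQA", "dialogue", "NER", "NLI",
    "OpenDomainQA", "reasoning", "SentimentAnalysis", "summarization",
]

for task in TASKS:
    subprocess.run(["python", "test-task.py", task], check=True)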

If you want to evaluate the performance of the edited model on various downstream tasks (e.g. evaluating task NER with mode Instance-Sequential and method ROME), run:

python test-task-after.py task mode method sample_begin sample_end sample_step
python test-task-after.py NER Instance-Sequential ROME 200 204 1

mode: The mode of editing you want to use. You can choose from: Batch-Single, Instance-Sequential, Batch-Sequential.

method: The editing method you want to use. You can choose from: ROME, MEMIT, KN, MEND.

sample_begin: The index of the first sample selected from the dataset.

sample_end: The index of the last sample selected from the dataset.

sample_step: One sample is selected every sample_step samples.

If you choose Batch-Sequential as the mode (e.g. evaluating task NER with mode Batch-Sequential and method MEMIT), run:

python test-task-after.py task mode method sample_begin sample_end sample_step batch_size
python test-task-after.py NER Batch-Sequential MEMIT 400 404 1 2

batch_size: The size of each editing batch.
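The two example invocations above can also be driven from Python, e.g. with a sketch like this (assuming the scripts are run from the repository root):

import subprocess

# Instance-Sequential editing with ROME on samples 200-204, step 1.
subprocess.run(
    ["python", "test-task-after.py", "NER", "Instance-Sequential", "ROME",
     "200", "204", "1"],
    check=True,
)

# Batch-Sequential editing with MEMIT additionally takes a batch size (here 2).
subprocess.run(
    ["python", "test-task-after.py", "NER", "Batch-Sequential", "MEMIT",
     "400", "404", "1", "2"],
    check=True,
)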

If Batch-Single or Instance-Sequential mode is selected, results from each run are stored at test-result/test-<task>/result-<task>-<mode>-<method>-<sample_total>.

If Batch-Sequential mode is selected, results from each run are stored at test-result/test-<task>/result-<task>-<mode>-<method>-<batch_size>*<edit_time>.
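For reference, a small helper (hypothetical, not part of the repository) that reconstructs these result paths from the run parameters could look like this; the exact values of <sample_total> and <edit_time> depend on how the scripts count samples and edits.

def result_path(task, mode, method, sample_total=None, batch_size=None, edit_time=None):
    """Reconstruct the result path following the naming scheme described above."""
    if mode in ("Batch-Single", "Instance-Sequential"):
        name = f"result-{task}-{mode}-{method}-{sample_total}"
    else:  # Batch-Sequential
        name = f"result-{task}-{mode}-{method}-{batch_size}*{edit_time}"
    return f"test-result/test-{task}/{name}"

print(result_path("NER", "Instance-Sequential", "ROME", sample_total=5))
print(result_path("NER", "Batch-Sequential", "MEMIT", batch_size=2, edit_time=2))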

Trainer

To use the MEND method, you should first train a hypernetwork using the data in data/training-data/; the trained weights will be saved in result/models/MEND. Then follow the same steps above to edit models. Run:

python train_MEND.py
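Before editing with MEND, you can confirm that the trained hypernetwork weights were saved, e.g. with the sketch below. It only checks that result/models/MEND is non-empty, since the exact file names written by train_MEND.py are not specified here.

from pathlib import Path

# Only check that the directory is non-empty; file names depend on the trainer.
weights_dir = Path("result/models/MEND")
if weights_dir.is_dir() and any(weights_dir.iterdir()):
    print(f"Found MEND weights in {weights_dir}")
else:
    print(f"No weights found in {weights_dir}; run train_MEND.py first")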

Citation

If you use this code and dataset, please cite our paper:

@article{DBLP:journals/corr/abs-2401-04700,
  author       = {Jia{-}Chen Gu and
                  Hao{-}Xiang Xu and
                  Jun{-}Yu Ma and
                  Pan Lu and
                  Zhen{-}Hua Ling and
                  Kai{-}Wei Chang and
                  Nanyun Peng},
  title        = {Model Editing Can Hurt General Abilities of Large Language Models},
  journal      = {CoRR},
  volume       = {abs/2401.04700},
  year         = {2024},
  url          = {https://doi.org/10.48550/arXiv.2401.04700},
}

Related Projects

We express sincere gratitude to EasyEdit and ROME, as we have utilized portions of their source code in our project.