Adamliu1 / SNLP_GCW


[🚧WIP🚧] Experiments Plan #98

Open TheRootOf3 opened 1 month ago

TheRootOf3 commented 1 month ago

Experiments

Idea: Repeat most of the unlearning experiments (continuous, batch, sequential) on harmfulness and evaluate. Based on the results, decide the best hyperparameters for unlearning French and logical reasoning.

Unlearning Harmfulness (full unlearning)

Every experiment should be repeated 3 times to check whether the methods are consistent:

  1. Unlearning
    • continuous unlearning (since this does not involve pre-selecting the unlearning set, we can simply run it up to 10k samples; see the command sketch after this list)
      • [ ] 128 samples
      • [ ] 512 samples
      • [ ] 1024 samples
      • ...
      • [ ] 10240 samples (~5k steps with batch size 2, similar to sequential)
    • batch unlearning (20 epochs each)
      • [ ] 128 samples
      • [ ] 512 samples
      • [ ] 1024 samples
    • sequential unlearning (20 epochs per split)
      • [ ] 128 samples
        • 4, 16 and 64 splits
      • [ ] 512 samples
        • 4, 16 and 64 splits
      • [ ] 1024 samples
        • 4, 16 and 64 splits
  2. Evaluation
    • [ ] task-based
    • [ ] beavertails + response length, quality metrics
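
For concreteness, this is how the three regimes appear to map onto the unlearn_harm.py flags, judging from the job scripts later in this thread; treat it as a sketch, not a verified interface (the reading here is --sequential=-1 for continuous, --sequential 1 for a single batch split, --sequential N for N sequential splits):

# continuous: no pre-selected unlearning set; checkpoint as the run progresses
python3 llm_unlearn_ucl/unlearn_harm.py --model_name google/gemma-2b \
    --sequential=-1 --num_epochs=1 --batch_size 2 --save_every=100 \
    --retaining_dataset rajpurkar/squad --lr 2e-6 --seed 42

# batch: the whole 128-sample set as one split, 20 epochs
python3 llm_unlearn_ucl/unlearn_harm.py --model_name google/gemma-2b \
    --sequential 1 --samples_count 128 --num_epochs 20 --batch_size 2 \
    --retaining_dataset rajpurkar/squad --lr 2e-6 --seed 42

# sequential: 128 samples split into 64 chunks, 20 epochs per chunk
python3 llm_unlearn_ucl/unlearn_harm.py --model_name google/gemma-2b \
    --sequential 64 --samples_count 128 --num_epochs 20 --batch_size 2 \
    --retaining_dataset rajpurkar/squad --lr 2e-6 --seed 42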

Unlearning French

Every experiment should be repeated 3 times to check whether the methods are consistent:

  1. Unlearning
    • continuous unlearning
      • [ ] 128 samples
      • [ ] 512 samples
      • [ ] 1024 samples
      • ...
      • [ ] 10240 samples (~5k steps with batch size 2, similar to sequential)
    • batch unlearning (20 epochs each)
      • [ ] 128 samples
      • [ ] 512 samples
      • [ ] 1024 samples
    • sequential unlearning (20 epochs per split)
      • [ ] 128 samples
        • 64 splits
      • [ ] 512 samples
        • 64 splits
      • [ ] 1024 samples
        • 64 splits
  2. Evaluation
    • [ ] task-based
    • [ ] FrenchBench + response length, quality metrics

Unlearning logical reasoning

Every experiment should be repeated 3 times to check whether the methods are consistent:

  1. Unlearning
    • continuous unlearning
      • [ ] 128 samples
      • [ ] 512 samples
      • [ ] 1024 samples
      • ...
      • [ ] 10240 samples (~5k steps with batch size 2, similar to sequential)
    • batch unlearning (20 epochs each)
      • [ ] 128 samples
      • [ ] 512 samples
      • [ ] 1024 samples
    • sequential unlearning (20 epochs per split)
      • [ ] 128 samples
        • 64 splits
      • [ ] 512 samples
        • 64 splits
      • [ ] 1024 samples
        • 64 splits
  2. Evaluation
    • [ ] task-based
    • [ ] LogiQA + response length, quality metrics

Ablation study on batch and sequential -- what happens when we continuously unlearn for longer? Maybe the same thing as for sequential?

Given that it is sufficient to run continuous unlearning once up to 10k samples to obtain all the intermediate sample-count values, this can be performed together with the first experiment.
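
A minimal sketch of that single run, assuming --save_every (used in the example command at the bottom of this thread) checkpoints every N optimizer steps; at batch size 2, a value of 64 lands a checkpoint exactly on every 128-sample boundary (128, 256, ..., 1024, ..., 10240):

# one continuous run to ~10240 samples; checkpoints cover all intermediate
# sample counts (assumption: save_every counts optimizer steps)
python3 llm_unlearn_ucl/unlearn_harm.py \
    --model_name google/gemma-2b \
    --sequential=-1 --num_epochs=1 --batch_size 2 \
    --save_every 64 --lr 2e-6 --seed 42 \
    --retaining_dataset rajpurkar/squad \
    --model_save_dir models/continuous-10k \
    --log_file logs/continuous-10k.log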

Learning rate experiments (scaling appropriately with the number of steps?)

Let's say it is sufficient to run this once.
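
If "scaling appropriately" ends up meaning keeping lr × steps roughly constant across sample counts, one tuned run could anchor the scale and the rest follow. This is purely a heuristic sketch, not something decided in this thread; the base values are placeholders:

# hypothetical linear scaling: halve the samples -> double the lr
BASE_LR=1e-3       # placeholder, assumed tuned at BASE_SAMPLES
BASE_SAMPLES=1024
for samples in 128 512 1024; do
    lr=$(python3 -c "print($BASE_LR * $BASE_SAMPLES / $samples)")
    echo "samples_count=$samples -> lr=$lr"
done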

Ablation study on loss components (I guess we can also do this for harmfulness only?)

What happens when we use only 2 of the 3 losses instead of all 3?
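
Nothing in the thread names a switch for this, so the --loss_components flag below is hypothetical and the real ablation may need a code change; the loss names (bad / random-mismatch / retain) are likewise assumptions:

# HYPOTHETICAL --loss_components flag; component names are assumptions,
# not the script's actual interface
for components in "bad,random,retain" "bad,retain" "bad,random"; do
    python3 llm_unlearn_ucl/unlearn_harm.py --model_name google/gemma-2b \
        --sequential 1 --samples_count 128 --num_epochs 20 \
        --loss_components "$components"
done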

Davidyz commented 4 weeks ago

Unlearning job script:

#$ -l tmem=15G
#$ -l h_rt=10:00:00 # hh:mm:ss
#$ -l gpu=true
#$ -pe gpu 1
#$ -N pku_unlearning_test_gemma2b
#$ -l hostname=dip-207-2

#$ -S /bin/bash
#$ -wd /home/zheyux01/git/SNLP_GCW
#$ -j y

#The code you want to run now goes here.

hostname
date
poetry run wandb disabled
# keep HF cache, wandb files, and model outputs on node-local scratch
export HF_HOME=/scratch0/$USER/hf_cache
export WANDB_DIR=/scratch0/$USER/wandb
model_save_dir=/scratch0/$USER/temp

mkdir -p $model_save_dir
mkdir -p $HF_HOME
mkdir -p $WANDB_DIR
mkdir -p /scratch0/$USER/log
mkdir -p /scratch0/$USER/cache

#model_path=/SAN/intelsys/llm/sduchnie/models/Meta-LLama-3-8B/
model_path=google/gemma-2b
dataset=PKU-Alignment/PKU-SafeRLHF

export HF_TOKEN=*********

poetry run accelerate launch llm_unlearn_ucl/unlearn_harm.py \
        --model_name $model_path \
        --batch_size 4 \
        --lr 5e-3 \
        --log_file /scratch0/$USER/log/default.log \
        --samples_count 256 \
        --sequential 1 \
        --cache_dir /scratch0/$USER/cache \
        --unlearning_dataset $dataset \
        --retaining_dataset rajpurkar/squad \
        --model_save_dir $model_save_dir

hostname
date
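
(For reference: with the SGE directives above, this would be submitted with qsub, e.g. `qsub unlearn_pku_gemma2b.sh`; the filename is assumed.)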
Willmish commented 4 weeks ago

At minimum, use a 2B model (or a new 1B model) and run the following. MODEL: Gemma 2B, or Phi-1.5 (https://huggingface.co/microsoft/phi-1_5)

  1. Unlearn
    • sequential unlearning (20 epochs per split; see the loop sketch below)
      • [ ] 128 samples
        • 4, 16 and 64 splits (OR JUST 64)
      • [ ] 512 samples
        • 4, 16 and 64 splits (OR JUST 64)
      • [ ] 1024 samples
        • 4, 16 and 64 splits (OR JUST 64)
  2. Evaluation
    • [ ] task-based
    • [ ] beavertails + response length, quality metrics

Then, call us xd
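
The "just 64 splits" version of this minimum plan is three invocations varying only --samples_count; a sketch reusing the flags from the job script in the next comment:

for samples in 128 512 1024; do
    python3 llm_unlearn_ucl/unlearn_harm.py --model_name google/gemma-2b \
        --sequential 64 --samples_count "$samples" --num_epochs 20 \
        --batch_size 1 --lr 1e-3 --max_bad_loss 1000 \
        --retaining_dataset rajpurkar/squad \
        --model_save_dir "/scratch0/$USER/seq-64-$samples"
done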

Willmish commented 4 weeks ago

Sequential unlearning job script, with bf16 precision and Adafactor (to be used on the branch from https://github.com/Adamliu1/SNLP_GCW/pull/101):

#   Most software is NOT in your PATH but under /share/apps
#
#   For further info please read http://hpc.cs.ucl.ac.uk
#   For cluster help email cluster-support@cs.ucl.ac.uk

# These are flags you must include - Two memory and one runtime.
# Runtime is either seconds or hours:min:sec

#$ -l tmem=48G
#$ -l h_rt=03:30:00

#These are optional flags but you probably want them in all jobs

#$ -S /bin/bash
#$ -l gpu=true
#$ -pe gpu 1
#$ -j y
#$ -N ulearn_halfp_gemma
#$ -l hostname=dip-207-2

#The code you want to run now goes here.

source /share/apps/source_files/python/python-3.9.5.source
source /scratch0/sduchnie/venv/bin/activate
hostname
date
BASE_PATH=/home/sduchnie/extra_storage/SNLP_GCW/llm_unlearn_ucl
DEVICE="cuda:0"
MODEL_PATH=/home/sduchnie/extra_storage/models
model_name=gemma-2b
dataset=PKU-Alignment/PKU-SafeRLHF
#dataset=sail/symbolic-instruction-tuning
retain_dataset=rajpurkar/squad
#retain_dataset=truthful_qa
EXPERIMENT_NAME=$model_name-unlearn-harm
model_save_dir=/scratch0/$USER/$EXPERIMENT_NAME

mkdir -p /scratch0/$USER/log
mkdir -p /scratch0/$USER/cache
mkdir -p /scratch0/$USER/$EXPERIMENT_NAME

echo "Unlearning model $model_name..."
wandb disabled
python3 $BASE_PATH/unlearn_harm.py \
    --model_name $MODEL_PATH/$model_name \
    --batch_size 1 \
    --lr 1e-3 \
    --log_file /scratch0/$USER/log/default.log \
    --samples_count 1024 \
    --sequential 64 \
    --num_epochs 20 \
    --max_bad_loss 1000 \
    --cache_dir /scratch0/$USER/cache \
    --unlearning_dataset $dataset \
    --retaining_dataset $retain_dataset \
    --model_save_dir $model_save_dir \
    --use_quantized True
date
Willmish commented 4 weeks ago

Harmfulness unlearning runs are in progress, all using gradient checkpointing, all for sequential 1024 samples with 64 splits:

Initial testing with single-epoch runs, to check whether half precision yields different results.

Further experiments to find an LR for which bad_loss converges.

TheRootOf3 commented 4 weeks ago

NOTE

ALL TASK-BASED EVAL RUNS SCHEDULED BEFORE #104 ARE INCORRECT!

TheRootOf3 commented 4 weeks ago

@Willmish Please prepare a script for 1) LR experiments, 2) unlearning harmfulness experiments with SQuAD.

Preferably, write it in a way that allows parallelisation within a node (4 GPUs -> "cuda:{0123}"). Heads up -- no dip scratch.

Example format:


RUN_NAME="seq-4-128-gemma-harmful-squad"

CUDA_VISIBLE_DEVICES=0 nohup python3 unlearn_harm.py \
    --model_name google/gemma-2b \
    --model_save_dir "/SAN/intelsys/llm/aszablew/snlp/SNLP_GCW/snlp-unlearned-models/models/$RUN_NAME" \
    --log_file "/SAN/intelsys/llm/aszablew/snlp/SNLP_GCW/snlp-unlearned-models/logs/$RUN_NAME.log" \
    --cache_dir ".cache" \
    --seed 42 \
    --retaining_dataset rajpurkar/squad \
    --max_bad_loss 10000 \
    --sequential=-1 \
    --num_epochs=1 \
    --batch_size=2 \
    --save_every=100 \
    --lr 2e-6 \
    &> gemma2-continuous-squad-2e-6.log &
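
One way to get the requested within-node parallelism, sketched for the LR sweep (the grid values are examples; assumes one run fits per GPU):

# fan four LR-sweep runs out over cuda:0..3 on one node
LRS=(2e-6 1e-5 1e-4 1e-3)
for i in 0 1 2 3; do
    RUN_NAME="continuous-squad-lr-${LRS[$i]}"
    CUDA_VISIBLE_DEVICES=$i nohup python3 unlearn_harm.py \
        --model_name google/gemma-2b \
        --model_save_dir "models/$RUN_NAME" \
        --log_file "logs/$RUN_NAME.log" \
        --cache_dir ".cache" --seed 42 --save_every=100 \
        --retaining_dataset rajpurkar/squad --max_bad_loss 10000 \
        --sequential=-1 --num_epochs=1 --batch_size=2 \
        --lr "${LRS[$i]}" &> "$RUN_NAME.nohup.log" &
done
wait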