Reproducing ablation study results

Hi, I am trying to reproduce the results presented in the paper. However, I don't get anywhere near these (66 F1 vs. 68 F1 on meld). I would rather not use the domain module, and therefore I want to get the results as presented in the ablation study.

I noticed that there is a lot of commented code in main_new.py and train_and_inference_Uni.sh, resulting in some unused arguments. Especially the emotion_prediction flag in main_new.py which does a case splitting during training, resulting in the same outcomes because of commented code (ref). In the paper, you mentioned α as a scalar for the influence of the emotion prediction task, but I cannot find it anywhere in the code. There is a β of 0.1, but it remains unused.

As a result, I'm unsure whether what I'm doing is correct or if I'm overlooking something.

My training routine is the following:

Pretrain LLaMA2-7b-chat-hf with the speaker identification task.
Normal training including the emotion prediction as an auxiliary task

I also found that there is no improvement on the test set after ca. 3 epochs.

This is my shell script where I copied everything I needed to keep a better overview.

MODEL_NAME='LLaMA2'
Experiments_setting='lora'
dataset='meld'
historical_window=20

accumulations=8
graphics_card=2
BS=$((accumulations * graphics_card))

speaker_task='True'
domain_base='False'
emotion_prediction='True'

data_percent=1.0
MAX_LENGTH=1024

DO_EVAL=True
DO_TRAIN=True
LORA=True
LR=2e-4
CHECKPOINT_DIR=None

SPEAKER_DATA_PATH=$(python ./code/data_process.py \
    --dataset ${dataset} \
    --historical_window ${historical_window} \
    --speaker_task True \
    --emotion_prediction False)

EMOTION_DATA_PATH=$(python ./code/data_process.py \
    --dataset ${dataset} \
    --historical_window ${historical_window} \
    --speaker_task None \
    --domain_base ${domain_base} \
    --emotion_prediction ${emotion_prediction})

DATA_SPEAKER_PATH=$(echo "$SPEAKER_DATA_PATH" | cut -d ',' -f 1)
DATA_WINDOW_PATH=$(echo "$EMOTION_DATA_PATH" | cut -d ',' -f 2)
Speaker_Model_output_dir=./experiments/${MODEL_NAME}/${Experiments_setting}/${dataset}/${speaker_task}_one
Content_Model_output_dir=./experiments/${MODEL_NAME}/${Experiments_setting}/${dataset}/${speaker_task}_two

  echo "*********************************************"
  echo "Start to train on Speaker Identification task!"
  echo "*********************************************"
  deepspeed --master_port=29500 ./code/main_new.py \
      --dataset ${dataset} \
      --model_name_or_path ${MODEL_PATH} \
      --data_dir ${DATA_SPEAKER_PATH} \
      --output_dir ${Speaker_Model_output_dir} \
      --max_length ${MAX_LENGTH} \
      --batch_size ${BS} \
      --deepspeed_config ./code/data_utils/deepspeed_config.json \
      --gradient_accumulation_steps ${accumulations} \
      --eval_batch_size 8 \
      --num_train_epochs 3 \
      --save_steps 10000 \
      --lora ${LORA}\
      --learning_rate ${LR} \
      --do_train ${DO_TRAIN} \
      --do_eval ${DO_EVAL} \
      --statistic_mode False

  echo "*******************************************************************"
  echo "Speaker Identification task has been achieved successfully!"
  echo "*******************************************************************"

  echo "*********************************************"
  echo "Start to train on Emotion Recognition task!"
  echo "*********************************************"
  deepspeed --master_port=29500 ./code/main_new.py \
        --dataset ${dataset} \
        --model_name_or_path ${MODEL_PATH} \
        --data_dir ${DATA_WINDOW_PATH} \
        --output_dir ${Content_Model_output_dir} \
        --max_length ${MAX_LENGTH} \
        --batch_size ${BS} \
        --deepspeed_config ./code/data_utils/deepspeed_config.json \
        --gradient_accumulation_steps ${accumulations} \
        --eval_batch_size 8 \
        --num_train_epochs 15 \
        --save_steps 100000 \
        --lora ${LORA}\
        --learning_rate ${LR} \
        --do_eval ${DO_EVAL} \
        --do_train ${DO_TRAIN} \
        --statistic_mode True \
        --beta 0.1 \
        --theta 1.0 \
        --emotion_prediction ${emotion_prediction} \
        --checkpoint_dir ${Speaker_Model_output_dir}

I only changed one thing in main_new.py: I use bfloat16 for training rather than fp16, since LLaMA2 seems to have problems with fp16 conversion resulting in NaN for the loss (see issue) GPUs: 2 x RTX A6000 (48Gb)

Did I misunderstand something in my setup? I would greatly appreciate your help!

LIN-SHANG / InstructERC

Reproducing ablation study results #10