AlibabaResearch / DAMO-ConvAI

DAMO-ConvAI: the official repository containing the codebase for Alibaba DAMO Conversational AI.
MIT License

Hyperparameters to reproduce reported scores for SPACE-3 #27

tma15 closed this issue 1 year ago

tma15 commented 1 year ago

Thanks for the great work. I'm interested in intent prediction tasks using SPACE-3 and want to reproduce the reported scores on BANKING77, HWU64, and CLINC150.

I confirmed that the fine-tuned model distributed at this link achieves the reported accuracy on BANKING77. However, when I fine-tuned the pre-trained model myself, the accuracy did not match the reported scores.

The following is scripts/banking/train.sh in my environment; I only changed PROJECT_ROOT and SAVE_ROOT from the original script.

#!/bin/bash
set -ux

# CUDA environment settings.
export CUDA_VISIBLE_DEVICES=1

# Parameters.
LEARNING_METHOD=super
MODEL=IntentUnifiedTransformer
TRIGGER_DATA=banking
TRIGGER_ROLE=user
PROJECT_ROOT=modelscope/damo/nlp_space_pretrained-dialog-model
VOCAB_PATH=${PROJECT_ROOT}/model/Bert/vocab.txt
DATA_DIR=${PROJECT_ROOT}/data/pre_train
LOAD_MODEL_NAME=SPACE-Intent
INIT_CHECKPOINT=${PROJECT_ROOT}/model/${LOAD_MODEL_NAME}
EXAMPLE=false
WITH_QUERY_BOW=false
WITH_RESP_BOW=false
WITH_CONTRASTIVE=false
WITH_RDROP=true
WITH_POOL=false
WITH_MLM=true
DYNAMIC_SCORE=true
GENERATION=false
POLICY=false
TOKENIZER_TYPE=Bert
DROPOUT_RATIO=0.25
TEMPERATURE=0.07
MLM_RATIO=0.1
KL_RATIO=5.0
LR=1e-4
PROMPT_NUM_FOR_POLICY=5
PROMPT_NUM_FOR_UNDERSTAND=5
BATCH_SIZE_LABEL=64
GRAD_ACCUM_STEPS=2
BATCH_SIZE_NOLABEL=0
NUM_PROCESS=1
NUM_INTENT=77
NUM_EPOCH=60
NUM_GPU=1
SEED=11
SAVE_ROOT=reproduce
SAVE_DIR=${SAVE_ROOT}/outputs/${TRIGGER_DATA}/94-94

# Data preprocess.
python -u preprocess.py \
  --data_dir=${DATA_DIR} \
  --with_mlm=${WITH_MLM} \
  --vocab_path=${VOCAB_PATH} \
  --num_process=${NUM_PROCESS} \
  --trigger_data=${TRIGGER_DATA} \
  --trigger_role=${TRIGGER_ROLE} \
  --dynamic_score=${DYNAMIC_SCORE} \
  --tokenizer_type=${TOKENIZER_TYPE} \
  --prompt_num_for_policy=${PROMPT_NUM_FOR_POLICY} \
  --prompt_num_for_understand=${PROMPT_NUM_FOR_UNDERSTAND}

# Main run.
python -u run_intent.py \
  --do_train=true \
  --do_infer=true \
  --do_test=true \
  --model=${MODEL} \
  --example=${EXAMPLE} \
  --policy=${POLICY} \
  --generation=${GENERATION} \
  --data_dir=${DATA_DIR} \
  --vocab_path=${VOCAB_PATH} \
  --num_process=${NUM_PROCESS} \
  --trigger_data=${TRIGGER_DATA} \
  --trigger_role=${TRIGGER_ROLE} \
  --dynamic_score=${DYNAMIC_SCORE} \
  --tokenizer_type=${TOKENIZER_TYPE} \
  --prompt_num_for_policy=${PROMPT_NUM_FOR_POLICY} \
  --prompt_num_for_understand=${PROMPT_NUM_FOR_UNDERSTAND} \
  --with_query_bow=${WITH_QUERY_BOW} \
  --with_resp_bow=${WITH_RESP_BOW} \
  --batch_size_label=${BATCH_SIZE_LABEL} \
  --gradient_accumulation_steps=${GRAD_ACCUM_STEPS} \
  --batch_size_nolabel=${BATCH_SIZE_NOLABEL} \
  --save_dir=${SAVE_DIR} \
  --init_checkpoint=${INIT_CHECKPOINT} \
  --learning_method=${LEARNING_METHOD} \
  --temperature=${TEMPERATURE} \
  --with_contrastive=${WITH_CONTRASTIVE} \
  --with_rdrop=${WITH_RDROP} \
  --with_pool=${WITH_POOL} \
  --with_mlm=${WITH_MLM} \
  --mlm_ratio=${MLM_RATIO} \
  --kl_ratio=${KL_RATIO} \
  --dropout=${DROPOUT_RATIO} \
  --embed_dropout=${DROPOUT_RATIO} \
  --attn_dropout=${DROPOUT_RATIO} \
  --ff_dropout=${DROPOUT_RATIO} \
  --num_intent=${NUM_INTENT} \
  --num_epoch=${NUM_EPOCH} \
  --gpu=${NUM_GPU} \
  --seed=${SEED} \
  --lr=${LR} \
  --log_steps=20 \
  --valid_steps=0 \
  --num_type_embeddings=2 \
  --save_checkpoint=true \
  --token_loss=true \
  --max_len=256

Do you have any ideas for reproducing the reported scores?

tma15 commented 1 year ago

Results using the model fine-tuned by me:

[Infer][37]   Accuracy: 0.9405844155844156   original acc: 0.9402597402597402   (3062) confident acc: 0.9448073154800783   (18) unconfident acc: 0.16666666666666666   new unconfident acc: 0.2222222222222222   final acc: 0.9405844155844156   TIME-9.28

Results using the distributed fine-tuned model:

[Infer][0]   Accuracy: 0.9493506493506494   original acc: 0.9493506493506494   (3040) confident acc: 0.9549342105263158   (40) unconfident acc: 0.525   new unconfident acc: 0.4   final acc: 0.9477272727272728   TIME-9.40

huybery commented 1 year ago

Please @HwwAncient follow up on this.

mawentao277 commented 1 year ago

Hi, you can refer to the hyperparameters in these two files:

https://www.modelscope.cn/models/damo/nlp_space_pretrained-dialog-model/summary
https://www.modelscope.cn/models/damo/nlp_space_pretrained-dialog-model/file/view/master/intent_train_config.json