The reproduction result is not good on the Overall indicator.

TracyYannn commented 1 month ago

My reproduced results on the Overall metric are not very good. I ran the code on a V100; my parameter settings and experimental results are below. What could be the reason, and how should I reproduce the results correctly? Thank you!

    python main.py --token_level word-level \
      --model_type roberta \
      --model_dir dir_base \
      --task mixatis \
      --data_dir data \
      --attention_mode label \
      --do_train \
      --do_eval \
      --num_train_epochs 100 \
      --intent_loss_coef 0.5 \
      --learning_rate 1e-5 \
      --train_batch_size 32 \
      --num_intent_detection \
      --use_crf

    python main.py --token_level word-level \
      --model_type roberta \
      --model_dir misca \
      --task mixatis \
      --data_dir data \
      --attention_mode label \
      --do_train \
      --do_eval \
      --num_train_epochs 100 \
      --intent_loss_coef 0.5 \
      --learning_rate 1e-5 \
      --num_intent_detection \
      --use_crf \
      --base_model dir_base \
      --intent_slot_attn_type coattention

[Image: not_good_overall]

BillKiller commented 3 weeks ago

I cannot reproduce the performance either. I hope the authors can provide more detailed information. Same issue.

thinhphp commented 1 week ago

We have checked and updated the instructions with more detail. In general, for the model with a PLM, once you have the "base" model, load it and freeze the PLM encoder (simply add .detach() after the encoder output). The final stage is fine-tuning the full model; remember to perform a grid search to make sure it achieves the best performance. In our experiments, we use this checkpoint for MixATIS and this checkpoint for MixSNIPS as the base model. For MixATIS, you could try a learning rate of 3e-5 (while freezing) and 3e-6 (after freezing). Hope this helps. Should you have any further questions, do not hesitate to contact me at thinhphp.nlp@gmail.com, where I check the inbox more often.
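For anyone setting up the grid search mentioned above, here is a minimal sketch of one way to generate the runs. It only builds command lines; it does not apply the .detach() freeze, which per the reply is a change in the model code. The flags are copied from the second command earlier in this thread. The helper `build_grid`, the output directory naming, and the candidate learning rates other than 3e-5 are assumptions for illustration, not part of the repository.

```python
import shlex

# Flags copied from the second command posted in this thread.
BASE_ARGS = [
    "python", "main.py",
    "--token_level", "word-level",
    "--model_type", "roberta",
    "--task", "mixatis",
    "--data_dir", "data",
    "--attention_mode", "label",
    "--do_train", "--do_eval",
    "--num_train_epochs", "100",
    "--intent_loss_coef", "0.5",
    "--num_intent_detection",
    "--use_crf",
    "--base_model", "dir_base",
    "--intent_slot_attn_type", "coattention",
]


def build_grid(learning_rates, out_prefix="misca_lr"):
    """Return one command (as an argument list) per candidate learning rate.

    Hypothetical helper: each run gets its own --model_dir so that
    checkpoints from different learning rates do not overwrite each other.
    """
    commands = []
    for lr in learning_rates:
        model_dir = f"{out_prefix}_{lr}"  # assumed naming scheme
        commands.append(
            BASE_ARGS + ["--model_dir", model_dir, "--learning_rate", lr]
        )
    return commands


# 3e-5 is the value suggested for the frozen stage on MixATIS;
# the neighbouring values are assumed candidates, not recommendations.
FROZEN_STAGE_LRS = ["1e-5", "3e-5", "5e-5"]

if __name__ == "__main__":
    for cmd in build_grid(FROZEN_STAGE_LRS):
        # Print shell-ready commands; run them manually or via a job scheduler.
        print(shlex.join(cmd))
```

The same helper could be reused for the full fine-tuning stage with a grid around 3e-6. Learning rates are kept as strings so they appear in the command exactly as typed.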

VinAIResearch / MISCA, issue #10