VinAIResearch / MISCA

MISCA: A Joint Model for Multiple Intent Detection and Slot Filling with Intent-Slot Co-Attention (EMNLP 2023 - Findings)
GNU Affero General Public License v3.0

Why doesn't training a model this way work well? #3

Closed · dengg1013 closed this issue 8 months ago

dengg1013 commented 8 months ago

I first train the base model with the BERT backbone using the following command: `python main.py --token_level word-level --model_type bert --model_dir dir_base --task my dataset --data_dir data --attention_mode label --do_train --do_eval --num_intent_detection --use_crf`. Then I load the dir_base model and train MISCA with: `python main.py --token_level word-level --model_type bert --model_dir misca --task my dataset --data_dir data --attention_mode label --do_train --do_eval --num_intent_detection --use_crf --base_model dir_base --intent_slot_attn_type coattention`. However, the results are still low. (screenshot of results attached)
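
For readability, here is the same two-stage pipeline written out as a shell sketch. The flags are copied from the commands above; `my_dataset` is only a placeholder for the poster's unspecified `--task` value.

```bash
# Stage 1: train the base model (BERT encoder, no intent-slot co-attention).
# "my_dataset" is a placeholder: substitute the actual --task name for your data.
python main.py --token_level word-level --model_type bert --model_dir dir_base \
    --task my_dataset --data_dir data --attention_mode label \
    --do_train --do_eval --num_intent_detection --use_crf

# Stage 2: train MISCA with intent-slot co-attention, loading the stage-1 checkpoint.
python main.py --token_level word-level --model_type bert --model_dir misca \
    --task my_dataset --data_dir data --attention_mode label \
    --do_train --do_eval --num_intent_detection --use_crf \
    --base_model dir_base --intent_slot_attn_type coattention
```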

dengg1013 commented 8 months ago

Maybe the number of epochs is too small to train the model?

dengg1013 commented 8 months ago

Run configuration: `--token_level word-level --model_type lstm --model_dir dir_base --task all --data_dir data --attention_mode label --do_train --do_eval --num_intent_detection --use_crf`. Using the LSTM encoder, the results are good (screenshot attached).

But I get low scores with the BERT model (training the base model first, then training MISCA). Why?

thinhphp commented 8 months ago

Hi, thanks for your interest! When training with a BERT model, the learning rate needs to be scaled down (to around 1e-5). We have updated the default hyper-parameter settings in the README accordingly. Thanks!
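
Concretely, that suggests passing a smaller learning rate to the BERT runs, roughly as in the sketch below. The `--learning_rate` flag name and the `my_dataset` task value are assumptions here (JointBERT-style training scripts commonly expose such a flag); the updated README has the exact argument names and default values.

```bash
# Sketch: stage-2 MISCA training with the BERT encoder and a scaled-down learning rate.
# --learning_rate is assumed to be the relevant flag; confirm the name and the
# recommended value (~1e-5) against the updated README.
python main.py --token_level word-level --model_type bert --model_dir misca \
    --task my_dataset --data_dir data --attention_mode label \
    --do_train --do_eval --num_intent_detection --use_crf \
    --base_model dir_base --intent_slot_attn_type coattention \
    --learning_rate 1e-5
```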

tmrnvcome commented 7 months ago

Hi @dengg1013, I tried using the code you shared, but I am still experiencing the same issue. Could you please assist? (screenshot attached: Screenshot from 2024-02-08 19-06-51)