VinAIResearch / MISCA

MISCA: A Joint Model for Multiple Intent Detection and Slot Filling with Intent-Slot Co-Attention (EMNLP 2023 - Findings)
GNU Affero General Public License v3.0

Why doesn't training a model this way work well? #3

Closed · dengg1013 closed this issue 8 months ago

dengg1013 commented 8 months ago

I first train the base model with the BERT backbone using the following command: `python main.py --token_level word-level --model_type bert --model_dir dir_base --task my dataset --data_dir data --attention_mode label --do_train --do_eval --num_intent_detection --use_crf`. Then I load the dir_base model and train MISCA with: `python main.py --token_level word-level --model_type bert --model_dir misca --task my dataset --data_dir data --attention_mode label --do_train --do_eval --num_intent_detection --use_crf --base_model dir_base --intent_slot_attn_type coattention`. However, the results are still low. (screenshot of results attached)
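
For readability, here is the same two-stage pipeline written out as a shell sketch. The flags are copied from the commands above; `my_dataset` is only a placeholder for the poster's unspecified `--task` value.

```bash
# Stage 1: train the base model (BERT encoder, no intent-slot co-attention).
# "my_dataset" is a placeholder: substitute the actual --task name for your data.
python main.py --token_level word-level --model_type bert --model_dir dir_base \
    --task my_dataset --data_dir data --attention_mode label \
    --do_train --do_eval --num_intent_detection --use_crf

# Stage 2: train MISCA with intent-slot co-attention, loading the stage-1 checkpoint.
python main.py --token_level word-level --model_type bert --model_dir misca \
    --task my_dataset --data_dir data --attention_mode label \
    --do_train --do_eval --num_intent_detection --use_crf \
    --base_model dir_base --intent_slot_attn_type coattention
```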

dengg1013 commented 8 months ago

Maybe the number of epochs is too small to train the model?

dengg1013 commented 8 months ago

Run configuration: `--token_level word-level --model_type lstm --model_dir dir_base --task all --data_dir data --attention_mode label --do_train --do_eval --num_intent_detection --use_crf`. Using the LSTM encoder, the results are good (screenshot attached).

But I get low scores with the BERT model (training the base model first, then training MISCA). Why?

thinhphp commented 8 months ago

Hi, thanks for your interest! When training with a BERT model, the learning rate needs to be scaled down (to around 1e-5). We have updated the default hyper-parameter settings in the README accordingly. Thanks!
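
Concretely, that suggests passing a smaller learning rate to the BERT runs, roughly as in the sketch below. The `--learning_rate` flag name and the `my_dataset` task value are assumptions here (JointBERT-style training scripts commonly expose such a flag); the updated README has the exact argument names and default values.

```bash
# Sketch: stage-2 MISCA training with the BERT encoder and a scaled-down learning rate.
# --learning_rate is assumed to be the relevant flag; confirm the name and the
# recommended value (~1e-5) against the updated README.
python main.py --token_level word-level --model_type bert --model_dir misca \
    --task my_dataset --data_dir data --attention_mode label \
    --do_train --do_eval --num_intent_detection --use_crf \
    --base_model dir_base --intent_slot_attn_type coattention \
    --learning_rate 1e-5
```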

tmrnvcome commented 7 months ago

Hi @dengg1013, I tried using the code you shared, but I am still experiencing the same issue. Could you please assist? (screenshot attached: Screenshot from 2024-02-08 19-06-51)