Chia-Hsuan-Lee / DST-as-Prompting

Source code for Dialogue State Tracking with a Language Model using Schema-Driven Prompting

Training script for t5-base #5

Closed Soistesimmer closed 2 years ago

Soistesimmer commented 2 years ago

Hi, thank you for the nice code. It works fine with t5-small. I also followed the settings for training t5-base in your paper, but the model does not seem to be trained properly: the evaluation loss is much higher than with t5-small, and the predictions are also terrible. I suspect the hyperparameters I set are still not correct. Could you also provide your script for training T5-base? Thank you!

This is the script I am using:

CUDA_VISIBLE_DEVICES=0,1 python examples/pytorch/summarization/run_summarization.py \
    --model_name_or_path google/t5-base \
    --do_train \
    --do_predict \
    --train_file "$DATA_DIR/train.json" \
    --validation_file "$DATA_DIR/dev.json" \
    --test_file "$DATA_DIR/test.json" \
    --source_prefix "" \
    --output_dir "$OUTPUT_DIR/t5-base-mwoz2.2" \
    --per_device_train_batch_size=4 \
    --per_device_eval_batch_size=4 \
    --gradient_accumulation_steps 8 \
    --predict_with_generate \
    --learning_rate 5e-4 \
    --num_train_epochs 2 \
    --text_column="dialogue" \
    --summary_column="state" \
    --save_steps=25000

Chia-Hsuan-Lee commented 2 years ago

Hi, I think the T5-base experiments in the paper were using 4 GPUs.

Can you try the following on a single GPU? It works for me as of the first saved checkpoint.

CUDA_VISIBLE_DEVICES=0 python examples/pytorch/summarization/run_summarization.py \
    --model_name_or_path t5-base \
    --do_train \
    --do_predict \
    --train_file "$DATA_DIR/train.json" \
    --validation_file "$DATA_DIR/dev.json" \
    --test_file "$DATA_DIR/test.json" \
    --source_prefix "" \
    --output_dir "$OUTPUT_DIR/t5-base-mwoz2.2" \
    --per_device_train_batch_size=2 \
    --per_device_eval_batch_size=2 \
    --predict_with_generate \
    --text_column="dialogue" \
    --summary_column="state" \
    --save_steps=50000
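For anyone comparing the two commands in this thread: one visible difference is the number of examples per optimizer step, alongside the learning rate (5e-4 in the first command, left at the script default in the second). The sketch below only does the arithmetic. It assumes the usual Hugging Face Trainer behavior, where the total train batch size is the number of GPUs times the per-device batch size times the gradient accumulation steps, and where accumulation defaults to 1 when the flag is omitted. `effective_batch` is a hypothetical helper written for illustration, not part of this repository or of transformers, and nothing here establishes which setting caused the poor results.

```shell
#!/usr/bin/env bash
# effective_batch: hypothetical helper for illustration only.
# Arguments: number of GPUs, per-device train batch size,
# gradient accumulation steps (assumed to default to 1 when omitted).
effective_batch() {
  local n_gpus=$1 per_device=$2 grad_accum=${3:-1}
  echo $(( n_gpus * per_device * grad_accum ))
}

# First command in the thread: 2 GPUs, batch size 4, accumulation 8.
first=$(effective_batch 2 4 8)
# Second command in the thread: 1 GPU, batch size 2, no accumulation flag.
second=$(effective_batch 1 2)

echo "first command:  $first examples per optimizer step"
echo "second command: $second examples per optimizer step"
```

When moving a configuration between different GPU counts, it may help to run this kind of check first so that the batch size and learning rate are changed deliberately rather than as a side effect.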

Soistesimmer commented 2 years ago

Thank you for your suggestion! I will give it a try :)

Chia-Hsuan-Lee commented 2 years ago

Let me know if you have other questions! Closing this issue for now.