ffaisal93 opened this issue 8 months ago
Hi, I was trying to train XLM-R base on the assembled training data, but it doesn't converge and gives essentially random output (24% accuracy on eng_Latn, i.e. roughly chance for a 4-way multiple-choice task), while I get around 53% accuracy using mBERT.
I am using Hugging Face's multiple-choice training implementation (https://github.com/huggingface/transformers/blob/main/examples/pytorch/multiple-choice/run_swag.py) and have tried learning rates of 1e-5, 2e-5, and 5e-5.
Weirdly, if I use just maybe 1,500 examples, I get better output with 400 steps of training (a sketch of that subsampling step is below).
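For context, the subsample is nothing fancy; a hypothetical version of what I do, assuming a JSON-lines train file (adjust the header handling if yours is CSV):

```bash
# Hypothetical subsampling step: take 1500 random examples.
# Assumes ${train_file} is JSON lines; a CSV would need its header
# row preserved separately.
shuf "${train_file}" | head -n 1500 > train_1500.json
# then train with: --train_file train_1500.json --max_steps 400
```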
Would you mind sharing the training configuration you used for XLM-R? Or let me know if you have any idea what I am missing here.
```bash
python run_swag.py \
  --model_name_or_path ${MODEL_PATH} \
  --do_train \
  --do_eval \
  --train_file ${train_file} \
  --prefix "train_combined" \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --per_device_eval_batch_size 8 \
  --per_device_train_batch_size 8 \
  --overwrite_output_dir \
  --output_dir ${output_dir} \
  --max_seq_length 512 \
  --cache_dir ${CACHE_DIR} \
  --overwrite_cache \
  --save_total_limit 5 \
  --save_steps 500 \
  --eval_steps 500 \
  --save_strategy steps \
  --evaluation_strategy steps \
  --load_best_model_at_end True
```
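In case it's relevant, here is a variant I'm considering, purely as a sketch: `--warmup_ratio` and `--seed` are standard Hugging Face `TrainingArguments` flags, but the specific values below are my own guesses, not anything from this repo:

```bash
# Same run as above, but with explicit warmup and a pinned seed, since
# XLM-R fine-tuning is reported to be sensitive to both. The values
# (warmup_ratio=0.1, seed=42, lr=1e-5) are guesses for illustration.
python run_swag.py \
  --model_name_or_path ${MODEL_PATH} \
  --do_train \
  --do_eval \
  --train_file ${train_file} \
  --learning_rate 1e-5 \
  --warmup_ratio 0.1 \
  --seed 42 \
  --num_train_epochs 3 \
  --per_device_train_batch_size 8 \
  --max_seq_length 512 \
  --output_dir ${output_dir} \
  --overwrite_output_dir
```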
Thanks
Hi, I want to know whether you have solved this training problem. If so, what training configuration did you use? Thanks!
+1