ffaisal93 opened this issue 8 months ago
Hi, I was trying to train XLM-R base on the assembled training data, but it doesn't converge and gives essentially random output (24% accuracy on eng_Latn, i.e. roughly chance for a 4-way multiple-choice task), while I get around 53% accuracy using mBERT.
I am using Hugging Face's multiple-choice training implementation (https://github.com/huggingface/transformers/blob/main/examples/pytorch/multiple-choice/run_swag.py) and have tried learning rates of 1e-5, 2e-5, and 5e-5.
Weirdly, if I use just maybe 1,500 examples, I get better output with 400 steps of training (a sketch of that subsampling step is below).
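For context, the subsample is nothing fancy; a hypothetical version of what I do, assuming a JSON-lines train file (adjust the header handling if yours is CSV):

```bash
# Hypothetical subsampling step: take 1500 random examples.
# Assumes ${train_file} is JSON lines; a CSV would need its header
# row preserved separately.
shuf "${train_file}" | head -n 1500 > train_1500.json
# then train with: --train_file train_1500.json --max_steps 400
```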
Would you mind sharing the training configuration you used for XLM-R? Or let me know if you have any idea what I am missing here.
```bash
python run_swag.py \
  --model_name_or_path ${MODEL_PATH} \
  --do_train \
  --do_eval \
  --train_file ${train_file} \
  --prefix "train_combined" \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --per_device_eval_batch_size 8 \
  --per_device_train_batch_size 8 \
  --overwrite_output_dir \
  --output_dir ${output_dir} \
  --max_seq_length 512 \
  --cache_dir ${CACHE_DIR} \
  --overwrite_cache \
  --save_total_limit 5 \
  --save_steps 500 \
  --eval_steps 500 \
  --save_strategy steps \
  --evaluation_strategy steps \
  --load_best_model_at_end True
```
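In case it's relevant, here is a variant I'm considering, purely as a sketch: `--warmup_ratio` and `--seed` are standard Hugging Face `TrainingArguments` flags, but the specific values below are my own guesses, not anything from this repo:

```bash
# Same run as above, but with explicit warmup and a pinned seed, since
# XLM-R fine-tuning is reported to be sensitive to both. The values
# (warmup_ratio=0.1, seed=42, lr=1e-5) are guesses for illustration.
python run_swag.py \
  --model_name_or_path ${MODEL_PATH} \
  --do_train \
  --do_eval \
  --train_file ${train_file} \
  --learning_rate 1e-5 \
  --warmup_ratio 0.1 \
  --seed 42 \
  --num_train_epochs 3 \
  --per_device_train_batch_size 8 \
  --max_seq_length 512 \
  --output_dir ${output_dir} \
  --overwrite_output_dir
```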
Thanks
Hi, I want to know whether you have solved this training problem. If so, what training configuration did you use? Thanks!
+1