Open sxjscience opened 3 years ago
With the following command, I re-ran the above experiment with fp16:
export SQUAD_DIR=/home/ubuntu/squad
python3 -m torch.distributed.launch --nproc_per_node=4 ./examples/question-answering/run_squad.py \
--model_type albert \
--model_name_or_path albert-base-v2 \
--do_train \
--do_eval \
--version_2_with_negative \
--train_file $SQUAD_DIR/train-v2.0.json \
--predict_file $SQUAD_DIR/dev-v2.0.json \
--learning_rate 3e-5 \
--weight_decay 0.01 \
--max_grad_norm 1.0 \
--num_train_epochs 3 \
--warmup_ratio 0.1 \
--max_seq_length 512 \
--doc_stride 128 \
--output_dir ./examples/models/albert-base-v2_finetuned_squad2.0-fp16/ \
--per_gpu_eval_batch_size=24 \
--per_gpu_train_batch_size=12 \
--gradient_accumulation_steps=1 \
--overwrite_cache \
--threads 8 \
--overwrite_output_dir \
--fp16
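As a quick sanity check on the launch flags above, the effective global batch size works out as follows (a minimal sketch; the values are copied from the command):

```python
# Effective global train batch size for the launch command above:
# (processes per node) x (per-GPU batch size) x (gradient accumulation steps)
nproc_per_node = 4
per_gpu_train_batch_size = 12
gradient_accumulation_steps = 1

effective_batch_size = (
    nproc_per_node * per_gpu_train_batch_size * gradient_accumulation_steps
)
print(effective_batch_size)  # 48
```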
The training process took roughly 100 minutes, from 11/25/2020 17:09:34 to 11/25/2020 18:50:03. Compared to the original 3 hours, it saves a lot of time, but it also loses some accuracy: the final evaluation result is EM/F1 = 75.93/79.42.
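The ~100-minute figure can be verified directly from the two quoted timestamps; a small sketch (timestamp format inferred from the log excerpt):

```python
from datetime import datetime, timedelta

# Start/end timestamps quoted from the fp16 training log above
fmt = "%m/%d/%Y %H:%M:%S"
start = datetime.strptime("11/25/2020 17:09:34", fmt)
end = datetime.strptime("11/25/2020 18:50:03", fmt)

elapsed = end - start
print(elapsed)  # 1:40:29, i.e. roughly 100 minutes
```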
See the whole log for the details.
@ZheyuYe thanks for the update. So with fp16 the training is about 80m faster (180m -> 100m), while the evaluation performance loss is ~3% (79.7/82.6 -> 75.9/79.4). The loss is somewhat higher than expected for fp16 training.
Note that these are the Huggingface results. Maybe we are not calling Huggingface correctly.
(Replying by email to Sheng Zha's comment of November 26, 2020 on issue #1436, "[Performance] Speed comparison between GluonNLP and other packages".)
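For reference, the speedup and the accuracy gap discussed in this thread work out as follows (all numbers are taken from the comments above):

```python
# fp32 vs fp16 Huggingface fine-tuning, numbers quoted in this thread
fp32_minutes, fp16_minutes = 180, 100
em_fp32, f1_fp32 = 79.7, 82.6
em_fp16, f1_fp16 = 75.9, 79.4

speedup = fp32_minutes / fp16_minutes
em_drop = em_fp32 - em_fp16
f1_drop = f1_fp32 - f1_fp16
print(f"speedup: {speedup:.2f}x, EM drop: {em_drop:.1f}, F1 drop: {f1_drop:.1f}")
# speedup: 1.80x, EM drop: 3.8, F1 drop: 3.2
```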
Regarding the comparison with DeepSpeed, I recently created this markdown and we can follow the same setup to run DeepSpeed-accelerated BERT-large:
https://github.com/sxjscience/DeepSpeedExamples/tree/master/BingBertSquad
https://gist.github.com/DOUDOU0314/01ea3e74b255d302705ff4b77744a72d
Description
Similar to the efforts in our recently added benchmarking script (https://github.com/dmlc/gluon-nlp/tree/master/scripts/benchmarks), we are also interested in comparing the end-to-end training speed of NLP models between GluonNLP and other packages. This helps us track the performance of GluonNLP and measure the out-of-the-box speed of different toolkits. (It also helps with the goal of democratizing NLP for everyone.) @ZheyuYe has helped try out huggingface/transformers to see its performance on a g4dn.12xlarge instance.
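For an end-to-end comparison like this, the measurement itself is just a wall clock around each toolkit's full train-plus-eval run; a minimal sketch (illustrative only, not the actual benchmarking script; `time_run` is a hypothetical helper name):

```python
import time

def time_run(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds),
    measured with a monotonic clock so system clock adjustments cannot skew it."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Usage: wrap the whole train + eval entry point of each toolkit,
# then compare the elapsed times across packages.
_, elapsed = time_run(time.sleep, 0.05)
print(f"{elapsed:.3f} s")
```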
Huggingface command:
The whole log is available at https://gist.github.com/sxjscience/9ef5c957bb4447e8fd35ccd4d96328f0. From the log, the Huggingface training and evaluation starts at 11/15/2020 14:36:09 and finishes at 11/15/2020 17:36:20, which takes roughly 3 hours.

The training log of albert-base in GluonNLP is attached here (also see the question answering examples in https://github.com/dmlc/gluon-nlp/tree/master/scripts/question_answering). It started at 2020-11-05 15:45:33,862 and finished at 2020-11-05 17:57:04,681, which is roughly 2 hours and 12 minutes. Thus, the QA implementation in GluonNLP is somewhat faster than the one in Huggingface. However, the comparison is not entirely fair, since the two toolkits preprocess the training samples differently.

We may try to maintain our own benchmark of end-to-end training performance and extend the comparison to other packages such as DeepSpeed. I opened this issue to track the status.
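Both durations, and the relative end-to-end speed, can be recomputed from the quoted log timestamps (log formats inferred from the excerpts above):

```python
from datetime import datetime

# Huggingface log timestamps (format MM/DD/YYYY HH:MM:SS)
hf_fmt = "%m/%d/%Y %H:%M:%S"
hf_minutes = (
    datetime.strptime("11/15/2020 17:36:20", hf_fmt)
    - datetime.strptime("11/15/2020 14:36:09", hf_fmt)
).total_seconds() / 60

# GluonNLP log timestamps (format YYYY-MM-DD HH:MM:SS,mmm)
gn_fmt = "%Y-%m-%d %H:%M:%S,%f"
gn_minutes = (
    datetime.strptime("2020-11-05 17:57:04,681", gn_fmt)
    - datetime.strptime("2020-11-05 15:45:33,862", gn_fmt)
).total_seconds() / 60

print(f"Huggingface: {hf_minutes:.1f} min")  # ~180 min
print(f"GluonNLP:    {gn_minutes:.1f} min")  # ~131 min
print(f"ratio: {hf_minutes / gn_minutes:.2f}x")
```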
@dmlc/gluon-nlp-team