hrdxwandg opened this issue 3 years ago
Thanks for your feedback. Could you try more epochs? LightSeq can't match the PyTorch implementation exactly, because the low-level implementation is totally different and optimized for speed.
We have tested on our in-house pretraining codebase, and after a few days of training, the losses are close.
Thanks for your reply. If I increase the number of epochs, it will take more time to reach the target loss. Isn't that at odds with being "optimized for speed" and reducing training time?
It depends. Speed should be measured as the end-to-end time to convergence, not per-epoch perplexity; many factors can influence perplexity within a single epoch.
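To make the point above concrete, here is a minimal sketch (with made-up, hypothetical loss curves and per-epoch times, not measurements from LightSeq or the HF script) of comparing end-to-end time to a target loss rather than per-epoch quality:

```python
def time_to_target_loss(loss_curve, target_loss, seconds_per_epoch):
    """Return the wall-clock time (seconds) until the loss first
    reaches the target, or None if it never does."""
    for epoch, loss in enumerate(loss_curve, start=1):
        if loss <= target_loss:
            return epoch * seconds_per_epoch
    return None

# Hypothetical numbers: the faster implementation needs one more epoch
# to hit the same loss, yet still converges sooner in wall-clock time.
baseline = time_to_target_loss([2.9, 2.5, 2.2, 2.0], target_loss=2.0,
                               seconds_per_epoch=100)  # 4 epochs * 100 s
faster   = time_to_target_loss([3.1, 2.7, 2.3, 2.1, 2.0], target_loss=2.0,
                               seconds_per_epoch=60)   # 5 epochs * 60 s
print(baseline, faster)  # 400 300
```

So a per-epoch perplexity gap alone does not show which implementation reaches a given loss faster.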
I ran the PyTorch script run_mlm_no_trainer.py with and without LightSeq, and the results differ a lot.
Run on my own data without LightSeq (https://github.com/huggingface/transformers/blob/master/examples/pytorch/language-modeling/run_mlm_no_trainer.py):
epoch 0: perplexity: 2.847743110923294
Run with LightSeq:
epoch 0: perplexity: 3.1440967518031444
GPU utilization is higher with LightSeq, but training takes more time. I followed the run_ner_no_trainer.py settings.
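For context on how small the gap is in loss terms, the HF example reports perplexity as the exponential of the mean evaluation loss. A minimal sketch of that relationship (the helper name is mine, not from the script):

```python
import math

def perplexity_from_losses(eval_losses):
    """Perplexity computed as exp(mean cross-entropy loss), the way the
    run_mlm_no_trainer.py example derives its reported perplexity."""
    mean_loss = sum(eval_losses) / len(eval_losses)
    return math.exp(mean_loss)

# The two reported perplexities (2.8477 vs 3.1441) correspond to mean
# losses of log(2.8477) ~ 1.046 vs log(3.1441) ~ 1.146 nats per token,
# i.e. a gap of roughly 0.1 nats after one epoch.
print(perplexity_from_losses([1.046]))
```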