UKPLab / EasyNMT

Easy to use, state-of-the-art Neural Machine Translation for 100+ languages

New benchmarks for the m2m/mbart models after the switch away from Fairseq to huggingface transformers in EasyNMT 2.0 #28

kgravenreuth opened this issue 3 years ago

kgravenreuth commented 3 years ago

With the switch from Fairseq to huggingface transformers in EasyNMT 2.0, there seem to have been substantial changes, e.g. the m2m_100_1.2B model has grown from 2.3 GB to 5 GB.

Do we need new benchmarks for the m2m/mbart models after this change? And in general, do you expect translation inference to be slower or faster than before?

nreimers commented 3 years ago

Hi @kgravenreuth We previously hosted the models in 16-bit floating point (fp16). In version 2, we switched to huggingface transformers, which hosts the models in fp32; hence, the models are twice as large.
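(The arithmetic lines up: 1.2B parameters × 2 bytes per parameter in fp16 ≈ 2.4 GB, versus × 4 bytes in fp32 ≈ 4.8 GB, matching the observed jump from 2.3 GB to 5 GB.)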

Performance in terms of translation accuracy should be the same.

Will it be faster or slower: the file size itself has little impact here, because the weights can be converted internally between the two floating-point formats.
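As a minimal sketch of that conversion (my example, not from this thread; it assumes the upstream facebook/m2m100_1.2B checkpoint, which EasyNMT wraps as m2m_100_1.2B, and a CUDA GPU), the fp32 weights hosted on huggingface can be cast down to fp16 after loading:

```python
# Hedged sketch: load the fp32 m2m-100 checkpoint, then cast it to fp16.
import torch
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_1.2B")
model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_1.2B")
model = model.half().to("cuda").eval()  # fp32 weights -> fp16, halving memory use
```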

But it may be that the huggingface implementation is faster or slower than the previous fairseq implementation. I hope I can update the numbers soon.

nreimers commented 3 years ago

Sadly, the huggingface models are much slower than the fairseq models.

I think one difference is the use of FP32 instead of FP16, which fairseq uses. I will check whether converting the model to FP16 speeds up the computation.
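A quick way to try that locally (a sketch, not an official benchmark; it assumes the upstream facebook/m2m100_1.2B checkpoint and a CUDA GPU, and timings vary a lot by hardware) is to time one generate() call in fp32, cast to fp16, and time again:

```python
# Hedged sketch: compare fp32 vs. fp16 generation time for one translation.
import time
import torch
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_1.2B")
tokenizer.src_lang = "de"
model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_1.2B")
model = model.to("cuda").eval()
batch = tokenizer("Ein Beispielsatz zum Übersetzen.", return_tensors="pt").to("cuda")

def timed_generate():
    # Synchronize around the call so we measure actual GPU work, not queueing.
    torch.cuda.synchronize()
    start = time.perf_counter()
    with torch.no_grad():
        model.generate(**batch, forced_bos_token_id=tokenizer.get_lang_id("en"))
    torch.cuda.synchronize()
    return time.perf_counter() - start

print("fp32:", timed_generate())
model.half()  # convert the weights to fp16 in place
print("fp16:", timed_generate())
```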

hobodrifterdavid commented 2 years ago

Hello. I'm wondering if you have any further information... We'd like to run the m2m-100 model, and I'm considering going back to the fairseq code that was removed in v2.0.0.
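For reference, the non-fairseq path in 2.x is the EasyNMT API itself; a minimal sketch (model name as listed in the README, source language detected automatically):

```python
# Hedged sketch: run m2m-100 through EasyNMT 2.x (the huggingface-backed path
# benchmarked in this thread) instead of reverting to the removed fairseq code.
from easynmt import EasyNMT

model = EasyNMT('m2m_100_1.2B')
print(model.translate('Dies ist ein Satz auf Deutsch.', target_lang='en'))
```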