Helsinki-NLP / OPUS-MT-train

Training open neural machine translation models

What could cause widely varying inference times when using the pre-trained opus-mt-en-fr model with the Python transformers library? #80

Open · shandou opened 1 year ago

shandou commented 1 year ago

I have been testing pre-trained Opus-MT models ported to the transformers library for a Python implementation. Specifically, I am using opus-mt-en-fr for English-to-French translation, with the tokenizer and translation model loaded via MarianTokenizer and MarianMTModel, similar to the code examples shown on Hugging Face. Strangely, for the same pre-trained model translating the same English input on an identical machine, I have observed anywhere between 80+ ms and a whopping 4 s per translation (example input: "kiwi strawberry").
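For reference, a minimal sketch of the setup being described, following the Hugging Face MarianMT examples (the timing code is illustrative, not the exact script used):

```python
import time

from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-fr"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Tokenize the input and translate, timing only the generation step
batch = tokenizer(["kiwi strawberry"], return_tensors="pt")
start = time.perf_counter()
generated = model.generate(**batch)
print(f"translation took {(time.perf_counter() - start) * 1000:.1f} ms")
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```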

I wonder if anyone has observed similar behaviour, and what could cause such wide variation? Thank you very much!

jorgtied commented 1 year ago

Maybe asking the people at Hugging Face and on the transformers GitHub repo would help?

blademoon commented 1 year ago

Good afternoon. Hypothetically, could CPU or GPU load have affected the performance of the model? Have you tried monitoring the load on the hardware components while performing the measurements?
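For example, a minimal sketch of such monitoring, assuming a CPU-only setup and the third-party psutil package (both are assumptions, not stated in the thread):

```python
import time

import psutil  # third-party; pip install psutil
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-fr"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)
batch = tokenizer(["kiwi strawberry"], return_tensors="pt")

psutil.cpu_percent(interval=None)  # prime the counter; the first call always returns 0.0
for i in range(10):
    start = time.perf_counter()
    model.generate(**batch)
    elapsed_ms = (time.perf_counter() - start) * 1000
    # cpu_percent(interval=None) reports system-wide CPU use since the
    # previous call, i.e. roughly over this one translation
    load = psutil.cpu_percent(interval=None)
    print(f"run {i}: {elapsed_ms:.1f} ms, system CPU load {load:.0f}%")
```

If the slow runs coincide with high system-wide load, the variation is likely caused by other processes competing for the CPU rather than by the model itself.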