Helsinki-NLP / Opus-MT

Open neural machine translation models and web services
MIT License

What is the best way to deploy models in terms of translation speed? #70

Closed: jamie0725 closed this 2 months ago

jamie0725 commented 1 year ago

Hi,

Firstly, thanks for making your translation models publicly available. It is really helpful for the industry.

I have a question though, related to this question: if I am going to translate a large amount of text, what is the best way to use your models? Currently I am using the transformers library, but the speed is quite slow even on a GPU, which is not fast enough for my use case.

jamie0725 commented 1 year ago

In terms of speed: translating 30k documents of about 300 words each currently takes 10+ hours on a single GPU. Is this expected?
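For reference, one common way to cut per-document overhead with the transformers library is to batch sentences before calling generate, so the GPU processes many sentences at once instead of one at a time. A minimal sketch, assuming an English-Spanish model; the model name, batch size, and helper function are illustrative and not taken from this thread:

# Minimal batched-translation sketch with transformers (illustrative only).
import torch
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-es"   # assumed language pair
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name).to(device).eval()

def translate(sentences, batch_size=32):
    """Translate a list of sentences, batching them to keep the GPU busy."""
    outputs = []
    for i in range(0, len(sentences), batch_size):
        batch = sentences[i:i + batch_size]
        enc = tokenizer(batch, return_tensors="pt",
                        padding=True, truncation=True).to(device)
        with torch.no_grad():
            generated = model.generate(**enc)
        outputs.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return outputs

print(translate(["I am a dog", "The weather is nice today."]))

Padding sentences of similar length into one batch is usually the main lever for GPU throughput here; the exact batch size depends on sentence length and available memory.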

opme commented 1 year ago

There was a good answer to that on Stack Overflow: the Helsinki-NLP models were originally trained in Marian and then converted to Hugging Face Transformers. Marian is a specialized tool for MT and is very fast. If you do not need the internals of the models and only need the translations, it should be the better choice.

I have been able to get the transformers setup working, and it takes 2-3 seconds per paragraph on a new 2022 laptop with 64 GB of RAM and an A2000 GPU.

I am trying the dockerized version of Opus-MT, but when I run it, it has so far given me incomplete or garbled responses.

docker build -f Dockerfile.gpu . -t opus-mt-gpu
nvidia-docker run -p 8888:8888 opus-mt-gpu:latest
~/git/Opus-MT$ echo "I am a dog" | ./opusMT-client.py -H 172.17.0.2 -P 10001 -s en -t es
Soy un

I am trying to translate company profiles for 80k stock symbols into European languages. Just venting. I don't want to start the bulk translations until I can get it running at under 1 second per paragraph.

jorgtied commented 1 year ago

For batch translation it would be better to run marian-decoder directly rather than going through the server/client setup. Also note that the opusMT server/client implementation does not do batching and therefore does not really use the full power of a GPU.
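As a rough sketch, a batch run with marian-decoder could look like the commands below, assuming an unpacked OPUS-MT model package that ships a decoder.yml, the model weights, and SentencePiece files; the file names, language codes, and batch settings are illustrative:

spm_encode --model source.spm < input.en > input.sp
marian-decoder -c decoder.yml --devices 0 --mini-batch 64 --maxi-batch 1000 --maxi-batch-sort src < input.sp > output.sp
spm_decode --model target.spm < output.sp > output.es

Grouping sentences of similar length into large mini-batches is where most of the GPU speedup over per-sentence decoding tends to come from.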