Helsinki-NLP / Opus-MT

Open neural machine translation models and web services
MIT License
592 stars 71 forks source link

Use multiple CPU cores during decoding #30

Open smiranda opened 3 years ago

smiranda commented 3 years ago

Hello, I'm trying to use multiple CPU cores during decoding. I added "cpu-threads: 8" to the decoder.yml, as per the Marian documentation.
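For context, a decoder.yml of the kind OPUS-MT ships might look roughly like the sketch below. The model file name follows the log output further down; the vocab name and the beam/batch settings are illustrative, not copied from an actual config:

```yaml
relative-paths: true
models:
  - opus.bpe32k-bpe32k.transformer.model1.npz.best-perplexity.npz
vocabs:
  - opus.bpe32k-bpe32k.transformer.vocab.yml
  - opus.bpe32k-bpe32k.transformer.vocab.yml
beam-size: 6
normalize: 1
mini-batch: 1
maxi-batch: 1
maxi-batch-sort: none
cpu-threads: 8        # the option added, per the Marian docs
```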

This seems to be picked up at load time: 8 CPU devices are reported.

opus-mt_1 | [2020-10-16 12:08:54] [memory] Extending reserved space to 512 MB (device cpu0)
opus-mt_1 | [2020-10-16 12:08:54] Loading scorer of type transformer as feature F0
opus-mt_1 | [2020-10-16 12:08:54] Loading model from /usr/src/app/models/en-es/opus.bpe32k-bpe32k.transformer.model1.npz.best-perplexity.npz
opus-mt_1 | [2020-10-16 12:08:54] [memory] Extending reserved space to 512 MB (device cpu0)
opus-mt_1 | [2020-10-16 12:08:54] Loading scorer of type transformer as feature F0
opus-mt_1 | [2020-10-16 12:08:54] Loading model from /usr/src/app/models/ar-en/opus.bpe32k-bpe32k.transformer.model1.npz.best-perplexity.npz
opus-mt_1 | [2020-10-16 12:08:58] Server is listening on port 10001
opus-mt_1 | [2020-10-16 12:09:04] [memory] Extending reserved space to 512 MB (device cpu1)
opus-mt_1 | [2020-10-16 12:09:04] Loading scorer of type transformer as feature F0
opus-mt_1 | [2020-10-16 12:09:04] Loading model from /usr/src/app/models/ar-en/opus.bpe32k-bpe32k.transformer.model1.npz.best-perplexity.npz
opus-mt_1 | [2020-10-16 12:09:08] [memory] Extending reserved space to 512 MB (device cpu2)
opus-mt_1 | [2020-10-16 12:09:08] Loading scorer of type transformer as feature F0
opus-mt_1 | [2020-10-16 12:09:08] Loading model from /usr/src/app/models/ar-en/opus.bpe32k-bpe32k.transformer.model1.npz.best-perplexity.npz
opus-mt_1 | [2020-10-16 12:09:11] [memory] Extending reserved space to 512 MB (device cpu3)
opus-mt_1 | [2020-10-16 12:09:12] Loading scorer of type transformer as feature F0
opus-mt_1 | [2020-10-16 12:09:12] Loading model from /usr/src/app/models/ar-en/opus.bpe32k-bpe32k.transformer.model1.npz.best-perplexity.npz
opus-mt_1 | [2020-10-16 12:09:18] [memory] Extending reserved space to 512 MB (device cpu4)
opus-mt_1 | [2020-10-16 12:09:19] Loading scorer of type transformer as feature F0
opus-mt_1 | [2020-10-16 12:09:19] Loading model from /usr/src/app/models/ar-en/opus.bpe32k-bpe32k.transformer.model1.npz.best-perplexity.npz
opus-mt_1 | [2020-10-16 12:09:22] [memory] Extending reserved space to 512 MB (device cpu5)
opus-mt_1 | [2020-10-16 12:09:22] Loading scorer of type transformer as feature F0
opus-mt_1 | [2020-10-16 12:09:22] Loading model from /usr/src/app/models/ar-en/opus.bpe32k-bpe32k.transformer.model1.npz.best-perplexity.npz
opus-mt_1 | [2020-10-16 12:09:25] [memory] Extending reserved space to 512 MB (device cpu6)
opus-mt_1 | [2020-10-16 12:09:26] Loading scorer of type transformer as feature F0
opus-mt_1 | [2020-10-16 12:09:26] Loading model from /usr/src/app/models/ar-en/opus.bpe32k-bpe32k.transformer.model1.npz.best-perplexity.npz
opus-mt_1 | [2020-10-16 12:09:29] [memory] Extending reserved space to 512 MB (device cpu7)
opus-mt_1 | [2020-10-16 12:09:29] Loading scorer of type transformer as feature F0
opus-mt_1 | [2020-10-16 12:09:29] Loading model from /usr/src/app/models/ar-en/opus.bpe32k-bpe32k.transformer.model1.npz.best-perplexity.npz
opus-mt_1 | [2020-10-16 12:09:32] Server is listening on port 10002

But then at execution time it only uses 1 CPU and takes the same time as without the cpu-threads: 8 config. It also only prints:

opus-mt_1 | [2020-10-16 12:10:30] [memory] Reserving 295 MB, device cpu0

Does anyone know how to use multiple CPUs during decoding?

Thanks.

jorgtied commented 3 years ago

Is this with the Tornado web app or the other OPUS-MT server? It may be related to batch translation, which is not well supported so far.

smiranda commented 3 years ago

@jorgtied Hello, I'm using the provided Dockerfile, which launches CMD python3 server.py -c services.json -p 80. I think this means I'm using the Tornado server. Which one is the "other" OPUS-MT server? Is there a way to change the Dockerfile to use that one instead of the default? Thank you.

jorgtied commented 3 years ago

Information about the other server option is here: https://github.com/Helsinki-NLP/Opus-MT/blob/master/doc/WebSocketServer.md

smiranda commented 3 years ago

Thank you for your help.

cd Opus-MT/install
make all

This ran without errors, it seems. But the next command, sudo make install, said install: cannot stat 'marian/build/marian-server': No such file or directory, and in fact that file is not there, although marian, marian-decoder, etc. are. My machine runs Ubuntu 16.04.7 LTS.

Do you know why this might happen?

Is there a Dockerfile for this server version I could use ?

jorgtied commented 3 years ago

This is strange. Could it be that Marian no longer compiles a server binary in the latest versions? I need to check that. Could you try reverting to an earlier version of Marian NMT when compiling the system?

smiranda commented 3 years ago

@jorgtied Thanks for your help so far! Just FYI, I might return to this issue, but I don't have more time at the moment. Tweaking the Marian compilation seems too difficult for me. Maybe later!

If it suits you best, you can close the issue.

jorgtied commented 3 years ago

@smiranda I made a change in the installation makefile. Does it work now and does it compile the marian-server binary?

smiranda commented 3 years ago

@jorgtied Hello, I was now able to install and run this service! Thank you.

I still have the multi-core issue: it only uses 1 core, even when processing a large text (a news item, several sentences). Is there somewhere I must configure multi-core support? I tried adding the Marian command-line option --cpu-threads 4 to the server's init.d file. Please let me know if you have been able to see multi-core CPU activity before.

I also have another observation: are we only supposed to pass one sentence at a time here? It seems so, since the output is much smaller than the input for a large text. The other server, the Docker HTTP one, accepts a large text and does sentence splitting internally. Is this one supposed to be used differently?
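If the WebSocket server really does expect one sentence per request, the splitting the HTTP server does internally could be approximated on the client side. A minimal sketch (a naive regex splitter, not whatever tool OPUS-MT actually uses; the send loop to marian-server is only indicated in comments):

```python
import re

def split_sentences(text):
    """Naively split text on '.', '!' or '?' followed by whitespace.
    A real deployment would use a proper sentence splitter; this is
    only a sketch of the idea."""
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p for p in parts if p]

# Each sentence would then be sent to marian-server individually,
# e.g. over its WebSocket interface (client code omitted here):
#   for sentence in split_sentences(news_item):
#       ws.send(sentence)
```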

GermainZ commented 3 years ago

I see similar behavior here. The --cpu-threads option seemingly has no effect on CPU usage or translation times (I tried using 1 thread and up to 16).

A workaround is to run multiple instances of marian-server and route requests between them, which is what I ended up doing for now, but that requires a lot more work.
So for example, instead of using marian-server … --cpu-threads 16, I am running 16 instances of marian-server … --cpu-threads 1 and sending requests to these 16 instances (without caring about proper balancing for now).
This indeed results in higher CPU usage across cores, and better translation times.
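The routing part of that workaround can be sketched as a simple round-robin picker. The backend list below is hypothetical (16 marian-server instances, each started with --cpu-threads 1, on illustrative ports), and as noted above it does no real load balancing:

```python
from itertools import cycle

# Hypothetical backends: 16 marian-server instances on ports 10001-10016,
# each launched with --cpu-threads 1.
BACKENDS = [f"ws://127.0.0.1:{port}" for port in range(10001, 10017)]
_next_backend = cycle(BACKENDS)

def pick_backend():
    """Round-robin selection; no load awareness, matching the
    'without proper balancing' caveat above."""
    return next(_next_backend)
```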

Is this normal or am I missing something here? Thanks!