markusmobius opened this issue 4 years ago

Thank you for the great library.

I am calling the transformer from a C# backend which can run multiple Python processes in parallel. This works fine with spaCy, for example.

However, I am having trouble understanding how multi-core encoding (CPU) works with sentence-transformers. A single instance seems to consume about 50% of the CPU regardless of the core count, yet it runs about equally fast either way (using batches of a few hundred sentences). Does the library already scale to multiple cores? If so, what do I need to do to see a speed-up?
Hi @markusmobius
PyTorch usually parallelizes operations across multiple CPU threads. Sometimes, if you have many cores, it can make sense to limit the number of threads:

```python
import torch
torch.set_num_threads(4)
```

If torch spawns too many threads, the communication overhead gets too large and the overall speed decreases.
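For illustration, a minimal sketch of this in context (the model name and sentences are placeholders, not a recommendation):

```python
import torch
from sentence_transformers import SentenceTransformer

# Limit intra-op parallelism before loading the model; on machines with
# many cores this avoids the thread-contention overhead described above.
torch.set_num_threads(4)

model = SentenceTransformer('distilbert-base-nli-stsb-mean-tokens')  # placeholder model name
sentences = ["This is an example sentence.", "Here is another one."]
embeddings = model.encode(sentences, batch_size=32)
print(embeddings.shape)
```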
I also just pushed a new version that allows multi-process encoding. See this example: https://github.com/UKPLab/sentence-transformers/blob/master/examples/applications/computing_embeddings_mutli_gpu.py

It is intended for multi-GPU, but it can also be extended to a multi-CPU version by starting the pool like this:

```python
pool = model.start_multi_process_pool(['cpu', 'cpu', 'cpu'])
```
This would spawn 3 independent CPU processes that all encode your data.
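Put together, a CPU-only workflow might look roughly like this (a sketch, assuming the same pool API as in the linked example; the model name is a placeholder):

```python
from sentence_transformers import SentenceTransformer

# The __main__ guard matters here because the pool spawns worker processes
if __name__ == '__main__':
    model = SentenceTransformer('distilbert-base-nli-stsb-mean-tokens')  # placeholder model name

    # Start three independent CPU worker processes
    pool = model.start_multi_process_pool(['cpu', 'cpu', 'cpu'])

    sentences = ["Sentence number {}".format(i) for i in range(10000)]

    # Splits the sentences into chunks, distributes them to the workers,
    # and concatenates the resulting embeddings
    embeddings = model.encode_multi_process(sentences, pool)
    print(embeddings.shape)

    # Shut the worker processes down again
    model.stop_multi_process_pool(pool)
```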
Best,
Nils Reimers
Thank you!
One little issue: line 238 of SentenceTransformer.py derives the chunk size from the GPU count via torch.cuda.device_count(), which I don't have (just a server with many CPU cores). I changed it on my end as follows to make it run:

```python
# chunk_size = min(math.ceil(len(sentences) / torch.cuda.device_count() / 10), 5000)
chunk_size = min(math.ceil(len(sentences) / len(pool["processes"]) / 10), 5000)
```
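(As a sanity check on the formula: with 10,000 sentences and 3 CPU processes this gives min(ceil(10000 / 3 / 10), 5000) = 334 sentences per chunk.)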
Thanks for pointing that out. I changed the line accordingly.