markusmobius opened this issue 4 years ago

Thank you for the great library.

I am calling the transformer from a C# backend which can run multiple Python processes in parallel. This works fine with spaCy, for example.

However, I am having trouble understanding how multi-core encoding (CPU) works with sentence-transformers. A single instance seems to consume about 50% of the CPU regardless of the core count, yet it runs about equally fast either way (using batches of a few hundred sentences). Does the library already scale to multiple cores? If so, what do I need to do to see a speed-up?
Hi @markusmobius
PyTorch usually parallelizes operations across multiple CPU threads. Sometimes, if you have many cores, it can make sense to limit the number of threads:

```python
import torch
torch.set_num_threads(4)
```

If torch spawns too many threads, the communication overhead gets too large and the overall speed decreases.
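For illustration, a minimal sketch of this in context (the model name and sentences are placeholders, not a recommendation):

```python
import torch
from sentence_transformers import SentenceTransformer

# Limit intra-op parallelism before loading the model; on machines with
# many cores this avoids the thread-contention overhead described above.
torch.set_num_threads(4)

model = SentenceTransformer('distilbert-base-nli-stsb-mean-tokens')  # placeholder model name
sentences = ["This is an example sentence.", "Here is another one."]
embeddings = model.encode(sentences, batch_size=32)
print(embeddings.shape)
```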
I also just pushed a new version that allows multi-process encoding. See this example: https://github.com/UKPLab/sentence-transformers/blob/master/examples/applications/computing_embeddings_mutli_gpu.py

It is intended for multi-GPU, but it can also be extended to a multi-CPU version by starting the pool like this:

```python
pool = model.start_multi_process_pool(['cpu', 'cpu', 'cpu'])
```
This would spawn 3 independent CPU processes that all encode your data.
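Put together, a CPU-only workflow might look roughly like this (a sketch, assuming the same pool API as in the linked example; the model name is a placeholder):

```python
from sentence_transformers import SentenceTransformer

# The __main__ guard matters here because the pool spawns worker processes
if __name__ == '__main__':
    model = SentenceTransformer('distilbert-base-nli-stsb-mean-tokens')  # placeholder model name

    # Start three independent CPU worker processes
    pool = model.start_multi_process_pool(['cpu', 'cpu', 'cpu'])

    sentences = ["Sentence number {}".format(i) for i in range(10000)]

    # Splits the sentences into chunks, distributes them to the workers,
    # and concatenates the resulting embeddings
    embeddings = model.encode_multi_process(sentences, pool)
    print(embeddings.shape)

    # Shut the worker processes down again
    model.stop_multi_process_pool(pool)
```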
Best,
Nils Reimers
Thank you!
One little issue: line 238 of SentenceTransformer.py derives the chunk size from the GPU count via torch.cuda.device_count(), which I don't have (just a server with many CPU cores). I changed it on my end as follows to make it run:

```python
# chunk_size = min(math.ceil(len(sentences) / torch.cuda.device_count() / 10), 5000)
chunk_size = min(math.ceil(len(sentences) / len(pool["processes"]) / 10), 5000)
```
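(As a sanity check on the formula: with 10,000 sentences and 3 CPU processes this gives min(ceil(10000 / 3 / 10), 5000) = 334 sentences per chunk.)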
Thanks for pointing that out. I changed the line accordingly.