UKPLab / sentence-transformers

Multilingual Sentence & Image Embeddings with BERT
https://www.SBERT.net
Apache License 2.0

No timing improvement when using sentence-transformers in a multiprocess environment #1585

Open swapnilk2 opened 2 years ago

swapnilk2 commented 2 years ago

I want to serve multiple encoding requests in parallel, so I am creating multiple processes. But I'm not seeing any timing improvement; instead, the processes appear to block each other, and each encoding run takes longer.

My test code and timings follow:

import time
from multiprocessing import Process

from sentence_transformers import SentenceTransformer

def proc_func():
    t_vec = time.time()
    text_list = [
        "SentenceTransformers is a Python framework for state-of-the-art sentence, text and image embeddings.",
        "The initial work is described in our paper Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks.",
        "You can use this framework to compute sentence / text embeddings for more than 100 languages.",
        "These embeddings can then be compared e.g. with cosine-similarity to find sentences with a similar meaning.",
        "This can be useful for semantic textual similar, semantic search, or paraphrase mining.",
        "The framework is based on PyTorch and Transformers and offers a large collection of pre-trained models tuned for various tasks.",
        "Our models are evaluated extensively and achieve state-of-the-art performance on various tasks",
        "Further, the code is tuned to provide the highest possible speed",
        "Have a look at Pre-Trained Models for an overview of available models and the respective performance on different tasks.",
        "Dont hesitate to send us an e-mail or report an issue, if something is broken (and it shouldnt be) or if you have further questions.",
    ]
    print("Loading BERT model")
    model = SentenceTransformer('msmarco-distilbert-base-v4')

    for text in text_list:
        vector = model.encode([text], show_progress_bar=False)[0]
    t_vec = round((time.time() - t_vec) * 1000, 3)

    print(f"Processing time: {t_vec}")

if __name__ == "__main__":
    proc_count = 3
    processes = [Process(target=proc_func) for _ in range(proc_count)]
    for p in processes:
        p.start()
    # Join so "Done!" prints only after every worker has reported its timing.
    for p in processes:
        p.join()

    print("Done!")

Timings:

Process count 1: 2324.172 ms
Process count 2: 2830.533 ms, 2948.689 ms
Process count 3: 4047.321 ms, 4112.603 ms, 4401.65 ms

CPU details:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    2
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
Vendor ID:             AuthenticAMD
CPU family:            23
Model:                 1
Model name:            AMD EPYC 7571
Stepping:              2
CPU MHz:               2199.992

Am I missing some configuration here? How can I serve multiple encoding requests in parallel without this slowdown?
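One likely cause is CPU thread oversubscription rather than a missing sentence-transformers setting: on a machine with 4 logical CPUs, each process gets PyTorch's default intra-op thread pool, so three processes end up competing for the same cores. Below is a minimal sketch of that mitigation, assuming the contention really does come from PyTorch's thread pools; torch.set_num_threads is PyTorch's own API, and the rest of the structure mirrors the reproduction script above.

import time
from multiprocessing import Process

import torch
from sentence_transformers import SentenceTransformer

def proc_func():
    # Assumption: each process spawning PyTorch's default intra-op thread
    # pool oversubscribes the 4 logical CPUs. Pinning each worker to a
    # single thread avoids that contention.
    torch.set_num_threads(1)

    model = SentenceTransformer('msmarco-distilbert-base-v4')
    t_vec = time.time()
    model.encode(["A short test sentence."], show_progress_bar=False)
    print(f"Encoding time: {round((time.time() - t_vec) * 1000, 3)} ms")

if __name__ == "__main__":
    processes = [Process(target=proc_func) for _ in range(3)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()

Separately, batching all ten sentences into a single model.encode(text_list, show_progress_bar=False) call is usually much faster than encoding them one at a time, since the model processes the whole batch in one forward pass.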

ArbaazAli1 commented 1 year ago

@nreimers I am facing the same issue. Please share any suggestions that you may have. Thanks!
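For reference, sentence-transformers ships its own helpers for multi-process encoding: start_multi_process_pool spawns worker processes that each load the model once, and encode_multi_process splits the input sentences across them. A minimal sketch, assuming a recent library version and CPU-only workers:

from sentence_transformers import SentenceTransformer

if __name__ == "__main__":
    model = SentenceTransformer('msmarco-distilbert-base-v4')

    # One worker process per "cpu" entry; each loads its own model copy.
    pool = model.start_multi_process_pool(target_devices=["cpu"] * 3)

    sentences = [f"This is sentence number {i}." for i in range(1000)]
    # The sentences are chunked and distributed across the pool's workers.
    embeddings = model.encode_multi_process(sentences, pool)
    print(embeddings.shape)

    # Shut the worker processes down cleanly.
    SentenceTransformer.stop_multi_process_pool(pool)

This pays the model-loading cost once per worker up front, so it only helps when many sentences are encoded per request; for a stream of small independent requests, a single long-lived process with batched encode calls may be simpler and just as fast.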