UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0
15.33k stars 2.48k forks

Problems with using start_multi_process_pool() #2955

Open safwaqf opened 1 month ago

safwaqf commented 1 month ago

Why do I encounter a situation where the sentence list does not match the embedding list when I use start_multi_process_pool() to start the process pool and then start Python multithreading? For example:

batchNum: 1 queLen: 100, embLen: 98
batchNum: 2 queLen: 100, embLen: 102
batchNum: 3 queLen: 100, embLen: 102
batchNum: 4 queLen: 100, embLen: 98

Above I print the sentence list length and the embedding list length for four batches. The first batch came back with two embeddings too few, and those two missing embeddings ended up in the second batch's results. Similarly, the third batch came back with two embeddings too many, which appear to have come from the fourth batch.

tomaarsen commented 1 month ago

Hello!

Do you start the Python multithreading yourself? That shouldn't be needed. There's normally just 1 queue, and each process will continuously pop from that shared queue until it's empty. These processes will then also push to 1 shared output queue. This queue is sorted afterwards to ensure that we have the same order as the inputs, but we still have just 1 output queue.
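To make the failure mode concrete, here is a minimal single-process sketch (not the library's actual implementation) of what can go wrong when two concurrent callers share one output queue: a caller pops whatever chunk comes next, including chunks that belong to the other caller.

```python
import queue

# One shared output queue, standing in for the pool's internal results queue.
results = queue.Queue()

# Workers finish chunks in an arbitrary order. Here, chunks from two
# concurrent calls ("A" and "B", 100 sentences each) land interleaved.
for owner, size in [("A", 32), ("B", 32), ("A", 32), ("A", 36), ("B", 32), ("B", 36)]:
    results.put((owner, size))

# Caller A pops until it has at least 100 embeddings, taking whatever is
# next on the queue -- including a chunk that belongs to caller B.
collected, total = [], 0
while total < 100:
    owner, size = results.get()
    collected.append(owner)
    total += size

print(collected, total)  # => ['A', 'B', 'A', 'A'] 132
```

Caller A walks away with one of B's chunks and 132 embeddings for its 100 sentences, while B will come up short, which matches the mismatched queLen/embLen counts above.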

So, the usage is:

from sentence_transformers import SentenceTransformer

def main():
    model = SentenceTransformer("all-mpnet-base-v2")
    sentences = ["The weather is so nice!", "It's so sunny outside.", "He's driving to the movie theater.", "She's going to the cinema."] * 1000

    pool = model.start_multi_process_pool()
    embeddings = model.encode_multi_process(sentences, pool)
    model.stop_multi_process_pool(pool)

    print(embeddings.shape)
    # => (4000, 768)

if __name__ == "__main__":
    main()
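If the surrounding application genuinely needs to call encode_multi_process from several threads, one possible workaround (an untested sketch, not an official API of the library) is to serialize the calls with a lock, so only one caller talks to the shared pool at a time and the output queue only ever holds chunks for a single caller:

```python
import threading

# One lock guarding the shared multi-process pool (hypothetical helper,
# not part of sentence-transformers itself).
encode_lock = threading.Lock()

def encode_batch(model, pool, batch):
    # Only one thread submits to the shared pool at a time, so each
    # caller drains only its own chunks from the output queue.
    with encode_lock:
        return model.encode_multi_process(batch, pool)
```

Each thread would then call encode_batch(model, pool, sentences) instead of calling model.encode_multi_process directly.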

https://sbert.net/docs/package_reference/sentence_transformer/SentenceTransformer.html?highlight=multi_process#sentence_transformers.SentenceTransformer.encode_multi_process