UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

Segmentation fault when loading two (or more) models in the same process and using them concurrently. #1867

Open avaz opened 1 year ago

avaz commented 1 year ago

Issue

When loading this library twice in the same process and using it concurrently, the process crashes with a SegmentationFault error. It happens when:

Notes

I believe this issue is somewhat related to https://github.com/UKPLab/sentence-transformers/issues/1854; however, unlike the referred issue, the issue reported here only happens when using the GPU — on CPU it never crashes. Overall this seems to be a PyTorch issue, but I'm reporting it here as I couldn't confirm this hypothesis.

Hardware

Amazon EC2 p2.xlarge: 1 GPU, 4 vCPUs, 61 GB RAM.

Software

Linux x86_64 GNU/Linux
Python 3.9

requirements.txt

sentence-transformers==2.2.2

pip list

Package                  Version
------------------------ ----------
certifi                  2022.12.7
charset-normalizer       3.1.0
click                    8.1.3
filelock                 3.9.0
huggingface-hub          0.13.2
idna                     3.4
joblib                   1.2.0
nltk                     3.8.1
numpy                    1.24.2
nvidia-cublas-cu11       11.10.3.66
nvidia-cuda-nvrtc-cu11   11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11        8.5.0.96
packaging                23.0
Pillow                   9.4.0
pip                      22.0.4
python-dateutil          2.8.2
PyYAML                   6.0
regex                    2022.10.31
requests                 2.28.2
scikit-learn             1.2.2
scipy                    1.10.1
sentence-transformers    2.2.2
sentencepiece            0.1.97
setuptools               58.1.0
six                      1.16.0
threadpoolctl            3.1.0
tokenizers               0.13.2
torch                    1.13.1
torchvision              0.14.1
tqdm                     4.65.0
transformers             4.26.1
typing_extensions        4.5.0
urllib3                  1.26.15
wheel                    0.38.4

Minimal reproducible example:

import logging
from concurrent.futures import as_completed, ThreadPoolExecutor
from sentence_transformers import SentenceTransformer

logging.basicConfig(level="INFO")
executor = ThreadPoolExecutor(max_workers=11)
model1 = SentenceTransformer('all-MiniLM-L6-v2')
# happens with the same or a different model
model2 = SentenceTransformer('all-MiniLM-L6-v2')

futures = {}
for i in range(100):
    if i % 2 == 0:
        s = ["something is wrong"]
        futures[executor.submit(model1.encode, s, show_progress_bar=False)] = s
    else:
        s = ["this should work but it crashes"]
        futures[executor.submit(model2.encode, s, show_progress_bar=False)] = s
for future in as_completed(futures):
    if future.exception() is not None:
        raise future.exception()
    else:
        print("Sentence:", futures[future])
        print("Embedding:", len(future.result()))
        print("")
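As a possible workaround (not from the original report), serializing all GPU-bound `encode` calls behind a single lock avoids two models touching CUDA concurrently, at the cost of throughput. A minimal sketch of the pattern — `fake_encode` is a hypothetical stand-in for `model.encode`:

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed

# One lock shared by every model: only one thread issues GPU work at a time.
gpu_lock = threading.Lock()

def fake_encode(sentences):
    # Placeholder for model.encode(sentences, show_progress_bar=False);
    # here it just returns one "embedding" (the length) per sentence.
    return [[len(s)] for s in sentences]

def safe_encode(encode_fn, sentences):
    # Serialize access so concurrent calls never overlap on the GPU.
    with gpu_lock:
        return encode_fn(sentences)

executor = ThreadPoolExecutor(max_workers=11)
futures = {executor.submit(safe_encode, fake_encode, ["hello world"]): i
           for i in range(10)}
results = [f.result() for f in as_completed(futures)]
```

In the repro above you would wrap `model1.encode` and `model2.encode` the same way, sharing one lock between both models.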
TalhaRB commented 4 months ago

OMP_NUM_THREADS=1

Set this environment variable; this worked for me.
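One caveat worth noting: OpenMP reads `OMP_NUM_THREADS` when the runtime is first loaded, so if you set it from inside Python it must happen before `torch` or `sentence_transformers` is imported. A minimal sketch:

```python
import os

# Must run before the first `import torch` / `import sentence_transformers`,
# otherwise the OpenMP runtime has already read the variable.
os.environ["OMP_NUM_THREADS"] = "1"
```

Alternatively, set it in the shell when launching the script: `OMP_NUM_THREADS=1 python script.py`.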