UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

Segmentation fault when loading two (or more) models in the same process and using them concurrently. #1867

Open avaz opened 1 year ago

avaz commented 1 year ago

Issue

When loading this library twice in the same process and using it concurrently, the process crashes with a SegmentationFault error. It happens when:

Notes

I believe this issue is somewhat related to https://github.com/UKPLab/sentence-transformers/issues/1854; however, unlike the referred issue, the issue reported here only happens when using the GPU — on CPU it never crashes. Overall this seems to be a PyTorch issue, but I'm reporting it here as I couldn't confirm this hypothesis.

Hardware

Amazon EC2 p2.xlarge: 1 GPU, 4 vCPUs, 61 GB RAM.

Software

Linux x86_64 GNU/Linux
Python 3.9

requirements.txt

sentence-transformers==2.2.2

pip list

Package                  Version
------------------------ ----------
certifi                  2022.12.7
charset-normalizer       3.1.0
click                    8.1.3
filelock                 3.9.0
huggingface-hub          0.13.2
idna                     3.4
joblib                   1.2.0
nltk                     3.8.1
numpy                    1.24.2
nvidia-cublas-cu11       11.10.3.66
nvidia-cuda-nvrtc-cu11   11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11        8.5.0.96
packaging                23.0
Pillow                   9.4.0
pip                      22.0.4
python-dateutil          2.8.2
PyYAML                   6.0
regex                    2022.10.31
requests                 2.28.2
scikit-learn             1.2.2
scipy                    1.10.1
sentence-transformers    2.2.2
sentencepiece            0.1.97
setuptools               58.1.0
six                      1.16.0
threadpoolctl            3.1.0
tokenizers               0.13.2
torch                    1.13.1
torchvision              0.14.1
tqdm                     4.65.0
transformers             4.26.1
typing_extensions        4.5.0
urllib3                  1.26.15
wheel                    0.38.4

Minimal reproducible example:

import logging
from concurrent.futures import as_completed, ThreadPoolExecutor
from sentence_transformers import SentenceTransformer

logging.basicConfig(level="INFO")
executor = ThreadPoolExecutor(max_workers=11)
model1 = SentenceTransformer('all-MiniLM-L6-v2')
# happens with the same or a different model
model2 = SentenceTransformer('all-MiniLM-L6-v2')

futures = {}
for i in range(100):
    if i % 2 == 0:
        s = ["something is wrong"]
        futures[executor.submit(model1.encode, s, show_progress_bar=False)] = s
    else:
        s = ["this should work but it crashes"]
        futures[executor.submit(model2.encode, s, show_progress_bar=False)] = s
for future in as_completed(futures):
    if future.exception() is not None:
        raise future.exception()
    else:
        print("Sentence:", futures[future])
        print("Embedding:", len(future.result()))
        print("")
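As a possible workaround (not from the original report), serializing all GPU-bound `encode` calls behind a single lock avoids two models touching CUDA concurrently, at the cost of throughput. A minimal sketch of the pattern — `fake_encode` is a hypothetical stand-in for `model.encode`:

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed

# One lock shared by every model: only one thread issues GPU work at a time.
gpu_lock = threading.Lock()

def fake_encode(sentences):
    # Placeholder for model.encode(sentences, show_progress_bar=False);
    # here it just returns one "embedding" (the length) per sentence.
    return [[len(s)] for s in sentences]

def safe_encode(encode_fn, sentences):
    # Serialize access so concurrent calls never overlap on the GPU.
    with gpu_lock:
        return encode_fn(sentences)

executor = ThreadPoolExecutor(max_workers=11)
futures = {executor.submit(safe_encode, fake_encode, ["hello world"]): i
           for i in range(10)}
results = [f.result() for f in as_completed(futures)]
```

In the repro above you would wrap `model1.encode` and `model2.encode` the same way, sharing one lock between both models.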
TalhaRB commented 4 months ago

OMP_NUM_THREADS=1

Set this environment variable; this worked for me.
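One caveat worth noting: OpenMP reads `OMP_NUM_THREADS` when the runtime is first loaded, so if you set it from inside Python it must happen before `torch` or `sentence_transformers` is imported. A minimal sketch:

```python
import os

# Must run before the first `import torch` / `import sentence_transformers`,
# otherwise the OpenMP runtime has already read the variable.
os.environ["OMP_NUM_THREADS"] = "1"
```

Alternatively, set it in the shell when launching the script: `OMP_NUM_THREADS=1 python script.py`.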