Open ShaneOH opened 1 year ago
Not quite sure what is happening here. I believe it has something to do with your device, namely the MacBook. I remember these kinds of issues popping up in BERTopic when using newer MacBooks. I believe a fix for this might be setting `device='mps'` when using a SentenceTransformer model.
Hey @MaartenGr -- thanks for the response. So I tried this inside Docker:

```python
from keybert import KeyBERT
from sentence_transformers import SentenceTransformer

text_input = 'test me'
model = SentenceTransformer(
    "all-MiniLM-L6-v2",
    device="mps",
)
kw_model = KeyBERT(model)
keywords = kw_model.extract_keywords(text_input, keyphrase_ngram_range=(1, 4))
```
Which results in this:

```
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/test.py", line 14, in <module>
    keywords = kw_model.extract_keywords(text_input, keyphrase_ngram_range=(1, 4))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/keybert/_model.py", line 176, in extract_keywords
    doc_embeddings = self.model.embed(docs)
                     ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/keybert/backend/_sentencetransformers.py", line 62, in embed
    embeddings = self.embedding_model.encode(documents, show_progress_bar=verbose)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sentence_transformers/SentenceTransformer.py", line 153, in encode
    self.to(device)
  File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1160, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 833, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1158, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: PyTorch is not linked with support for mps devices
```
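One thing I could try instead of hard-coding `device="mps"` is to pick the device at runtime; this is just a sketch of the fallback logic (not something from the KeyBERT docs), so that inside a Linux container without Metal it would fall back to CPU:

```python
# Pick the compute device at runtime. MPS (Apple's Metal backend) is only
# available on macOS builds of torch, so inside a Linux Docker container
# this falls back to CUDA or CPU instead of hard-coding "mps".
try:
    import torch

    if hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
        device = "mps"
    elif torch.cuda.is_available():
        device = "cuda"
    else:
        device = "cpu"
except ImportError:  # torch not installed at all
    device = "cpu"

print(device)
```

The resulting `device` string could then be passed straight to `SentenceTransformer(..., device=device)`.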
Could the issue have something to do with PyTorch? I had seen a similar issue here: https://github.com/MaartenGr/KeyBERT/issues/146, but in that case it seemed like he was not able to instantiate the model at all. I can instantiate the model fine, but calling `extract_keywords` causes the seg fault.

I'm not sure if I need to install anything other than `keybert<1.0` in my requirements.txt? My Dockerfile only has `python:3.11` plus an install from requirements.txt.
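For reference, the Dockerfile is roughly this (the `WORKDIR`/`COPY` layout and the `CMD` line here are illustrative, not my exact file):

```dockerfile
FROM python:3.11
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "test.py"]
```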
Meanwhile, all this test code works perfectly fine outside Docker, on the same MacBook, from the command line, without error. It only fails when I run it inside my Docker container. So it seems like the container could be missing something, but I don't see anything in the KeyBERT docs about needing to install anything other than `keybert` to run the minimal example I'm trying. Maybe it's something that happens to be installed on my machine but is not included in the Docker image?
I believe this is related to pytorch and the python version that you have installed. Could you check whether the versions of pytorch and python between your local and Docker environment are exactly the same?
@MaartenGr looks like there's a slight difference.

Outside Docker:

```
(venv) shane@Shanes-PC % python --version
Python 3.11.4
(venv) shane@Shanes-PC % python -c "import torch; print(torch.__version__)"
2.0.1
```

Inside Docker:

```
root@6e4452c64315:/app# python --version
Python 3.11.6
root@6e4452c64315:/app# python -c "import torch; print(torch.__version__)"
2.1.0
```
Note that my requirements.txt did not have `torch`, only `keybert<1.0`.

So if I add `torch==2.0.1` to my requirements.txt and rebuild the image... it works!
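In other words, the whole requirements.txt for the working image is just these two lines:

```
keybert<1.0
torch==2.0.1
```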
So to recap: downgrading torch from 2.1.0 to 2.0.1 in my Docker container solved this issue. I double-checked by upgrading back to 2.1.0 and confirming it breaks again with a seg fault; downgrading again makes it work again.

Interestingly, if I upgrade torch from 2.0.1 to 2.1.0 (via `pip install torch`) in my venv outside of Docker (so on my machine), it still works.

Really not sure what's going on there at the core, but if it works it works, so I'll just keep `torch` pinned to 2.0.1 for now 😅
Feel free to close this issue as solved, not sure who might want to be looking into whatever's going on with Docker + pytorch in the bigger picture.
Thanks for your responsiveness on this too -- I really appreciate it!
Glad to hear that you solved the issue! Most likely, it is a result of the torch version (also whether it has cuda or not) that shows the differences. I'll close this for now but if somebody else runs into this issue, I'll make sure to re-open it.
Hi, when trying to run this on my machine (MacBook Pro M2), everything works fine. However, when trying to run it inside Docker I get a seg fault when calling `extract_keywords`. So instantiating the model actually works fine, but `extract_keywords` breaks. Here's some debug output when I run the Python interpreter via `python -q -X faulthandler`.

In my Dockerfile I have `FROM python:3.11`, which is the same version as my local machine (which, again, is working fine). When I run container stats, I can see my `MEM LIMIT` is around 8 GB, and when I run this little test script, memory only rises to around 200 MB -- although I see the CPU % spike really high, 100-300%, so I'm not sure if that's what's going on. Any idea how to continue debugging this?