UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0
14.98k stars 2.44k forks

prompt usage blocked by list comprehension scope issues #2804

Open 1ndiecowan opened 3 months ago

1ndiecowan commented 3 months ago

When I try to use a prompt, I get this error: can only concatenate str (not "bool") to str. Upon further investigation, I found that the issue seems to be that the local variable 'prompt' is not available inside the scope of the list comprehension that adds the prompt to the beginning of each of the sentences. This can easily be fixed by using an f-string inside the comprehension instead of concatenation for some reason. This seems like a silly feature of Python to me...


python3 make_embeddings.py
/Documents/Projects/data_env/lib/python3.9/site-packages/urllib3/__init__.py:35: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
  warnings.warn(
Default prompt name is set to 'clustering'. This prompt will be applied to all `encode()` calls, except if `encode()` is called with `prompt` or `prompt_name` parameters.
Traceback (most recent call last):
  File "/make_embeddings.py", line 16, in <module>
    embeddings = model.encode(training_payers, show_progress_bar=True, normalize_embeddings=True)
  File "/Documents/Projects/data_env/lib/python3.9/site-packages/sentence_transformers/SentenceTransformer.py", line 466, in encode
    sentences = [prompt + sentence for sentence in sentences]
  File "/Documents/Projects/data_env/lib/python3.9/site-packages/sentence_transformers/SentenceTransformer.py", line 466, in <listcomp>
    sentences = [prompt + sentence for sentence in sentences]
TypeError: can only concatenate str (not "bool") to str
(data_env) iMacBook-Air clustering % python3 --version
Python 3.9.6
(data_env) MacBook-Air clustering %

tomaarsen commented 3 months ago

Hello!

> Upon further investigation, I found that the issue seems to be that the local variable 'prompt' is not available inside the scope of the list comprehension that adds the prompt to the beginning of each of the sentences. This can easily be fixed by using an f-string inside the comprehension instead of concatenation for some reason. This seems like a silly feature of Python to me...

I'm unable to reproduce this exactly. If prompt were not defined or accessible, we would get a different error (a NameError rather than a TypeError). It seems that prompt is somehow a bool rather than the prompt string, e.g. you may have:

  1. passed prompt=True to model.encode
  2. set prompts={"clustering": True} as a dictionary with string keys and bool values

Or something like that. Could you reply with a reproducible example? For example:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "all-MiniLM-L6-v2",
    prompts={
        "retrieval": "Retrieve documents that are similar to: ",
    },
    default_prompt_name="retrieval",
)

sentence = "The quick brown fox jumps over the lazy dog."
embedding = model.encode(sentence)
print(embedding.shape)
# => (384,)
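
To narrow things down further, here is a minimal sketch in plain Python that does not use sentence-transformers at all. It copies only the shape of the comprehension from the traceback. The helper names (add_prompt, error_message, non_string_items) are hypothetical and exist only for this illustration. It shows that a comprehension can read a local variable of the enclosing function, and that the wording of the TypeError depends on which operand is the bool. In plain Python, the message from the traceback appears when the left operand is a str and the right operand is a bool, so it may also be worth checking whether the input list itself contains a non-string value. An f-string would hide that case, because it converts the bool to the text "True" instead of raising.

```python
def add_prompt(prompt, sentences):
    # Same shape as the line in the traceback. The comprehension reads
    # `prompt` from the enclosing function scope without any problem.
    return [prompt + sentence for sentence in sentences]


def error_message(prompt, sentences):
    # Hypothetical helper: return the TypeError text, or None on success.
    try:
        add_prompt(prompt, sentences)
    except TypeError as exc:
        return str(exc)
    return None


def non_string_items(sentences):
    # Hypothetical helper: list (index, value) pairs that are not str,
    # to inspect the input before it is passed to encode().
    return [(i, s) for i, s in enumerate(sentences) if not isinstance(s, str)]


# str prompt, str sentences: concatenation works inside the comprehension.
print(add_prompt("clustering: ", ["a", "b"]))

# str prompt, bool among the sentences: str + bool raises the message
# seen in the traceback.
print(error_message("clustering: ", ["a", True]))

# bool prompt, str sentence: bool + str raises a differently worded error.
print(error_message(True, ["a"]))

# An f-string silently converts the bool to text instead of raising.
prompt = "clustering: "
print([f"{prompt}{s}" for s in ["a", True]])

# Locate the offending entries in the input.
print(non_string_items(["a", True, "b"]))
```

If non_string_items returns anything for the list passed to encode, converting or dropping those entries before encoding is probably a safer fix than switching to an f-string.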