Open · rdpulgar opened this issue 4 years ago
I'm having the same exact issue! Please advise.
@rdpulgar Try this!

```python
data['text'] = data['text'].astype(str)
corpus = data.text.tolist()
corpus_embeddings = embedder.encode(corpus)
```
I had the same problem and your solution solved it. Thank you.
The reason might be that you pass empty strings, strings with only whitespace, `nan`, or `None` as text...
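A quick way to spot such rows before calling `encode` (a sketch, assuming the pandas DataFrame `data` with a `text` column from the snippets above):

```python
# Rows that are NaN/None or contain only whitespace will cause encode() to fail
mask = data['text'].isna() | data['text'].astype(str).str.strip().eq('')
print(f"{mask.sum()} problematic rows")
print(data[mask])

# Either drop them or force everything to a plain string first
data = data[~mask].copy()
data['text'] = data['text'].astype(str)
```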
I also have this problem. I got this error when using the evaluator. The sentences I pass in are also of type str, but each string is very long.
The following is my error message:
File "/ssd/zhouhcData/.conda/pkgs/deepmatcher/lib/python3.6/site-packages/sentence_transformers/SentenceTransformer.py", line 593, in fit
training_steps, callback)
File "/ssd/zhouhcData/.conda/pkgs/deepmatcher/lib/python3.6/site-packages/sentence_transformers/SentenceTransformer.py", line 616, in _eval_during_training
score = evaluator(self, output_path=output_path, epoch=epoch, steps=steps)
File "/ssd/zhouhcData/.conda/pkgs/deepmatcher/lib/python3.6/site-packages/sentence_transformers/evaluation/EmbeddingSimilarityEvaluator.py", line 78, in call
embeddings2 = model.encode(self.sentences2, batch_size=self.batch_size, show_progress_bar=self.show_progress_bar, convert_to_numpy=True)
File "/ssd/zhouhcData/.conda/pkgs/deepmatcher/lib/python3.6/site-packages/sentence_transformers/SentenceTransformer.py", line 166, in encode
length_sorted_idx = np.argsort([self._text_length(sen) for sen in sentences])
File "/ssd/zhouhcData/.conda/pkgs/deepmatcher/lib/python3.6/site-packages/sentence_transformers/SentenceTransformer.py", line 166, in
Even using the methods mentioned above does not solve it.
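One way to narrow this down is to check the lists you pass to `EmbeddingSimilarityEvaluator` for non-string or empty entries. A minimal sketch, where `sentences1` and `sentences2` stand in for whatever lists you build the evaluator from:

```python
# Report any entry that is not a clean, non-empty string
for name, sentences in [("sentences1", sentences1), ("sentences2", sentences2)]:
    for i, s in enumerate(sentences):
        if not isinstance(s, str) or not s.strip():
            print(f"{name}[{i}] is problematic: {s!r}")
```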
I resolved the issue using:

```python
data['text'] = data['text'].astype(str)
corpus = data.text.tolist()
corpus_embeddings = embedder.encode(corpus)
```
I used your solution, but it did not solve my problem. Thank you for your answer.
Hello,
I am getting this error when trying to do clustering in Spanish (see ERROR below). I assume there is a problem with my corpus. Could you help me find the nature of the error? (It works perfectly in English.)
Thanks.
The code I used is:
""" This is a simple application for sentence embeddings: clustering Sentences are mapped to sentence embeddings and then k-mean clustering is applied. """ ! pip install -U sentence-transformers
from sentence_transformers import SentenceTransformer from sklearn.cluster import KMeans
embedder = SentenceTransformer('distiluse-base-multilingual-cased') corpus = data.text.tolist() corpus_embeddings = embedder.encode(corpus) # Error happens here
Perform kmean clustering kmean code here..
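For reference, once `encode` succeeds, the elided k-means step could look like this (a sketch building on `corpus` and `corpus_embeddings` above; the cluster count is an arbitrary choice):

```python
num_clusters = 5  # arbitrary choice for illustration
clustering_model = KMeans(n_clusters=num_clusters)
clustering_model.fit(corpus_embeddings)
cluster_assignment = clustering_model.labels_

# Group sentences by their assigned cluster
clustered_sentences = [[] for _ in range(num_clusters)]
for sentence_id, cluster_id in enumerate(cluster_assignment):
    clustered_sentences[cluster_id].append(corpus[sentence_id])

for i, cluster in enumerate(clustered_sentences):
    print(f"Cluster {i + 1}: {cluster}")
```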
ERROR:

```
TypeError                                 Traceback (most recent call last)
```