UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

Error: object of type 'float' has no len() #429

Open rdpulgar opened 4 years ago

rdpulgar commented 4 years ago

Hello,

I am getting this error when trying to cluster in Spanish (see ERROR below). I assume there is a problem with my corpus. Could you help me find the nature of the error? (It works perfectly in English.)

Thanks.

The code I used is:

""" This is a simple application for sentence embeddings: clustering Sentences are mapped to sentence embeddings and then k-mean clustering is applied. """ ! pip install -U sentence-transformers

from sentence_transformers import SentenceTransformer from sklearn.cluster import KMeans

embedder = SentenceTransformer('distiluse-base-multilingual-cased') corpus = data.text.tolist() corpus_embeddings = embedder.encode(corpus) # Error happens here

Perform kmean clustering kmean code here..

ERROR

TypeError                                 Traceback (most recent call last)

in <module>()
     40 corpus = data.text.tolist()
     41
---> 42 corpus_embeddings = embedder.encode(corpus)
     43
     44 # Perform kmean clustering

1 frames
/usr/local/lib/python3.6/dist-packages/sentence_transformers/SentenceTransformer.py in <listcomp>(.0)
    160
    161         all_embeddings = []
--> 162         length_sorted_idx = np.argsort([len(sen) for sen in sentences])
    163         sentences_sorted = [sentences[idx] for idx in length_sorted_idx]
    164         inp_dataset = EncodeDataset(sentences_sorted, model=self, is_tokenized=is_pretokenized)

TypeError: object of type 'float' has no len()
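
The traceback shows encode() calling len() on every corpus entry, so at least one entry is not a string (typically a float NaN coming from an empty cell in the DataFrame). A quick diagnostic sketch to locate such rows, assuming `data` is the pandas DataFrame from the snippet above:

# Assumption: `data` is the pandas DataFrame used in the snippet above.
corpus = data.text.tolist()
bad_rows = [(i, x) for i, x in enumerate(corpus) if not isinstance(x, str)]
print(len(bad_rows), bad_rows[:10])  # NaN cells show up here as float('nan')
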
Ecanlilar commented 4 years ago

I'm having the same exact issue! Please advise.

Ecanlilar commented 4 years ago

@rdpulgar Try this!

data['text'] = data['text'].astype(str)
corpus = data.text.tolist()
corpus_embeddings = embedder.encode(corpus)
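
The cast works because pandas stores empty text cells as float NaN, and astype(str) turns every cell into a real string. One caveat (a sketch of my own, not from the thread): NaN cells become the literal string "nan" and will still be embedded, so dropping them first may be preferable:

# Assumes `data` and `embedder` from the original snippet.
data = data.dropna(subset=['text'])       # drop missing cells instead of embedding "nan"
data['text'] = data['text'].astype(str)   # ensure every remaining cell is a string
corpus = data['text'].tolist()
corpus_embeddings = embedder.encode(corpus)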

selfcontrol7 commented 4 years ago

I had the same problem and your solution solved it. Thank you.

PhilipMay commented 4 years ago

The reason might be that you are passing empty strings, whitespace-only strings, NaN, or None as text...
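
A minimal filter that covers all of those cases before calling encode() (a sketch assuming the same DataFrame-based corpus as above; the helper name is my own):

# Assumes `data` and `embedder` from the snippets above.
def is_usable(text):
    # Keep only real, non-empty, non-whitespace strings (rules out NaN, None, numbers).
    return isinstance(text, str) and text.strip() != ""

corpus = [t for t in data.text.tolist() if is_usable(t)]
corpus_embeddings = embedder.encode(corpus)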

zhouhuhq commented 4 years ago

I also have this problem. I get this error when using the evaluator. The sentences I pass in are also of type str, but each one is very long. This is my error message:

File "/ssd/zhouhcData/.conda/pkgs/deepmatcher/lib/python3.6/site-packages/sentence_transformers/SentenceTransformer.py", line 593, in fit
    training_steps, callback)
File "/ssd/zhouhcData/.conda/pkgs/deepmatcher/lib/python3.6/site-packages/sentence_transformers/SentenceTransformer.py", line 616, in _eval_during_training
    score = evaluator(self, output_path=output_path, epoch=epoch, steps=steps)
File "/ssd/zhouhcData/.conda/pkgs/deepmatcher/lib/python3.6/site-packages/sentence_transformers/evaluation/EmbeddingSimilarityEvaluator.py", line 78, in __call__
    embeddings2 = model.encode(self.sentences2, batch_size=self.batch_size, show_progress_bar=self.show_progress_bar, convert_to_numpy=True)
File "/ssd/zhouhcData/.conda/pkgs/deepmatcher/lib/python3.6/site-packages/sentence_transformers/SentenceTransformer.py", line 166, in encode
    length_sorted_idx = np.argsort([self._text_length(sen) for sen in sentences])
File "/ssd/zhouhcData/.conda/pkgs/deepmatcher/lib/python3.6/site-packages/sentence_transformers/SentenceTransformer.py", line 166, in <listcomp>
    length_sorted_idx = np.argsort([self._text_length(sen) for sen in sentences])
File "/ssd/zhouhcData/.conda/pkgs/deepmatcher/lib/python3.6/site-packages/sentence_transformers/SentenceTransformer.py", line 441, in _text_length
    if len(text) == 0 or isinstance(text[0], int):
TypeError: object of type 'float' has no len()
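
The evaluator hits the same code path through model.encode(), so the float is sitting in one of the sentence lists given to the evaluator rather than in the training pairs. A minimal sketch of validating those lists before building the evaluator (my own example data and filtering, not from the thread; the constructor takes sentences1, sentences2 and scores):

from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

# Hypothetical dev data for illustration; note the NaN in the first list.
sentences1 = ["Una frase corta", float("nan"), "Otra frase"]
sentences2 = ["A short sentence", "Some text", "Another sentence"]
scores = [0.9, 0.1, 0.8]

# Keep only pairs where both sides are real strings (drops NaN/None/float cells).
keep = [i for i in range(len(sentences1))
        if isinstance(sentences1[i], str) and isinstance(sentences2[i], str)]
sentences1 = [sentences1[i] for i in keep]
sentences2 = [sentences2[i] for i in keep]
scores = [scores[i] for i in keep]

evaluator = EmbeddingSimilarityEvaluator(sentences1, sentences2, scores)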

zhouhuhq commented 4 years ago

Even the methods mentioned above do not solve it for me.

rdpulgar commented 4 years ago

I resolved the issue using:

data['text'] = data['text'].astype(str)
corpus = data.text.tolist()
corpus_embeddings = embedder.encode(corpus)


zhouhuhq commented 4 years ago

I used your solution, but it did not solve my problem. Thank you for your answer.