MaartenGr / KeyBERT

Minimal keyword extraction with BERT
https://MaartenGr.github.io/KeyBERT/
MIT License

Demo doesn't work #12

Closed marsupialtail closed 3 years ago

marsupialtail commented 3 years ago

Hey, I am interested in trying out the demo, but it gives an error:

```python
from keybert import KeyBERT

doc = """
         Supervised learning is the machine learning task of learning a function that
         maps an input to an output based on example input-output pairs.[1] It infers a
         function from labeled training data consisting of a set of training examples.[2]
         In supervised learning, each example is a pair consisting of an input object
         (typically a vector) and a desired output value (also called the supervisory
         signal). A supervised learning algorithm analyzes the training data and produces
         an inferred function, which can be used for mapping new examples. An optimal
         scenario will allow for the algorithm to correctly determine the class labels
         for unseen instances. This requires the learning algorithm to generalize from
         the training data to unseen situations in a 'reasonable' way (see inductive bias).
      """
model = KeyBERT('distilbert-base-nli-mean-tokens')
keywords = model.extract_keywords(doc)
```


```
TypeError                                 Traceback (most recent call last)
<ipython-input> in <module>
     14 """
     15 model = KeyBERT('distilbert-base-nli-mean-tokens')
---> 16 keywords = model.extract_keywords(doc)

~/anaconda3/lib/python3.7/site-packages/keybert/model.py in extract_keywords(self, docs, keyphrase_length, stop_words, top_n, min_df, use_maxsum, use_mmr, diversity, nr_candidates)
     90                 use_mmr,
     91                 diversity,
---> 92                 nr_candidates)
     93         elif isinstance(docs, list):
     94             warnings.warn("Although extracting keywords for multiple documents is faster "

~/anaconda3/lib/python3.7/site-packages/keybert/model.py in _extract_keywords_single_doc(self, doc, keyphrase_length, stop_words, top_n, use_maxsum, use_mmr, diversity, nr_candidates)
    129         # Extract Words
    130         n_gram_range = (keyphrase_length, keyphrase_length)
--> 131         count = CountVectorizer(ngram_range=n_gram_range, stop_words=stop_words).fit([doc])
    132         words = count.get_feature_names()
    133

~/anaconda3/lib/python3.7/site-packages/sklearn/feature_extraction/text.py in fit(self, raw_documents, y)
   1163         """
   1164         self._warn_for_unused_params()
-> 1165         self.fit_transform(raw_documents)
   1166         return self
   1167

~/anaconda3/lib/python3.7/site-packages/sklearn/feature_extraction/text.py in fit_transform(self, raw_documents, y)
   1218                     max_doc_count,
   1219                     min_doc_count,
-> 1220                     max_features)
   1221         if max_features is None:
   1222             X = self._sort_features(X, vocabulary)

~/anaconda3/lib/python3.7/site-packages/sklearn/feature_extraction/text.py in _limit_features(self, X, vocabulary, high, low, limit)
   1088             raise ValueError("After pruning, no terms remain. Try a lower"
   1089                              " min_df or a higher max_df.")
-> 1090         return X[:, kept_indices], removed_terms
   1091
   1092     def _count_vocab(self, raw_documents, fixed_vocab):

~/anaconda3/lib/python3.7/site-packages/scipy/sparse/_index.py in __getitem__(self, key)
     33     """
     34     def __getitem__(self, key):
---> 35         row, col = self._validate_indices(key)
     36         # Dispatch to specialized methods.
     37         if isinstance(row, INT_TYPES):

~/anaconda3/lib/python3.7/site-packages/scipy/sparse/_index.py in _validate_indices(self, key)
    146                 col += N
    147             elif not isinstance(col, slice):
--> 148                 col = self._asindices(col, N)
    149
    150         return row, col

~/anaconda3/lib/python3.7/site-packages/scipy/sparse/_index.py in _asindices(self, idx, length)
    167
    168         # Check bounds
--> 169         max_indx = x.max()
    170         if max_indx >= length:
    171             raise IndexError('index (%d) out of range' % max_indx)

~/anaconda3/lib/python3.7/site-packages/numpy/core/_methods.py in _amax(a, axis, out, keepdims, initial, where)
     37 def _amax(a, axis=None, out=None, keepdims=False,
     38           initial=_NoValue, where=True):
---> 39     return umr_maximum(a, axis, None, out, keepdims, initial, where)
     40
     41 def _amin(a, axis=None, out=None, keepdims=False,

TypeError: int() argument must be a string, a bytes-like object or a number, not '_NoValueType'
```
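For context on what the traceback is saying: it bottoms out in `int()` being called on numpy's internal `_NoValue` default sentinel, which suggests a version mismatch between the installed numpy/scipy/scikit-learn packages rather than a bug in the demo code itself. A minimal, self-contained illustration of that failure mode (the sentinel class here is a stand-in I made up, not numpy's actual object):

```python
# Illustration only: calling int() on a non-numeric sentinel object raises the
# same kind of TypeError seen at the bottom of the traceback above.
class _Sentinel:
    """Stand-in for an internal default marker like numpy's _NoValue."""
    pass

try:
    int(_Sentinel())  # int() cannot convert an arbitrary object
except TypeError as exc:
    print(exc)  # message names the offending type, as in the traceback
```

This is why upgrading or reinstalling in a clean environment tends to fix it: it realigns the compiled numpy/scipy internals that the sentinel is passed through.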
MaartenGr commented 3 years ago

Strange, I am not getting the same error when running KeyBERT in Google Colab. Can you check whether you have the newest version of KeyBERT?

```shell
pip install --upgrade keybert
```

If that does not work, it might be best to create a new environment within Anaconda and then run the code again.
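Before rebuilding the environment, it can help to confirm which versions are actually installed, since the traceback points at a numpy/scipy/scikit-learn mismatch. A small sketch using the standard library (Python 3.8+); the helper name `installed_versions` is my own, not part of KeyBERT:

```python
from importlib.metadata import version, PackageNotFoundError

def installed_versions(packages):
    """Return a mapping of distribution name -> installed version (or None)."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed in this environment
    return found

# Print the versions relevant to the traceback above.
for name, ver in installed_versions(
    ["keybert", "scikit-learn", "scipy", "numpy"]
).items():
    print(f"{name}: {ver or 'not installed'}")
```

Comparing this output against a working environment (e.g. a fresh Colab runtime) should quickly show which dependency is out of date.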

KTG1 commented 1 year ago

Hello, this issue is still not solved; I am getting the same error even though two years have passed.

MaartenGr commented 1 year ago

@KTG1 Did you try installing KeyBERT from a completely fresh environment? There might be some issues with your installed dependencies.