Closed ibeltagy closed 5 years ago
Lucy's team ran into this bug during the hackathon
>>> nlp("hydroxytryptophan")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-231-7e41c1b0131c> in <module>
----> 1 nlp("hydroxytryptophan")

//anaconda/envs/scispacy/lib/python3.6/site-packages/spacy/language.py in __call__(self, text, disable, component_cfg)
    393             if not hasattr(proc, "__call__"):
    394                 raise ValueError(Errors.E003.format(component=type(proc), name=name))
--> 395             doc = proc(doc, **component_cfg.get(name, {}))
    396             if doc is None:
    397                 raise ValueError(Errors.E005.format(name=name))

//anaconda/envs/scispacy/lib/python3.6/site-packages/scispacy/umls_linking.py in __call__(self, doc)
     85
     86         mention_strings = [x.text for x in mentions]
---> 87         batch_candidates = self.candidate_generator(mention_strings, self.k)
     88
     89         for mention, candidates in zip(doc.ents, batch_candidates):

//anaconda/envs/scispacy/lib/python3.6/site-packages/scispacy/candidate_generation.py in __call__(self, mention_texts, k)
    201         if self.verbose:
    202             print(f'Generating candidates for {len(mention_texts)} mentions')
--> 203         tfidfs = self.vectorizer.transform(mention_texts)
    204         start_time = datetime.datetime.now()
    205

//anaconda/envs/scispacy/lib/python3.6/site-packages/sklearn/feature_extraction/text.py in transform(self, raw_documents, copy)
   1679
   1680         X = super().transform(raw_documents)
-> 1681         return self._tfidf.transform(X, copy=False)
   1682
   1683     def _more_tags(self):

//anaconda/envs/scispacy/lib/python3.6/site-packages/sklearn/feature_extraction/text.py in transform(self, X, copy)
   1300             vectors : sparse matrix, [n_samples, n_features]
   1301         """
-> 1302         X = check_array(X, accept_sparse='csr', dtype=FLOAT_DTYPES, copy=copy)
   1303         if not sp.issparse(X):
   1304             X = sp.csr_matrix(X, dtype=np.float64)

//anaconda/envs/scispacy/lib/python3.6/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    548                              " minimum of %d is required%s."
    549                              % (n_samples, array.shape, ensure_min_samples,
--> 550                                 context))
    551
    552     if ensure_min_features > 0 and array.ndim == 2:

ValueError: Found array with 0 sample(s) (shape=(0, 53479)) while a minimum of 1 is required.
Looks like the linking pipe crashes if no entities are found in the doc, which is pretty rare for the base detectors trained on MedMentions. I'll fix it real quick.
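For anyone hitting this before a fix lands, here is a rough sketch of the kind of guard that avoids the crash: return early when there are no mentions, so an empty batch never reaches the vectorizer. This is not the actual scispacy patch. `link_mentions` and `strict_generator` are hypothetical names made up for illustration, and `strict_generator` only imitates the "0 samples" `ValueError` from the traceback rather than calling scikit-learn.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical type alias: a generator takes mention strings and k,
# and returns one candidate list per mention.
CandidateGenerator = Callable[[Sequence[str], int], List[List[str]]]


def strict_generator(mention_texts: Sequence[str], k: int) -> List[List[str]]:
    """Stand-in for the candidate generator (an assumption, not scispacy code).

    Like the TF-IDF transform in the traceback, it rejects an empty batch.
    """
    if len(mention_texts) == 0:
        raise ValueError(
            "Found array with 0 sample(s) while a minimum of 1 is required."
        )
    # Produce k dummy candidates per mention.
    return [[f"{text}-candidate-{i}" for i in range(k)] for text in mention_texts]


def link_mentions(
    mention_texts: Sequence[str],
    generator: CandidateGenerator,
    k: int = 2,
) -> Dict[str, List[str]]:
    """Map each mention to its candidates, skipping the generator when empty."""
    # Guard: with no entities in the doc there is nothing to link,
    # so return before the generator sees an empty batch.
    if not mention_texts:
        return {}
    batch_candidates = generator(mention_texts, k)
    return dict(zip(mention_texts, batch_candidates))


if __name__ == "__main__":
    print(link_mentions([], strict_generator))             # no crash on empty input
    print(link_mentions(["serotonin"], strict_generator))  # normal path
```

Until then, a workaround on the caller side would be to check `doc.ents` before running the linker, assuming the entity detector and linker can be run as separate steps in your setup.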