ddangelov / Top2Vec

Top2Vec learns jointly embedded topic, document and word vectors.
BSD 3-Clause "New" or "Revised" License
2.94k stars 374 forks

UFuncTypeError: ufunc #287

Open wenlanzhang opened 2 years ago

wenlanzhang commented 2 years ago

Sorry if this one has been asked before.

When I run the example from the README section, `model = Top2Vec(documents=newsgroups.data, speed="learn", workers=8)`,

I came across this error:

`---------------------------------------------------------------------------
UFuncTypeError                            Traceback (most recent call last)
Input In [8], in <cell line: 5>()
      2 from datetime import timedelta
      3 start_time = time.monotonic()
----> 5 model = Top2Vec(documents=newsgroups.data)
      7 end_time1 = time.monotonic()
      8 print(timedelta(seconds = end_time1 - start_time))

File /opt/miniconda3/envs/Twitter/lib/python3.9/site-packages/top2vec/Top2Vec.py:668, in Top2Vec.__init__(self, documents, min_count, ngram_vocab, ngram_vocab_args, embedding_model, embedding_model_path, embedding_batch_size, split_documents, document_chunker, chunk_length, max_num_chunks, chunk_overlap_ratio, chunk_len_coverage_ratio, sentencizer, speed, use_corpus_file, document_ids, keep_documents, workers, tokenizer, use_embedding_model_tokenizer, umap_args, hdbscan_args, verbose)
    663 if umap_args is None:
    664     umap_args = {'n_neighbors': 15,
    665                  'n_components': 5,
    666                  'metric': 'cosine'}
--> 668 umap_model = umap.UMAP(**umap_args).fit(self.document_vectors)
    670 # find dense areas of document vectors
    671 logger.info('Finding dense areas of documents')

File /opt/miniconda3/envs/Twitter/lib/python3.9/site-packages/umap/umap_.py:2516, in UMAP.fit(self, X, y)
   2510 nn_metric = self._input_distance_func
   2511 if self.knn_dists is None:
   2512     (
   2513         self._knn_indices,
   2514         self._knn_dists,
   2515         self._knn_search_index,
-> 2516     ) = nearest_neighbors(
   2517         X[index],
   2518         self._n_neighbors,
   2519         nn_metric,
   2520         self._metric_kwds,
   2521         self.angular_rp_forest,
   2522         random_state,
   2523         self.low_memory,
   2524         use_pynndescent=True,
   2525         n_jobs=self.n_jobs,
   2526         verbose=self.verbose,
   2527     )
   2528 else:
   2529     self._knn_indices = self.knn_indices

File /opt/miniconda3/envs/Twitter/lib/python3.9/site-packages/umap/umap_.py:342, in nearest_neighbors(X, n_neighbors, metric, metric_kwds, angular, random_state, low_memory, use_pynndescent, n_jobs, verbose)
    326 n_iters = max(5, int(round(np.log2(X.shape[0]))))
    328 knn_search_index = NNDescent(
    329     X,
    330     n_neighbors=n_neighbors,
   (...)
    340     compressed=False,
    341 )
--> 342 knn_indices, knn_dists = knn_search_index.neighbor_graph
    344 if verbose:
    345     print(ts(), "Finished Nearest Neighbor Search")

File /opt/miniconda3/envs/Twitter/lib/python3.9/site-packages/pynndescent/pynndescent_.py:1564, in NNDescent.neighbor_graph(self)
   1560     return None
   1561 if self._distance_correction is not None:
   1562     result = (
   1563         self._neighbor_graph[0].copy(),
-> 1564         self._distance_correction(self._neighbor_graph[1]),
   1565     )
   1566 else:
   1567     result = (self._neighbor_graph[0].copy(), self._neighbor_graph[1].copy())

UFuncTypeError: ufunc 'correct_alternative_cosine' did not contain a loop with signature matching types <class 'numpy.dtype[float32]'> -> None`

My NumPy version is 1.22.

brunns commented 1 year ago

Related to https://github.com/lmcinnes/pynndescent/issues/163?

dmcrun commented 1 year ago

I hit this error as well. It is related to the issue brunns linked above. My fix was to apply the change proposed in https://github.com/lmcinnes/pynndescent/issues/163#issuecomment-1007534108 to a forked copy of pynndescent.
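
For anyone else landing here: the traceback ends inside pynndescent, which relies on numba, so it may help to post the installed versions of the whole stack rather than NumPy alone. The snippet below is only a minimal sketch for collecting them. The package list and the helper name `installed_versions` are my own assumptions, not something taken from Top2Vec or pynndescent.

```python
from importlib import metadata

# PyPI distribution names. Which packages matter here is an assumption
# based on the frames that appear in the traceback above.
PACKAGES = ("numpy", "numba", "pynndescent", "umap-learn", "top2vec")


def installed_versions(packages=PACKAGES):
    """Return a dict mapping each package name to its version string,
    or to None when the package is not installed (hypothetical helper)."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Run it in the same environment as the failing notebook and paste the output into the issue so the versions can be compared against those discussed in the linked pynndescent thread.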