ddangelov / Top2Vec

Top2Vec learns jointly embedded topic, document and word vectors.
BSD 3-Clause "New" or "Revised" License
2.95k stars · 374 forks

Contextual_top2vec error #362

Open kirrat975 opened 1 week ago

kirrat975 commented 1 week ago

@ddangelov An error is occuring since new contextual_top2vec is added.When i am using embeddingmodel='doc2vec' then at time of finding topics this error occurs: 2024-11-14 14:56:44,195 - top2vec - INFO - Pre-processing documents for training INFO:top2vec:Pre-processing documents for training 2024-11-14 14:56:45,567 - top2vec - INFO - Creating joint document/word embedding INFO:top2vec:Creating joint document/word embedding 2024-11-14 15:00:05,393 - top2vec - INFO - Creating lower dimension embedding of documents INFO:top2vec:Creating lower dimension embedding of documents /usr/local/lib/python3.10/dist-packages/umap/umap.py:1952: UserWarning: n_jobs value 1 overridden to 1 by setting random_state. Use no seed for parallelism. warn( 2024-11-14 15:00:20,966 - top2vec - INFO - Finding dense areas of documents INFO:top2vec:Finding dense areas of documents 2024-11-14 15:00:21,130 - top2vec - INFO - Finding topics INFO:top2vec:Finding topics AttributeError Traceback (most recent call last)

in [/usr/local/lib/python3.10/dist-packages/top2vec/top2vec.py](https://localhost:8080/#) in __init__(self, documents, contextual_top2vec, c_top2vec_smoothing_window, min_count, topic_merge_delta, ngram_vocab, ngram_vocab_args, embedding_model, embedding_model_path, embedding_batch_size, split_documents, document_chunker, chunk_length, max_num_chunks, chunk_overlap_ratio, chunk_len_coverage_ratio, sentencizer, speed, use_corpus_file, document_ids, keep_documents, workers, tokenizer, use_embedding_model_tokenizer, umap_args, gpu_umap, hdbscan_args, gpu_hdbscan, index_topics, verbose) 780 self.topics_indexed = False 781 --> 782 self.compute_topics(umap_args=umap_args, 783 hdbscan_args=hdbscan_args, 784 topic_merge_delta=topic_merge_delta, [/usr/local/lib/python3.10/dist-packages/top2vec/top2vec.py](https://localhost:8080/#) in compute_topics(self, umap_args, hdbscan_args, topic_merge_delta, gpu_umap, gpu_hdbscan, index_topics, contextual_top2vec, c_top2vec_smoothing_window) 1597 self.hierarchy = None 1598 -> 1599 if self.contextual_top2vec & contextual_top2vec: 1600 1601 # smooth document token embeddings AttributeError: 'Top2Vec' object has no attribute 'contextual_top2vec'. **KINDLY GIVE ME A WAY TO RESOLVE THIS I AM USING THIS MODEL IN PROJECT AND DEADLINE IS NEAR.**
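The traceback shows `compute_topics()` reading `self.contextual_top2vec` before `__init__` has ever assigned it. A minimal sketch of the failure pattern and the likely shape of the fix (class names here are illustrative, not Top2Vec's actual code):

```python
# Reproduction of the bug pattern: __init__ calls a method that reads an
# instance attribute which was never stored on self.
class Broken:
    def __init__(self, contextual_top2vec=False):
        # attribute is never assigned before compute_topics() runs
        self.compute_topics(contextual_top2vec)

    def compute_topics(self, contextual_top2vec):
        # raises AttributeError: no attribute 'contextual_top2vec'
        return self.contextual_top2vec & contextual_top2vec


class Fixed:
    def __init__(self, contextual_top2vec=False):
        # the likely fix: store the flag before compute_topics() needs it
        self.contextual_top2vec = contextual_top2vec
        self.compute_topics(contextual_top2vec)

    def compute_topics(self, contextual_top2vec):
        # `and` is also the idiomatic choice over bitwise `&` for booleans
        return self.contextual_top2vec and contextual_top2vec
```

Until the fix is released, pinning the version installed before the contextual_top2vec change (or installing from the patched main branch) would be a workaround.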
ddangelov commented 1 week ago

I just pushed a fix, good catch. Any specific reason you are using doc2vec as the embedding model?

kirrat975 commented 1 week ago

@ddangelov At first I was using the universal sentence encoder, but it was causing an import error, so I used this one instead. Can you suggest an embedding model that is good for domain-specific (technical) topics?
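For a domain-specific corpus, besides the built-in embedding_model names ('doc2vec', 'universal-sentence-encoder', 'all-MiniLM-L6-v2', ...), recent Top2Vec versions also accept a custom callable that maps a list of documents to a 2-D embedding array — a possible route for plugging in a domain-tuned encoder (check the docstring of your installed version). A minimal sketch of that interface, with a dummy hashing embedder standing in for a real model (the embedder itself is illustrative only):

```python
import numpy as np

def dummy_embed(documents, dim=64):
    """Map a list of document strings to a (n_docs, dim) float array.

    This is the call signature Top2Vec expects from a custom
    embedding_model callable; the hashing trick below is just a
    placeholder for a real domain-specific encoder.
    """
    vectors = np.zeros((len(documents), dim), dtype=np.float32)
    for i, doc in enumerate(documents):
        for token in doc.lower().split():
            # hash each token into a fixed-size bag-of-words slot
            vectors[i, hash(token) % dim] += 1.0
    # L2-normalise so cosine similarity behaves sensibly downstream
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.maximum(norms, 1e-9)

# Usage with Top2Vec would then be (not run here):
# model = Top2Vec(documents, embedding_model=dummy_embed)
```

Swapping `dummy_embed` for a sentence-transformers model fine-tuned on in-domain text keeps the rest of the Top2Vec pipeline unchanged.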