Open Yanith1 opened 8 months ago
That is correct. The `visualize_topics` method reduces the topic embeddings to 2-dimensional space with UMAP, which does not have a random state set, so the layout changes between runs. If you were to set a random state, it would slow things down. You could create your own version by adapting the code here.
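As a rough illustration of that suggestion, here is a minimal sketch of a seeded 2-D reduction. It is not the library's own implementation: the helper name `reduce_to_2d`, its parameters, and the `n_neighbors=2` / `metric="cosine"` settings are assumptions made for this example, and the commented usage assumes a BERTopic version that exposes the fitted topic embeddings as `topic_embeddings_`.

```python
def reduce_to_2d(embeddings, reducer=None, random_state=42):
    """Project topic embeddings to 2-D with a seeded (reproducible) reducer.

    `reducer` may be any object with a `fit_transform` method; when it is
    omitted, a UMAP instance with a fixed `random_state` is created.
    """
    if reducer is None:
        # Imported lazily so the helper also works with a different reducer.
        from umap import UMAP

        # Assumed settings for this sketch; only `random_state` matters
        # for getting the same layout on every run.
        reducer = UMAP(
            n_neighbors=2,
            n_components=2,
            metric="cosine",
            random_state=random_state,
        )
    return reducer.fit_transform(embeddings)


# Hypothetical usage with an already fitted model:
# coords = reduce_to_2d(topic_model.topic_embeddings_)
# `coords` can then be plotted with any plotting library.
```

With a fixed seed the coordinates should be the same on every run, at the cost of the slower execution mentioned above.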
Thanks a lot for your help Maarten. I really appreciate this!
Hello there,
I am pretty new to this, but I am really interested in using it to explore my corpus. I am not sure if this is inherent in the code itself, but whenever I rerun it, it produces a different intertopic distance map. This means I cannot replicate the result, which is not ideal. I have attached the code I used below. Thank you!
```python
from bertopic import BERTopic
from sklearn.feature_extraction.text import CountVectorizer
from umap import UMAP

# `df` is the DataFrame holding the corpus; drop rows without policy text
df_clean = df.dropna(subset=['Policy_Content'])
umap = UMAP(n_neighbors=15, n_components=5, min_dist=0.0, metric='cosine', low_memory=False, random_state=123)
vectorizer_model = CountVectorizer(stop_words="english", min_df=2, ngram_range=(1, 2))

topic_model = BERTopic(umap_model=umap, vectorizer_model=vectorizer_model, verbose=True)
topics, probs = topic_model.fit_transform(df_clean['Policy_Content'])
# 227 topics in total

topic_model.reduce_topics(df_clean['Policy_Content'], nr_topics=48)
topic_model.visualize_topics()
```
Warm regards, Yanith