Aim:
Get an accurate list of topics (around 20 at most) for an agriculture dataset of roughly 20k unique queries, using BERTopic.
Description:
Implemented a BERTopic model to segment the agriculture dataset into 20 distinct topics.
Utilized the 'questioninEnglish' column containing approximately 20,000 unique queries for topic analysis.
Generated 20 topics with BERTopic, which clusters contextual embeddings from a BERT-based model.
Used HDBSCAN as the clustering model within BERTopic.
Plotted an Intertopic Distance Map alongside various other output graphs and bar charts.
Open to suggestions for improving topic cluster evaluation and enhancing the clustering process.
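For reviewers, the setup described above could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the exact code in this PR: the function names, `min_cluster_size=50`, and the save path are hypothetical, and the `bertopic` and `hdbscan` packages are assumed to be installed.

```python
def build_topic_model(min_cluster_size: int = 50, nr_topics: int = 20):
    """Return a BERTopic model that clusters with HDBSCAN.

    min_cluster_size is an assumed starting value, not the PR's setting.
    """
    # Imported lazily so this module loads even without the packages installed.
    from bertopic import BERTopic
    from hdbscan import HDBSCAN

    hdbscan_model = HDBSCAN(
        min_cluster_size=min_cluster_size,
        metric="euclidean",
        cluster_selection_method="eom",
        prediction_data=True,
    )
    # nr_topics asks BERTopic to reduce the discovered clusters to about 20 topics.
    return BERTopic(hdbscan_model=hdbscan_model, nr_topics=nr_topics)


def fit_and_plot(docs, model_path: str = "bertopic_agri_model"):
    """Fit the model on a list of query strings, build plots, and save the model."""
    topic_model = build_topic_model()
    topics, probs = topic_model.fit_transform(docs)

    # Intertopic distance map and per-topic bar charts (Plotly figures).
    fig_map = topic_model.visualize_topics()
    fig_bar = topic_model.visualize_barchart(top_n_topics=20)

    topic_model.save(model_path)  # hypothetical output path
    return topic_model, topics, fig_map, fig_bar
```

Calling `fit_and_plot(docs)` with the preprocessed queries would return the fitted model and figures; `topic_model.get_topic_info()` then lists the topics with their sizes. Larger `min_cluster_size` values tend to give fewer, broader clusters, so it is probably the first parameter to tune.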
Steps:
1) Read the CSV file and take the 'queryInEnglish' column into consideration.
2) Preprocess the data by removing stop words and commas.
3) Train BERTopic.
4) Visualize the results.
5) Save the model.
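Steps 1 and 2 can be sketched with the standard library alone. This is an illustrative sketch rather than the PR's actual preprocessing: the stop-word list is a tiny placeholder, the helper names are hypothetical, and dropping duplicate queries is an assumption based on the dataset being described as unique queries.

```python
import csv
import io

# Tiny illustrative stop-word list (an assumption); a fuller list such as
# NLTK's English stop words would likely be used in practice.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "for", "and", "how", "what"}


def clean_query(text: str) -> str:
    """Remove commas and stop words from one query, keeping word order."""
    tokens = text.replace(",", " ").split()
    return " ".join(t for t in tokens if t.lower() not in STOP_WORDS)


def load_queries(csv_file, column: str = "queryInEnglish") -> list[str]:
    """Read one column from an open CSV file and return cleaned, de-duplicated queries."""
    reader = csv.DictReader(csv_file)
    seen, docs = set(), []
    for row in reader:
        cleaned = clean_query(row.get(column) or "")
        key = cleaned.lower()
        if cleaned and key not in seen:  # skip empty rows and repeats
            seen.add(key)
            docs.append(cleaned)
    return docs


# Usage with an in-memory CSV standing in for the real file.
sample = io.StringIO(
    'queryInEnglish\n'
    '"What is the price of wheat, today"\n'
    '"what is the price of wheat today"\n'
)
docs = load_queries(sample)
```

With a real file, `load_queries(open(path, newline="", encoding="utf-8"))` would produce the list of documents passed to BERTopic in step 3. BERTopic's documentation suggests handling stop words through its `vectorizer_model` rather than before embedding, so that variant may be worth comparing.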
Fix for #291