Samagra-Development / ai-tools

AI Tooling to bootstrap applications fast
44 stars 110 forks

Implemented BERTopic Model for Accurate Topic Segmentation in Agriculture Dataset issue#291 #310

Open pmukesh31 opened 7 months ago

pmukesh31 commented 7 months ago

Fixes #291

Aim: Get an accurate list of topics (around 20 at most) for an agriculture dataset of about 20k unique queries, using BERTopic.

Description:

Steps:

1) Read the CSV file and use the 'queryInEnglish' column.
2) Preprocess the data by removing stop words and commas.
3) Train BERTopic.
4) Visualize the results.
5) Save the model.
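For reviewers, the steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the stop-word list, the helper names (`preprocess`, `load_queries`, `train_and_save`), the output paths, and the use of `nr_topics` to cap the topic count are all assumptions.

```python
import csv
import re

# Illustrative stop-word list (assumption: the PR does not say which list was used).
STOP_WORDS = {"a", "an", "and", "for", "how", "in", "is", "of", "on", "the", "to", "what"}


def preprocess(query: str) -> str:
    """Lowercase the query, replace commas with spaces, and drop stop words."""
    tokens = re.sub(r",", " ", query.lower()).split()
    return " ".join(t for t in tokens if t not in STOP_WORDS)


def load_queries(csv_path: str, column: str = "queryInEnglish") -> list:
    """Read one column from the CSV and return unique, non-empty preprocessed queries."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = csv.DictReader(f)
        # dict.fromkeys removes duplicates while keeping first-seen order.
        seen = dict.fromkeys(preprocess(r[column]) for r in rows if r.get(column))
    return [q for q in seen if q]


def train_and_save(docs, model_path="agri_topic_model", max_topics=20):
    """Fit BERTopic, write a bar-chart visualization, and save the model."""
    # Imported here so the helpers above can be used without bertopic installed.
    from bertopic import BERTopic

    # nr_topics asks BERTopic to reduce the fitted topics to about this many.
    topic_model = BERTopic(nr_topics=max_topics)
    topic_model.fit_transform(docs)
    # visualize_barchart returns a Plotly figure; the output file name is hypothetical.
    topic_model.visualize_barchart().write_html("topics_barchart.html")
    topic_model.save(model_path)
    return topic_model
```

A typical call would be `train_and_save(load_queries("queries.csv"))`, where `queries.csv` is a placeholder for the dataset file.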
