Jayshah6699 / datascience-mashup

In this repo I will try to gather data science projects with clean datasets and high-accuracy models that solve real-world problems.
MIT License
41 stars 53 forks

NLP based project to predict if a restaurant review is positive or negative #43

Closed: geekaditi closed this issue 3 years ago

geekaditi commented 3 years ago

It will use the bag-of-words model and Naive Bayes classification. The code will be in Python.
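
For readers new to the term, here is a minimal sketch of what a bag-of-words representation looks like, using only the standard library. The function name `bag_of_words` and the two toy reviews are made up for illustration; they are not from the proposed project.

```python
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, count vectors) for whitespace-tokenised documents.

    Illustrative helper only: each document becomes a vector of word counts,
    with one position per vocabulary word. Word order is discarded.
    """
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    vectors = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors


# Two toy reviews (hypothetical, not taken from the dataset).
vocab, vectors = bag_of_words(["good food good service", "bad food"])
print(vocab)    # ['bad', 'food', 'good', 'service']
print(vectors)  # [[0, 1, 2, 1], [1, 1, 0, 0]]
```

Each review turns into a row of counts, and those rows are what a classifier such as Naive Bayes is trained on.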

Jayshah6699 commented 3 years ago

I'm gonna need the dataset information and a bit more detail about your approach too, @geekaditi.

geekaditi commented 3 years ago

I have the dataset with me in a .tsv file. It has 2 columns: one holds the review text and the other is a binary label, 1 for a positive review and 0 for a negative one. The file has 1000 reviews in total. My approach is to first clean the text (libraries used: re, nltk). Next I'll create the bag-of-words model using the CountVectorizer class from the sklearn library. Then I'll split the dataset into a training set and a test set, fit a Gaussian Naive Bayes classifier to the training set, and make predictions on the test set with this classifier. Can I make a PR with the code and dataset? That way it'll be easier for you to understand the approach.
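
The steps described above could be sketched roughly as follows. This is only an illustration under several assumptions, not the code from the eventual PR: the eight inline reviews stand in for the 1000-row .tsv file, the small `STOP_WORDS` set replaces the nltk-based cleaning (which needs an extra corpus download), and the names `clean_review` and `train_and_predict` are hypothetical.

```python
import re

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Hypothetical stand-in for the .tsv file: (review, label), 1 = positive, 0 = negative.
SAMPLE = [
    ("Loved the food, great service!", 1),
    ("Amazing taste and friendly staff.", 1),
    ("The pasta was delicious.", 1),
    ("Great place, will come back.", 1),
    ("Terrible food and rude waiter.", 0),
    ("The soup was cold and bland.", 0),
    ("Worst service ever.", 0),
    ("Not worth the money.", 0),
]

# Tiny illustrative stop-word list (assumption); the original plan uses nltk instead.
STOP_WORDS = {"the", "a", "an", "was", "is", "and", "it", "this", "to", "of"}


def clean_review(text):
    """Keep letters only, lowercase the text, and drop stop words."""
    words = re.sub(r"[^a-zA-Z]", " ", text).lower().split()
    return " ".join(w for w in words if w not in STOP_WORDS)


def train_and_predict(rows, test_size=0.25, seed=0):
    """Clean, vectorise, split, fit GaussianNB, and predict on the test split."""
    corpus = [clean_review(review) for review, _ in rows]
    labels = np.array([label for _, label in rows])

    # Bag-of-words counts; GaussianNB expects a dense array, hence toarray().
    features = CountVectorizer().fit_transform(corpus).toarray()

    x_train, x_test, y_train, y_test = train_test_split(
        features, labels, test_size=test_size, random_state=seed, stratify=labels
    )
    classifier = GaussianNB()
    classifier.fit(x_train, y_train)
    return classifier.predict(x_test), y_test


predictions, actual = train_and_predict(SAMPLE)
print("predicted:", predictions, "actual:", actual)
```

With the real file, the rows would come from the .tsv instead of `SAMPLE`, for example via `pandas.read_csv` with `sep="\t"`. With only eight toy reviews the predictions are not meaningful; the sketch is meant to show the order of the steps.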

ksdkamesh99 commented 3 years ago

I want to work on this. Kindly assign it to me if @geekaditi takes more time, @Jayshah6699.

Jayshah6699 commented 3 years ago

Everyone takes their own time. Some people can do it fast, while others take time to learn and then apply it. @ksdkamesh99, kindly pick another project, as there are thousands of similar projects on different datasets with small changes and a slightly different approach. So I would suggest we let @geekaditi work on this project while you pick another one.