Food Delivery Reviews Classification Prebuilt Model

GlobalMaksimum / sadedegel

A General Purpose NLP library for Turkish

http://sadedegel.ai

MIT License

Food Delivery Reviews Classification Prebuilt Model #290

Open ertugrul-dmr opened 3 years ago

ertugrul-dmr commented 3 years ago

To add another use case for our prebuilt models, we could implement a food delivery review classification model. There are decent datasets available on Kaggle here and here (we're going to merge both).

In total, the two datasets contain 550K+ instances after preprocessing and removing duplicates.
The data was collected from Turkish food delivery websites.
The data contains reviews about the food delivery and scoring for several aspects of the service.
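
As a rough illustration of the merge and duplicate removal mentioned above, here is a minimal sketch in plain Python. The `review` field name and the toy rows are hypothetical, and treating two rows as duplicates when their whitespace-normalized text matches is an assumption; the actual preprocessing is not described in this thread.

```python
def merge_reviews(*datasets):
    """Concatenate datasets and drop rows whose review text was already seen.

    Each dataset is a list of dicts with a hypothetical "review" key.
    Rows with empty text are dropped as well.
    """
    seen = set()
    merged = []
    for dataset in datasets:
        for row in dataset:
            # Normalize whitespace only, so "a  b" and "a b" count as the same text.
            key = " ".join(row["review"].split())
            if not key or key in seen:
                continue
            seen.add(key)
            merged.append(row)
    return merged


# Toy rows for illustration only, not taken from the real datasets.
first = [{"review": "Yemek  sıcak geldi"}, {"review": "Kurye geç kaldı"}]
second = [{"review": "Yemek sıcak geldi"}, {"review": ""}]
merged = merge_reviews(first, second)
```

The first occurrence of each review is kept, so the order in which the datasets are passed decides which copy survives.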

For this, the following steps are going to be taken:

Prepare the dataset for sentiment classification,
Build/optimize a model and test model accuracy,
Publish the public prebuilt model

dafajon commented 3 years ago

Can you explain above how you prepared the sentiment_class column from the category points? Please also share the distribution of labels before and after the merge. That way your proposal for the dataset will have a valid and grounded explanation, since it is not something benchmarked in a paper but something we found and prepared for the food domain.
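
The label distribution asked for here could be reported with something like the sketch below. The helper name `label_distribution` and the toy label list are illustrative assumptions; no real counts from the datasets are shown.

```python
from collections import Counter


def label_distribution(labels):
    """Return the share of each label as a fraction of all labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


# Toy labels for illustration only.
toy_labels = ["Positive", "Positive", "Negative", "Positive"]
distribution = label_distribution(toy_labels)
```

Running it once per source dataset and once on the merged data would give the before and after comparison.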

ertugrul-dmr commented 3 years ago

The original dataset creator suggested averaging the three category scores and labeling a review "Positive" if the mean is greater than or equal to 7, and "Negative" otherwise. Although that gives decent results, after some trials I decided to use the minimum of the three categories instead. I believe the real "sentiment" of a review is tied to the lowest-scoring category, and the text contains the complaint about that topic. So if a user gives 1, 10, 10, the mean (7) yields a positive label, but the text usually expresses negative sentiment about the category scored 1. I got a better F1 score with this approach.
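
To make the difference between the two labeling rules concrete, here is a minimal sketch. The threshold of 7 for the mean rule comes from the comment above; applying the same threshold to the minimum rule is an assumption, since the cutoff used with the minimum is not stated in this thread. The function names are illustrative.

```python
from statistics import mean

# 7 is the cutoff suggested for the mean rule; reusing it for the
# minimum rule is an assumption made for this sketch.
THRESHOLD = 7


def label_by_mean(scores, threshold=THRESHOLD):
    """Label from the average of the category scores."""
    return "Positive" if mean(scores) >= threshold else "Negative"


def label_by_min(scores, threshold=THRESHOLD):
    """Label from the lowest category score."""
    return "Positive" if min(scores) >= threshold else "Negative"


# The 1, 10, 10 example from the comment above.
example = (1, 10, 10)
mean_label = label_by_mean(example)  # mean is exactly 7, so "Positive"
min_label = label_by_min(example)    # minimum is 1, so "Negative"
```

With the minimum rule, a single low category score is enough to mark the whole review as negative, which matches the reasoning that the text usually complains about the worst-rated aspect.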

As for the distribution, I'll share the results with you after I double-check the process...