Open ertugrul-dmr opened 3 years ago
Can you explain above how you prepared the sentiment_class column from the category points, and also share the distribution of labels before and after the merge? That way your proposal of the dataset will have a valid, grounded explanation, since it is not something benchmarked in a paper but something we found and prepared for the food domain.
The original dataset creator suggested combining the three category scores into one by taking their mean: if the mean is greater than or equal to 7, label the review "Positive", otherwise "Negative". Although that gives decent results, after some trials I decided to use the minimum of the three categories instead. I believe the real "sentiment" of a review is driven by its lowest-scoring category, and the text usually contains the complaint about that topic. For example, if a user gives scores of 1, 10 and 10, the mean yields a positive label, but the text usually expresses negative sentiment about the category scored 1. I got a better F1 score with this approach.
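To make the difference concrete, here is a minimal sketch of the two labeling rules. The function names are hypothetical, and reusing the threshold of 7 for the minimum-based rule is an assumption, since only the mean-based threshold is stated here.

```python
from statistics import mean

# Threshold stated for the mean-based rule; reusing it for the
# min-based rule is an assumption made for illustration only.
THRESHOLD = 7


def label_by_mean(scores, threshold=THRESHOLD):
    """Label a review from the mean of its category scores."""
    return "Positive" if mean(scores) >= threshold else "Negative"


def label_by_min(scores, threshold=THRESHOLD):
    """Label a review from its lowest category score."""
    return "Positive" if min(scores) >= threshold else "Negative"


# The 1 / 10 / 10 example: the mean is exactly 7, so the mean rule
# says "Positive", while the lowest score (1) makes the min rule
# say "Negative".
example = (1, 10, 10)
mean_label = label_by_mean(example)
min_label = label_by_min(example)
print(mean_label, min_label)
```

With the minimum rule, a single low category score is enough to mark the whole review as negative, which matches the intuition that the text tends to focus on the complaint.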
For the distribution, I'll share the results with you after I double-check the process...
To add another use case for our prebuilt function, we could implement a food delivery review classification model. There are decent datasets available on Kaggle here and here (we're going to merge both).
For this, the following steps are going to be taken: