In this notebook, we clean the text of product reviews written in Indonesian and then build an NLP model with GRU and LSTM layers. The objective is to classify each review into one of three sentiment categories: Negative, Neutral, or Positive.
The data was obtained from Kaggle and uploaded as product_reviews.csv. It consists of Indonesian-language product reviews from Tokopedia.
Data field description:

| Column | Meaning |
|---|---|
| text | feedback given by users who have bought the product |
| rating | star rating given for the product |
| category | category of the product on the Tokopedia platform |
| product_name | name of the product on the Tokopedia platform |
| product_id | unique ID for the product |
| sold | number of units sold |
| shop_id | unique ID for the seller on the Tokopedia platform |
| product_url | link to the product |
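The classification target is a three-class sentiment label, while the dataset itself provides a star `rating`. The sketch below shows one way such a label could be derived. The thresholds (1 to 2 stars as Negative, 3 as Neutral, 4 to 5 as Positive) and the helper name `rating_to_sentiment` are assumptions for illustration and may differ from the mapping used in the notebook. The in-memory rows are made-up stand-ins for product_reviews.csv.

```python
import csv
import io


def rating_to_sentiment(rating: int) -> str:
    """Map a star rating to a sentiment label (assumed thresholds)."""
    if rating <= 2:
        return "Negative"
    if rating == 3:
        return "Neutral"
    return "Positive"


# Tiny in-memory stand-in for product_reviews.csv (made-up rows, illustration only).
sample_csv = io.StringIO(
    "text,rating\n"
    "barang bagus,5\n"
    "biasa saja,3\n"
    "barang rusak,1\n"
)

# Pair each review text with its derived sentiment label.
labeled = [
    (row["text"], rating_to_sentiment(int(row["rating"])))
    for row in csv.DictReader(sample_csv)
]
```

With the real file, `sample_csv` would be replaced by an open file handle, and the remaining columns would simply be ignored by this step.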
Above are the most common words for each sentiment.
I compare a GRU-based model and an LSTM-based model to see which gives better results. The LSTM model is the better of the two: although its categorical accuracy is not higher, it overfits less.
GRU with Tuning Result
LSTM with Tuning Result
Both models are underfitting, reaching only a modest ROC AUC score of 57.5%. This suggests that the models struggle to distinguish between sentences of different sentiments. Several factors contribute to this weak performance, including inadequate text preprocessing. Indonesian comments often contain abbreviations and varied writing styles, so these need to be handled during preprocessing.
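As a rough illustration of the kind of preprocessing that could address abbreviations and inconsistent spelling, here is a minimal sketch. The abbreviation map `SLANG_MAP` and the function name `normalize_review` are hypothetical. A usable map would need to be much larger and built from the actual review data.

```python
import re

# Hypothetical abbreviation map (assumption); a real one would be far larger.
SLANG_MAP = {
    "bgt": "banget",
    "brg": "barang",
    "gk": "tidak",
    "ga": "tidak",
    "tdk": "tidak",
    "yg": "yang",
}


def normalize_review(text: str) -> str:
    """Lowercase, collapse stretched letters, drop non-letters, expand abbreviations."""
    text = text.lower()
    # Collapse any character repeated three or more times ("bagusss" -> "bagus").
    text = re.sub(r"(.)\1{2,}", r"\1", text)
    # Replace everything that is not a lowercase letter or whitespace with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Expand known abbreviations token by token.
    tokens = [SLANG_MAP.get(token, token) for token in text.split()]
    return " ".join(tokens)
```

For example, `normalize_review("Brg bagusss bgt!!!")` returns `"barang bagus banget"`. Note that this sketch also removes digits, which may or may not be desirable for reviews that mention quantities or sizes.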
For future work, especially for Indonesian text, we could use a manual, rule-based count instead of a model. First, identify the most common words for each sentiment and put them in lists. Then create a function that counts how many words in a review correspond to each sentiment.
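The counting idea described above could be sketched as follows. The word lists in `SENTIMENT_WORDS` are hypothetical placeholders, not the words found in the notebook. The tie-breaking rule (fall back to Neutral when no label clearly wins) is likewise an assumption.

```python
# Hypothetical word lists (assumption); in practice these would come from
# the most common words observed for each sentiment.
SENTIMENT_WORDS = {
    "Negative": {"rusak", "kecewa", "lambat", "jelek"},
    "Neutral": {"lumayan", "biasa", "standar"},
    "Positive": {"bagus", "cepat", "mantap", "puas"},
}


def count_sentiment_words(text: str) -> dict:
    """Count how many tokens of the text appear in each sentiment's word list."""
    counts = {label: 0 for label in SENTIMENT_WORDS}
    for token in text.lower().split():
        for label, words in SENTIMENT_WORDS.items():
            if token in words:
                counts[label] += 1
    return counts


def classify_by_counts(text: str, default: str = "Neutral") -> str:
    """Pick the sentiment with the most matches; use the default on ties or no match."""
    counts = count_sentiment_words(text)
    best = max(counts.values())
    if best == 0:
        return default
    winners = [label for label, count in counts.items() if count == best]
    return winners[0] if len(winners) == 1 else default
```

For instance, `classify_by_counts("barang bagus pengiriman cepat")` returns `"Positive"` with these placeholder lists. This approach ignores negation and word order, so it would likely work best on text that has already been normalized.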
Model deployment link: https://huggingface.co/spaces/andreetanjung/Milestone2_Phase2