Market Sentiment Analysis on Tokopedia Product with NLP Model
Project Overview

In this notebook, we clean product reviews written in Indonesian and then build NLP models with GRU and LSTM layers. The objective is to classify each review into one of three categories: Negative, Neutral, or Positive sentiment.


The data was obtained from Kaggle and uploaded as product_reviews.csv. It contains product reviews in Indonesian from Tokopedia.

Data field description:

Column         Meaning
text           feedback given by users who have bought the product
rating         star rating given for the product
category       category of the product on the Tokopedia platform
product_name   name of the product on the Tokopedia platform
product_id     unique ID for the product
sold           number of products sold
shop_id        unique ID for the seller on the Tokopedia platform
product_url    link to the product

Notebook Structure

Here is the outline for the notebook:


Result and Evaluation

[Figure: Words Sentiment] The figure above shows the most common words for each sentiment.

I compared a GRU model and an LSTM model to see which gives better results. I prefer the LSTM model: its categorical accuracy is not higher than the GRU model's, but it exhibits less overfitting.

[Figures: GRU tuning; GRU with tuning result]

[Figures: LSTM tuning; LSTM with tuning result]

It is evident that both models are experiencing underfitting, resulting in a modest ROC AUC score of 57.5%. This suggests that the models face challenges in effectively differentiating between sentences of varying sentiments. Several factors contribute to this subpar performance, including the lack of adequate text preprocessing. Indonesian comments often contain abbreviations and diverse writing styles, making it crucial to address these linguistic nuances during preprocessing.
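As a rough illustration of the kind of preprocessing this points to, here is a minimal sketch in plain Python. It is not the notebook's actual cleaning code: the function name normalize_review and the small SLANG_MAP dictionary are assumptions made for this example, and a real project would need a far larger, curated list of Indonesian abbreviations.

```python
import re

# Hypothetical abbreviation map (an assumption for this sketch, not from the
# notebook); real Indonesian review text needs a much larger curated list.
SLANG_MAP = {
    "yg": "yang",
    "gak": "tidak",
    "ga": "tidak",
    "tdk": "tidak",
    "bgt": "banget",
    "brg": "barang",
}


def normalize_review(text: str) -> str:
    """Lowercase, drop punctuation, collapse stretched letters, expand abbreviations."""
    text = text.lower()
    # Replace every character that is not a letter, digit, or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Collapse runs of three or more identical letters ("baguuus" becomes "bagus").
    text = re.sub(r"([a-z])\1{2,}", r"\1", text)
    # Expand known abbreviations token by token; unknown tokens pass through.
    tokens = text.split()
    return " ".join(SLANG_MAP.get(tok, tok) for tok in tokens)


if __name__ == "__main__":
    print(normalize_review("Brg bagus bgt!!! Gak nyesel belinya"))
```

Tokens that are not in the map are left unchanged, so the output still depends heavily on how complete the abbreviation list is.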

Future Work

For future work, especially for Indonesian text, we could use a manual word-counting approach instead of a model. First, identify the most common words for each sentiment and put them in lists. Then, create a function that counts how many words in a review correspond to each sentiment.
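The idea could be sketched roughly as follows. The word lists here are hypothetical placeholders, not the lists found in the notebook, and the tie-breaking rule (fall back to neutral when no word matches or two sentiments tie) is an assumption made for this example.

```python
# Hypothetical word lists (assumptions for this sketch); in practice they
# would come from the most common words observed for each sentiment.
SENTIMENT_WORDS = {
    "negative": {"rusak", "kecewa", "jelek", "lambat"},
    "neutral": {"lumayan", "standar", "biasa"},
    "positive": {"bagus", "cepat", "mantap", "puas"},
}


def count_sentiment_words(text: str) -> dict:
    """Count how many tokens of the review appear in each sentiment's word list."""
    tokens = text.lower().split()
    return {
        label: sum(1 for tok in tokens if tok in words)
        for label, words in SENTIMENT_WORDS.items()
    }


def classify(text: str, default: str = "neutral") -> str:
    """Pick the sentiment with the most matching words; use the default on no match or a tie."""
    counts = count_sentiment_words(text)
    best = max(counts.values())
    winners = [label for label, count in counts.items() if count == best]
    if best == 0 or len(winners) > 1:
        return default
    return winners[0]


if __name__ == "__main__":
    review = "barang bagus pengiriman cepat"
    print(count_sentiment_words(review), classify(review))
```

This approach ignores negation and word order (for example "tidak bagus"), so it is best seen as a simple baseline to compare against the GRU and LSTM models.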


