Amazon_Fine_Food_Review
Amazon.com, Inc. is an American multinational technology company based in Seattle, Washington, which focuses on e-commerce, cloud computing, digital streaming, and artificial intelligence. The fine food data span a period of more than 10 years, including all ~500,000 reviews up to October 2012. Reviews include product and user information, ratings, and a plaintext review.
The dataset belongs to the Stanford Network Analysis Project (SNAP).
1. What are the targets of this project?
a. Business Acumen:
- After exploring the Amazon Fine Food review dataset, I drew up a few questions, as a data scientist, to outline the project.
What is the connection between the review score, the review text, and the products?
Is there any correlation between products and the top users who often write reviews?
Can I extract the top products based on users’ recommendations?
What are the top words that help the business understand whether a review is good or not?
Can I predict the positive and the negative reviews?
To build a better predictor, should I choose machine learning algorithms or a deep learning model?
- These questions split the work into two parts:
One is the correlation of UserId, ProductId, and review score, which leads to a business solution: recommending food products
Another is the correlation of plaintext reviews with the sentiment analysis
b. Target:
- Analyzing the top reviews, top products, and top users for fine food
- Applying Sentiment Analysis to analyze the plaintext review
Words in Positive Reviews
Words in Negative Reviews
2. My solution
a. Create a recommendation system for fine food based on a sparse matrix
- EDA based on UserId, ProductId, HelpfulnessNumerator, HelpfulnessDenominator, Score, Time
- Utilizing Descriptive Analysis
- Applying a sparse matrix to define the recommended selection
- Evaluating my recommendation system with MSE
b. Sentiment Analysis
a.1. Building a Popularity Recommender System
- Since this is a popularity-based recommender model, recommendations remain the same for all users
- We predict the products based on their popularity, so the result is not personalized to any particular user
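A minimal sketch of the popularity idea, assuming popularity means the number of reviews a product received. The rows and the helper name `popularity_recommend` are made up for illustration and are not taken from the project notebook.

```python
from collections import Counter

# Toy (UserId, ProductId, Score) rows; values are made up for illustration.
reviews = [
    ("u1", "p1", 5), ("u2", "p1", 4), ("u3", "p1", 5),
    ("u1", "p2", 3), ("u2", "p2", 2),
    ("u3", "p3", 5),
]


def popularity_recommend(reviews, top_n=2):
    """Rank products by how many reviews they received (hypothetical helper)."""
    counts = Counter(product for _, product, _ in reviews)
    return [product for product, _ in counts.most_common(top_n)]


top_products = popularity_recommend(reviews)

# Every user receives the same list, which is why the model is non-personalized.
recommendations = {user: top_products for user in ("u1", "u2", "u3")}
print(recommendations)
```

Ranking by mean score instead of review count would be an equally plausible definition of popularity; the count is used here only because it is the simplest.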
a.2. Building Collaborative Filtering
- Model-based Collaborative Filtering is a personalised recommender system. Its recommendations are based on the past behavior of the user and do not depend on any additional information.
- Comparing the actual values with the predicted values shows that the predictive recommendation system performs well
- The Popularity-based recommender system is non-personalised and its recommendations are based on frequency counts, which may not suit the user. For example, for user ids 70 and 100, the Popularity-based model recommended the same set of 5 or 6 products to both, whereas the Collaborative Filtering model recommended entirely different lists based on each user's past purchase history
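The contrast with the popularity model can be sketched as follows. This is an illustration only: it assumes truncated SVD (`scipy.sparse.linalg.svds`) as the model-based technique, which the text above does not confirm, and it treats unrated products as zeros, which is a simplification. The score matrix, the number of factors `k`, and the helper `recommend` are all made up.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

# Toy users x products score matrix (0 = not rated); values are made up.
scores = np.array([
    [5, 4, 0, 0],
    [4, 5, 0, 1],
    [0, 0, 5, 4],
    [1, 0, 4, 5],
], dtype=float)
matrix = csr_matrix(scores)  # only the non-zero scores are stored

# Factorize into k latent factors (k=2 is an assumption; k must be < min(shape)).
u, s, vt = svds(matrix, k=2)
predicted = u @ np.diag(s) @ vt

# MSE between actual and predicted scores, over the rated entries only.
observed = scores > 0
mse = float(np.mean((scores[observed] - predicted[observed]) ** 2))


def recommend(user_row, top_n=2):
    """Column indices of the best-predicted products the user has not rated."""
    unrated = np.flatnonzero(scores[user_row] == 0)
    ranked = unrated[np.argsort(predicted[user_row, unrated])[::-1]]
    return ranked[:top_n].tolist()


print("MSE on observed scores:", mse)
print("user 0:", recommend(0), "user 2:", recommend(2))
```

Unlike the popularity list, the output differs per user because each row of `predicted` reflects that user's own scores.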
b. Sentiment Analysis
b.1 Machine Learning Algorithms
- Logistic Regression is the model that best fits this dataset because it gives the highest accuracy score with the lowest log loss
- After tuning the model, the best parameter set for Logistic Regression is
lr__C: 100.0, lr__penalty: 'none', lr__solver: 'saga'
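The `lr__` prefix suggests a scikit-learn pipeline with a step named `lr`. The sketch below shows how such a tuning run might look; the TF-IDF step, the toy reviews, the labels, and the grid values are assumptions, not the project's actual setup. The penalty is left out of the grid because, depending on the scikit-learn version, an unpenalized model is requested with the string `'none'` (older releases) or with `None` (newer releases).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy reviews and labels (1 = positive, 0 = negative); made up for illustration.
texts = [
    "great coffee, love the flavor",
    "delicious tea and a good price",
    "my dog loves this food",
    "best product, great taste",
    "terrible taste, would not buy again",
    "stale and bland, very disappointed",
    "bad flavor and awful smell",
    "the product arrived broken and late",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# The step name "lr" is what produces parameter keys such as lr__C.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("lr", LogisticRegression(solver="saga", max_iter=5000)),
])

# Assumed grid; only the regularization strength C is searched here.
param_grid = {"lr__C": [0.01, 1.0, 100.0]}

# Score by log loss (negated so that higher is better), with 2-fold CV.
search = GridSearchCV(pipeline, param_grid, cv=2, scoring="neg_log_loss")
search.fit(texts, labels)

best_params = search.best_params_
print(best_params)
```

On data this small the selected `C` carries no meaning; the point is only the shape of the pipeline and the parameter naming.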
b.2 Clustering the top words with K-means
- The best number of clusters is 3, which gives the highest Silhouette Score, 0.002 (a score this close to 0 indicates heavily overlapping clusters)
- The top words are ['strong', 'br', 'cup coffee', 'tea', 'taste', 'like', 'cups', 'flavor', 'coffee', 'love', 'one', 'food', 'good', 'cup', 'great', 'bold', 'br br', 'product']
b.3 Deep Learning
- Of the two models developed, ANN and RNN-LSTM, the LSTM performs best, with an accuracy score of 96%
4. Conclusion
- The Amazon Fine Food Review dataset is a remarkable one. It allowed me to use all my skills: statistical analysis, supervised and unsupervised learning methods, machine learning algorithms, and deep learning.
- From this dataset, I learned that deep learning simplifies much of the workflow. Vectorizing words with the Keras Tokenizer class is faster than the traditional TF-IDF method. Also, the RNN-LSTM uses the vectorized plaintext reviews to achieve a better accuracy score for future sentiment analysis.
- My work is useful for all types of e-commerce because both the strategy team and the customer service team can apply it to improve the business.