For new data generation in the Semi-supervised-sequence-learning-Project, we have written a Python script that fetches data from the IMDb website and converts it into txt files.
Added data preprocessing (cleaning, stemming) and visualizations to amazon_scrapping/Sentiment_Analysis.ipynb
Issue #157
Description
Added clean_text Function:
The clean_text function only performed lowercase conversion, punctuation removal, tokenization, and stop word removal.
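As a rough illustration of the four steps listed above, here is a minimal sketch using only the standard library. It is not the code from the notebook: the tiny `STOP_WORDS` set is a made-up placeholder (the notebook may rely on a fuller list from a library such as NLTK), and splitting on whitespace and returning a list of tokens are also assumptions.

```python
import string

# Hypothetical, deliberately tiny stop word list used only for this sketch.
STOP_WORDS = {"a", "an", "the", "is", "it", "this", "and", "or", "of", "to", "in", "was"}


def clean_text(text):
    """Lowercase, strip punctuation, tokenize on whitespace, drop stop words."""
    lowered = text.lower()
    # Remove every ASCII punctuation character.
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    # Simple whitespace tokenization (an assumption; other tokenizers differ).
    tokens = no_punct.split()
    return [tok for tok in tokens if tok not in STOP_WORDS]


print(clean_text("This product is GREAT, and it arrived on time!"))
# ['product', 'great', 'arrived', 'on', 'time']
```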
Added Visualizations:
I added two visualization sections to the code:
Confusion Matrix Visualization: The confusion matrix shows how the model classified different ratings and helps identify where it makes mistakes (false positives, false negatives).
Rating Distribution Visualization: Shows the distribution of ratings in the dataset.
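The data behind both visualizations can be sketched as follows. This is an illustrative sketch, not the notebook's code: the function names, the toy ratings, and the use of matplotlib for plotting are all assumptions. Rows of the matrix are true ratings and columns are predicted ratings.

```python
from collections import Counter


def rating_distribution(ratings):
    """Count how often each rating occurs, sorted by rating."""
    return dict(sorted(Counter(ratings).items()))


def confusion_matrix(y_true, y_pred, labels):
    """Build a matrix where cell [i][j] counts true label i predicted as label j."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true, pred in zip(y_true, y_pred):
        matrix[index[true]][index[pred]] += 1
    return matrix


def plot_results(matrix, labels, distribution):
    """Draw both plots side by side (assumes matplotlib is installed)."""
    import matplotlib.pyplot as plt

    fig, (ax_cm, ax_dist) = plt.subplots(1, 2, figsize=(10, 4))
    image = ax_cm.imshow(matrix, cmap="Blues")
    ax_cm.set_xticks(range(len(labels)))
    ax_cm.set_xticklabels(labels)
    ax_cm.set_yticks(range(len(labels)))
    ax_cm.set_yticklabels(labels)
    ax_cm.set_xlabel("Predicted rating")
    ax_cm.set_ylabel("True rating")
    ax_cm.set_title("Confusion matrix")
    fig.colorbar(image, ax=ax_cm)

    ax_dist.bar([str(r) for r in distribution], list(distribution.values()))
    ax_dist.set_xlabel("Rating")
    ax_dist.set_ylabel("Number of reviews")
    ax_dist.set_title("Rating distribution")
    plt.tight_layout()
    plt.show()


# Made-up toy data, only for illustration.
labels = [1, 2, 3, 4, 5]
y_true = [5, 4, 5, 1, 3, 5, 1]
y_pred = [5, 5, 5, 1, 4, 4, 1]

matrix = confusion_matrix(y_true, y_pred, labels)
distribution = rating_distribution(y_true)
```

Calling `plot_results(matrix, labels, distribution)` would then render the two charts. Off-diagonal cells in the matrix are the misclassifications the description refers to.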
Type of PR
[ ] Bug fix
[✅] Feature enhancement
[ ] Documentation update
[ ] Other (specify): ___
Screenshots / videos (if applicable)
Checklist:
[✅] I have performed a self-review of my code
[✅] I have read and followed the Contribution Guidelines.
[✅] I have tested the changes thoroughly before submitting this pull request.
[✅] I have provided relevant issue numbers, screenshots, and videos after making the changes.
[✅] I have commented my code, particularly in hard-to-understand areas.
Additional context:
Hey! I am a beginner in Machine Learning and did what I thought could improve the code overall.
Any suggestions for how I can improve are welcome.
Thank you!