Open Niraj1608 opened 1 day ago
🙌 Thank you for bringing this issue to our attention! We appreciate your input and will investigate it as soon as possible.
Feel free to join our community on Discord to discuss more!
@UTSAVS26 Please assign this issue to me under GSSOC Extd.
Title
Enhance and merge the email spam detection notebook with EDA and NLP improvements and Streamlit deployment
Enhancement Aim
Hi, I am Niraj. I've been reviewing the Email Spam Detection with Machine Learning project and noticed several areas where improvements can be made.
Changes
Advanced NLP Techniques: Incorporate more Natural Language Processing (NLP) techniques, such as advanced tokenization, lemmatization, and more sophisticated vectorization like TF-IDF.

Data Cleaning: Introduce robust data cleaning methods to remove noisy data, handle missing values, and preprocess text more efficiently, which should improve the model's accuracy.

Dataset Issue: The project is missing the dataset required to run the notebook. I propose including a well-structured dataset to ensure reproducibility and ease of use for others.

Enhanced EDA: Add more detailed charts and visualizations using Python libraries like Seaborn or Matplotlib to better understand the data distribution and the correlation between features. This could include heatmaps, pair plots, and distribution plots to visualize relationships and patterns in the data.

Together, better visualizations and data processing should make the spam detection model more interpretable and efficient, and improve its overall performance.
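As a rough illustration of the data cleaning and TF-IDF steps proposed above, here is a minimal sketch that uses only the Python standard library. It is not the code from the notebook: the function names `clean_text` and `tfidf`, the tiny `STOP_WORDS` set, and the sample emails are all hypothetical, and lemmatization is left out. In the actual notebook a library vectorizer such as scikit-learn's `TfidfVectorizer` would likely be a better fit than this hand-rolled version.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "the", "to", "is", "you", "your", "for", "of", "and"}


def clean_text(text):
    """Lowercase, strip URLs and non-letters, drop stop words; return tokens."""
    if not text:  # treat missing values (None or empty string) as no tokens
        return []
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)  # keep letters and whitespace only
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses relative term frequency times log(N / document frequency),
    a plain textbook variant without smoothing or normalization.
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Made-up sample messages, including one missing value.
emails = [
    "WIN a FREE prize now!!! Visit http://example.com",
    "Are we still meeting for lunch tomorrow?",
    None,
    "Free entry: claim your prize today",
]

tokens = [clean_text(e) for e in emails]
tokens = [t for t in tokens if t]  # drop rows that are empty after cleaning
scores = tfidf(tokens)
```

In this sketch, terms that appear in only one message (such as "win") get a higher weight than terms shared across messages (such as "free"), which is the property that makes TF-IDF features more informative than raw counts for a classifier.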
Screenshots 📷
Attaching my Colab notebook as a PDF: spam_mail_pridector.ipynb - Colab.pdf
Guidelines
Full Name
Parmar Niraj Jagdishbhai
Participant Role
GSSOC