[Project Addition]:Website Classification

manishh12 commented 1 month ago

Deep Learning Simplified Repository (Proposing new issue)

:red_circle: Project Title : Website Classification
:red_circle: Aim : To classify websites into different categories based on their URLs.
:red_circle: Dataset : https://www.kaggle.com/datasets/shaurov/website-classification-using-url/data?select=URL+Classification.csv
:red_circle: Approach : Classification can be done using a CNN, Multinomial Naive Bayes, and an SVM.

📍 Follow the Guidelines to Contribute in the Project :

You need to create a separate folder named after the Project Title.
Inside that folder, there will be four main components.
- Images - To store the required images.
- Dataset - To store the dataset, or information about its source.
- Model - To store the machine learning model you've created using the dataset.
- requirements.txt - This file will contain the packages/libraries required to run the project on other machines.
Inside the Model folder, the README.md file must be filled up properly, with proper visualizations and conclusions.

:red_circle::yellow_circle: Points to Note :

The issues will be assigned on a first-come, first-served basis, 1 Issue == 1 PR.
"Issue Title" and "PR Title" should be the same. Include the issue number along with it.
Follow the Contributing Guidelines & Code of Conduct before you start contributing.

:white_check_mark: To be Mentioned while taking the issue :

Full name : Manish Kumar Gupta
GitHub Profile Link : https://github.com/manishh12
Email ID : manishdid360@gmail.com
Participant ID (if applicable) : manishh12
Approach for this Project : I will implement SVM, Naive Bayes, and CNN models for classification.
What is your participant role? (Mention the Open Source program) : Contributor in GSSOC'24

Happy Contributing 🚀

All the best. Enjoy your open source journey ahead. 😎

abhisheks008 commented 1 month ago

One issue at a time.

manishh12 commented 1 month ago

@abhisheks008 Since my present issue is currently being reviewed, may I proceed with this new one? Could you please assign it to me?

abhisheks008 commented 1 month ago

The approach for this project needs an upgrade. Can you add some more advanced deep learning methods for this dataset?

@manishh12 looking forward to hearing from you.

manishh12 commented 1 month ago

@abhisheks008 Sure, I can implement text-based classification using CNN, Bidirectional LSTM, and Transformer models like BERT.

The tentative approach is as follows:

1. Data collection: Download the dataset from the link in the issue description.

2. Text preprocessing: Tokenization (convert text into tokens).
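
A minimal sketch of the tokenization step, assuming each URL is lowercased and split on non-alphanumeric characters. The helper names (`tokenize_url`, `build_vocab`, `encode`), the reserved ids for padding and unknown tokens, and the `max_len` default are illustrative assumptions, not something fixed by the dataset or this thread.

```python
import re
from collections import Counter


def tokenize_url(url):
    """Lowercase a URL and split it into alphanumeric tokens."""
    return [tok for tok in re.split(r"[^a-z0-9]+", url.lower()) if tok]


def build_vocab(urls, min_count=1):
    """Map each token to an integer id; 0 is padding, 1 is unknown (assumed convention)."""
    counts = Counter(tok for url in urls for tok in tokenize_url(url))
    vocab = {"<pad>": 0, "<unk>": 1}
    for tok, n in sorted(counts.items()):
        if n >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def encode(url, vocab, max_len=16):
    """Convert a URL to a fixed-length list of token ids, truncating or right-padding."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in tokenize_url(url)][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))
```

The fixed-length integer sequences produced by `encode` are the kind of input an embedding layer expects. Character-level or subword tokenization may turn out to work better for URLs; this word-level split is only one option.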

3. Model implementation

Convolutional Neural Network (CNN): Create an embedding layer to convert tokens to dense vectors. Add multiple Conv1D layers followed by MaxPooling1D. Flatten the output and add dense layers for classification.
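
The CNN described above could be sketched in Keras roughly as follows. The layer sizes, kernel width, `max_len`, and the function name `build_cnn` are placeholder assumptions; the actual values would be tuned on the dataset.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_cnn(vocab_size, num_classes, max_len=16, embed_dim=64):
    """Embedding -> two Conv1D/MaxPooling1D stages -> Flatten -> dense head."""
    return keras.Sequential([
        keras.Input(shape=(max_len,), dtype="int32"),
        # Token ids -> dense vectors.
        layers.Embedding(vocab_size, embed_dim),
        layers.Conv1D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling1D(2),
        layers.Conv1D(128, 3, padding="same", activation="relu"),
        layers.MaxPooling1D(2),
        # Flatten needs a fixed sequence length, hence the explicit Input shape.
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        # One probability per website category.
        layers.Dense(num_classes, activation="softmax"),
    ])
```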

Bidirectional LSTM (BiLSTM): Create an embedding layer. Add Bidirectional LSTM layers to process sequences in both directions. Add dense layers for classification.
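
A comparable sketch for the BiLSTM variant, again in Keras. The number of stacked layers, the unit counts, and the name `build_bilstm` are assumptions made for illustration.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_bilstm(vocab_size, num_classes, max_len=16, embed_dim=64):
    """Embedding -> stacked bidirectional LSTMs -> dense head."""
    return keras.Sequential([
        keras.Input(shape=(max_len,), dtype="int32"),
        layers.Embedding(vocab_size, embed_dim),
        # The first layer returns the full sequence so a second LSTM can consume it.
        layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
        # The second layer returns only the final forward and backward states.
        layers.Bidirectional(layers.LSTM(32)),
        layers.Dense(64, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
```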

Transformer Model (BERT): Use a pre-trained BERT model. Define input layers for input IDs and attention masks. Extract the [CLS] token representation and add dense layers for classification.

4. Training and evaluation

Training: Compile each model with an appropriate optimizer and loss function. Train the models on the training dataset, using validation data for monitoring.
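
One way the training step might look in Keras. The choice of Adam, sparse categorical cross-entropy (which assumes integer class labels), early stopping on validation loss, and the helper name `compile_and_train` are all assumptions. The tiny model and random arrays at the bottom are stand-ins that only show the call pattern, not real data.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def compile_and_train(model, x_train, y_train, x_val, y_val, epochs=10):
    """Compile with Adam and sparse categorical cross-entropy, then fit with early stopping."""
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-3),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    # Stop when validation loss stops improving and keep the best weights seen.
    early_stop = keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=2, restore_best_weights=True
    )
    return model.fit(
        x_train, y_train,
        validation_data=(x_val, y_val),
        epochs=epochs, batch_size=32,
        callbacks=[early_stop], verbose=0,
    )


# Stand-in model and random data (hypothetical), used only to demonstrate the call.
rng = np.random.default_rng(0)
x = rng.integers(1, 50, size=(64, 16)).astype("int32")
y = rng.integers(0, 3, size=(64,)).astype("int32")
demo_model = keras.Sequential([
    keras.Input(shape=(16,), dtype="int32"),
    layers.Embedding(50, 8),
    layers.GlobalAveragePooling1D(),
    layers.Dense(3, activation="softmax"),
])
history = compile_and_train(demo_model, x[:48], y[:48], x[48:], y[48:], epochs=2)
```

The returned `history.history` dictionary holds per-epoch training and validation loss and accuracy, which can be plotted for the visualizations the README requires.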

Evaluation: Evaluate the trained models on a separate test dataset to assess their performance. Use metrics such as accuracy, precision, recall, and F1-score.
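
For reference, the metrics listed above can be computed from true and predicted labels as in this dependency-free sketch. Macro averaging (an unweighted mean over classes) is an assumption, since the averaging scheme has not been decided in this thread, and `evaluate_predictions` is a hypothetical helper name.

```python
def evaluate_predictions(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }


# Made-up labels, purely for illustration.
report = evaluate_predictions(
    ["news", "news", "sports", "sports"],
    ["news", "sports", "sports", "sports"],
)
```

In practice a library routine such as scikit-learn's `classification_report` would likely be used instead; the explicit version only makes the definitions visible.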

The approach may be adjusted based on the results I obtain.

abhisheks008 commented 1 month ago

Assigned @manishh12

abhisheks008 / DL-Simplified

[Project Addition]:Website Classification #606
