abhisheks008 / DL-Simplified

Deep Learning Simplified is an open-source repository containing beginner- to advanced-level deep learning projects for contributors who want to start their journey in deep learning. Devfolio URL: https://devfolio.co/projects/deep-learning-simplified-f013
https://quine.sh/repo/abhisheks008-DL-Simplified-499023976
MIT License
324 stars, 290 forks

[Project Addition]: Website Classification #606

Closed manishh12 closed 1 month ago

manishh12 commented 1 month ago

Deep Learning Simplified Repository (Proposing new issue)

:red_circle: Project Title : Website Classification
:red_circle: Aim : To classify websites into different categories based on their URLs.
:red_circle: Dataset : https://www.kaggle.com/datasets/shaurov/website-classification-using-url/data?select=URL+Classification.csv
:red_circle: Approach : Classification can be done using a CNN, Multinomial Naive Bayes, and an SVM.


📝 Follow the Guidelines to Contribute in the Project :


:red_circle::yellow_circle: Points to Note :


:white_check_mark: To be Mentioned while taking the issue :


Happy Contributing 🚀

All the best. Enjoy your open source journey ahead. 😎

abhisheks008 commented 1 month ago

One issue at a time.

manishh12 commented 1 month ago

@abhisheks008 Since my present issue is currently being reviewed, may I proceed with this new one? Could you please assign it to me?

abhisheks008 commented 1 month ago

The approach for this project needs an upgrade. Can you add some more advanced deep learning methods for this dataset?

@manishh12 looking forward to hearing from you.

manishh12 commented 1 month ago

@abhisheks008 Sure, I can implement text-based classification using a CNN, a Bidirectional LSTM, and Transformer models like BERT.

The tentative approach is as follows.

Data collection: Use the dataset from the link given above.

Text preprocessing: Tokenization (convert the text into tokens).
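As a rough sketch of the tokenization step, the snippet below splits a URL into lowercase tokens, builds a vocabulary, and pads each URL to a fixed length. The example URLs, the `<pad>`/`<unk>` conventions, and the helper names (`tokenize_url`, `build_vocab`, `encode`) are illustrative assumptions, not part of the dataset or the final implementation.

```python
import re
from collections import Counter

PAD, UNK = "<pad>", "<unk>"  # assumed special tokens for padding and unknown words


def tokenize_url(url):
    """Lowercase a URL and split it on every run of non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^a-z0-9]+", url.lower()) if tok]


def build_vocab(urls, min_count=1):
    """Map each token seen at least min_count times to an integer id."""
    counts = Counter(tok for url in urls for tok in tokenize_url(url))
    vocab = {PAD: 0, UNK: 1}
    for tok, n in sorted(counts.items()):  # sorted so ids are deterministic
        if n >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def encode(url, vocab, max_len=8):
    """Convert a URL to a fixed-length list of token ids (truncate, then pad)."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokenize_url(url)][:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))


# Made-up URLs, used only to exercise the helpers.
urls = ["http://www.example.com/sports/news", "https://shop.example.org/cart"]
vocab = build_vocab(urls)
print(tokenize_url(urls[0]))
print(encode(urls[1], vocab))
```

The fixed-length id lists are what an embedding layer would consume. Pretrained models such as BERT ship their own subword tokenizer, so this helper would only apply to the CNN and BiLSTM models.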

Model implementation:

Convolutional Neural Network (CNN): Create an embedding layer to convert tokens to dense vectors. Add multiple Conv1D layers followed by MaxPooling1D. Flatten the output and add dense layers for classification.
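To make the CNN steps concrete, here is a dependency-free sketch of what one Conv1D layer, one max-pooling step, and a flatten compute on an already-embedded sequence. It is not the planned model, which would use a deep learning framework's layers. The tiny embeddings, filter weights, and function names are made-up values for illustration.

```python
def conv1d_relu(seq, kernels, biases):
    """'Valid' 1D convolution over a sequence of vectors, followed by ReLU.

    seq:     T vectors of dimension D (the embedded tokens)
    kernels: F filters, each a list of K vectors of dimension D
    returns: T - K + 1 vectors of dimension F
    """
    width = len(kernels[0])
    out = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        row = []
        for kernel, bias in zip(kernels, biases):
            total = bias + sum(
                w * x
                for k_vec, x_vec in zip(kernel, window)
                for w, x in zip(k_vec, x_vec)
            )
            row.append(max(0.0, total))  # ReLU activation
        out.append(row)
    return out


def max_pool1d(seq, pool_size=2):
    """Non-overlapping max pooling along the time axis (stride = pool_size)."""
    return [
        [max(col) for col in zip(*seq[i:i + pool_size])]
        for i in range(0, len(seq) - pool_size + 1, pool_size)
    ]


def flatten(seq):
    """Concatenate all time steps into one feature vector for the dense layers."""
    return [x for row in seq for x in row]


# Five tokens embedded in two dimensions (made-up numbers).
embedded = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [1.0, 0.0]]
# Two filters of width 2 with hand-picked weights.
kernels = [[[1.0, 0.0], [0.0, 1.0]], [[-1.0, 0.0], [0.0, -1.0]]]
biases = [0.0, 0.5]

conv = conv1d_relu(embedded, kernels, biases)
pooled = max_pool1d(conv, pool_size=2)
features = flatten(pooled)
print(conv, pooled, features)
```

Each filter slides over windows of adjacent tokens, so the pooled features respond to short token patterns wherever they appear in the URL.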

Bidirectional LSTM (BiLSTM): Create an embedding layer. Add Bidirectional LSTM layers to process sequences in both directions. Add dense layers for classification.

Transformer Model (BERT): Use a pre-trained BERT model. Define input layers for input IDs and attention masks. Extract the [CLS] token representation and add dense layers for classification.
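The input IDs and attention mask mentioned above can be illustrated with a small sketch. It uses a toy vocabulary with made-up ids rather than a real BERT tokenizer, and `bert_style_inputs` is a hypothetical helper. In practice the pretrained model's own tokenizer would produce these two arrays.

```python
CLS, SEP, PAD, UNK = "[CLS]", "[SEP]", "[PAD]", "[UNK]"

# Toy vocabulary with invented ids, standing in for a real BERT vocabulary.
toy_vocab = {PAD: 0, UNK: 1, CLS: 2, SEP: 3, "news": 4, "sports": 5, "shop": 6}


def bert_style_inputs(tokens, vocab, max_len):
    """Build padded input ids plus an attention mask (1 = real token, 0 = padding)."""
    # Reserve two positions for [CLS] and [SEP], truncating the tokens if needed.
    tokens = [CLS] + tokens[:max_len - 2] + [SEP]
    input_ids = [vocab.get(tok, vocab[UNK]) for tok in tokens]
    attention_mask = [1] * len(input_ids)
    pad = max_len - len(input_ids)
    return input_ids + [vocab[PAD]] * pad, attention_mask + [0] * pad


ids, mask = bert_style_inputs(["sports", "news", "live"], toy_vocab, max_len=8)
print(ids)
print(mask)
```

The [CLS] token always sits at position 0, which is why its output vector can serve as a fixed-size summary of the whole sequence for the classification layers.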

Training and evaluation:

Training: Compile each model with an appropriate optimizer and loss function. Train the models on the training dataset, using validation data for monitoring.

Evaluation: Evaluate the trained models on a separate test dataset to assess their performance. Use metrics such as accuracy, precision, recall, and F1-score.
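For reference, the metrics listed above can be computed as in the following sketch. It assumes macro averaging over classes, which is one common choice for multi-class problems but is not specified in the plan. The category labels and the `evaluate` helper are hypothetical.

```python
def evaluate(y_true, y_pred):
    """Return accuracy plus per-class and macro-averaged precision, recall, F1."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    per_class = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}
    # Macro average: unweighted mean of each metric over all classes.
    macro = {
        name: sum(scores[name] for scores in per_class.values()) / len(labels)
        for name in ("precision", "recall", "f1")
    }
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "per_class": per_class, "macro": macro}


# Made-up labels and predictions, used only to exercise the function.
y_true = ["news", "news", "shop", "sports"]
y_pred = ["news", "shop", "shop", "sports"]
report = evaluate(y_true, y_pred)
print(report["accuracy"], report["macro"])
```

Macro averaging weights every category equally, which matters if the URL categories are imbalanced. A weighted average would instead favour the larger classes.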

The approach may be adjusted based on the results I obtain.

abhisheks008 commented 1 month ago

Assigned @manishh12