Pull Request for DL-Simplified 💡

Issue Title : Website Classification

Info about the related issue (Aim of the project) : The aim of the project is to classify URLs into different categories based on their content. This classification task involves training machine learning models to accurately predict the category of a given URL.
Name: Manish Kumar Gupta
GitHub ID: manishh12
Email ID: manishdid360@gmail.com
Identify yourself: (Mention which program you are contributing under. E.g., for a JWOC 2022 participant it's JWOC Participant) GSSOC'24

Closes: #606

Describe the add-ons or changes you've made 📃

The aim is to classify URLs into predefined categories such as adult content, arts, business, computers, games, health, home, kids, news, recreation, reference, science, shopping, society, or sports. The project involves preprocessing the dataset, visualizing the distribution of categories, training the models, and evaluating their performance.
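
As a rough illustration of the preprocessing step, the sketch below turns raw URLs into fixed-length integer sequences of the kind an embedding layer expects, and counts how many samples fall into each category. The sample rows, the helper names (`tokenize_url`, `build_vocab`, `encode`), and the tokenization rule are assumptions made for this example; the actual notebook may preprocess the data differently.

```python
import re
from collections import Counter

# Hypothetical sample rows; the real dataset is not reproduced here.
SAMPLES = [
    ("http://www.example-news.com/world/politics", "news"),
    ("https://shop.example.com/cart?item=42", "shopping"),
    ("http://www.example-news.com/sports/scores", "news"),
]


def tokenize_url(url):
    """Lowercase a URL, drop the scheme, and split on non-alphanumeric characters."""
    url = re.sub(r"^[a-z]+://", "", url.lower())
    return [tok for tok in re.split(r"[^a-z0-9]+", url) if tok]


def build_vocab(token_lists, min_count=1):
    """Map each token to an integer id; 0 is reserved for padding, 1 for unknown tokens."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {"<pad>": 0, "<unk>": 1}
    for tok, n in sorted(counts.items()):
        if n >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def encode(tokens, vocab, max_len):
    """Convert tokens to ids, then truncate or right-pad to exactly max_len entries."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in tokens][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


token_lists = [tokenize_url(url) for url, _ in SAMPLES]
vocab = build_vocab(token_lists)
encoded = [encode(toks, vocab, max_len=8) for toks in token_lists]

# Category distribution: the counts a bar chart of the classes would display.
category_counts = Counter(label for _, label in SAMPLES)
```

Padding to a fixed length keeps every input the same shape, which both the CNN and the BiLSTM need for batched training.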

Implemented the CNN model architecture, including embedding, convolutional, max pooling, flatten, dropout, and dense layers.
Trained the CNN model using the provided dataset and evaluated its performance.
Plotted the training and validation loss and accuracy curves for the CNN model.
Defined the BiLSTM model architecture, consisting of embedding, bidirectional LSTM, and dense layers.
Compiled and trained the BiLSTM model on the dataset.
Generated plots showing the accuracy and loss curves for the BiLSTM model.
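
The two architectures listed above could look roughly like the following Keras sketch. It follows the layer order given in the description, but every hyperparameter (vocabulary size, sequence length, embedding width, filter count, kernel size, dropout rate, LSTM units) and the helper names are assumptions for illustration, not values taken from this PR. The output size of 15 matches the number of categories listed in the description.

```python
from tensorflow import keras
from tensorflow.keras import layers

# All sizes below are illustrative assumptions, not values from this PR.
VOCAB_SIZE = 20000   # number of distinct tokens kept
MAX_LEN = 100        # padded sequence length
EMBED_DIM = 64       # embedding width
NUM_CLASSES = 15     # categories listed in the PR description


def build_cnn():
    """Embedding -> Conv1D -> MaxPooling1D -> Flatten -> Dropout -> Dense."""
    return keras.Sequential([
        keras.Input(shape=(MAX_LEN,)),
        layers.Embedding(VOCAB_SIZE, EMBED_DIM),
        layers.Conv1D(128, 5, activation="relu"),
        layers.MaxPooling1D(2),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])


def build_bilstm():
    """Embedding -> Bidirectional LSTM -> Dense."""
    return keras.Sequential([
        keras.Input(shape=(MAX_LEN,)),
        layers.Embedding(VOCAB_SIZE, EMBED_DIM),
        layers.Bidirectional(layers.LSTM(64)),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])


def compile_model(model):
    """Compile for integer class labels and track accuracy during training."""
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


cnn = compile_model(build_cnn())
bilstm = compile_model(build_bilstm())
```

Calling `fit` on either model with a `validation_data` argument returns a history object whose `history` dictionary holds the `loss`, `val_loss`, `accuracy`, and `val_accuracy` series, which is what loss and accuracy plots are typically drawn from.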

The changes document the CNN and BiLSTM architectures, train and evaluate both models, and visualize training progress with loss and accuracy plots. They also explain that both models were trained for fewer epochs because of resource constraints, which may lower the achieved accuracy.

Due to resource constraints, including limited GPU allocation, frequent runtime disconnects across multiple accounts, and the substantial size of the dataset, I could only implement two models: CNN and BiLSTM. Additionally, I was only able to train these models for a limited number of epochs.

Type of change ☑️

What sort of change have you made:

[ ] Bug fix (non-breaking change which fixes an issue)
[x] New feature (non-breaking change which adds functionality)
[ ] Code style update (formatting, local variables)
[ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
[ ] This change requires a documentation update

Checklist: ☑️

[x] My code follows the guidelines of this project.
[x] I have performed a self-review of my own code.
[x] I have commented my code, particularly wherever it was hard to understand.
[x] I have made corresponding changes to the documentation.
[x] My changes generate no new warnings.
[x] I have added things that prove my fix is effective or that my feature works.
[ ] Any dependent changes have been merged and published in downstream modules.

abhisheks008 / DL-Simplified

Website Classification #627
