[FEATURE]: Expand DLP to Support Additional Default Datasets for Enhanced Testing and Training

codingwithsurya commented 1 year ago

Feature Name

Support Additional Default Datasets for Enhanced Testing and Training

Your Name

Surya Subramanian

Description

To make the Deep Learning Playground (DLP) more versatile, we propose adding support for more default datasets. These datasets will give users a wider range of options for testing and training their machine learning models. Each dataset has its own characteristics and challenges, which makes the set as a whole useful for a variety of research and application purposes.

Here are some proposed datasets you can try to integrate:

CIFAR100: Offers 100 classes, each with 600 images (500 for training and 100 for testing). A more complex version of CIFAR10.

SVHN (Street View House Numbers): A real-world image dataset for developing machine learning and object recognition algorithms, requiring minimal data preprocessing.

ImageNet: A large and complex visual database designed for visual object recognition software research.

CelebA (Celebrity Faces Attributes): A large-scale face attributes dataset with over 200,000 celebrity images, each with 40 attribute annotations.

COIL100 (Columbia Object Image Library 100): Consists of 7200 images of 100 objects, each photographed from various angles.

Omniglot: A dataset designed for one-shot learning, containing 1623 different handwritten characters from 50 different alphabets.

STL10: Inspired by CIFAR-10, this dataset is meant for developing unsupervised feature learning, deep learning, and self-taught learning algorithms.

EMNIST (Extended MNIST): Expands the original MNIST dataset to include handwritten letters.
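
As a rough illustration of how several of these datasets could be exposed by name, here is a minimal sketch built on `torchvision.datasets`. The registry `TORCHVISION_DATASETS`, the function `load_default_dataset`, the upper-case keys, and the `./data` root are all hypothetical and not taken from the DLP codebase. ImageNet is left out because torchvision expects its archives to be downloaded manually, and COIL100 is left out because, as far as I know, torchvision does not ship a loader for it.

```python
from typing import Any, Dict, Tuple

# Hypothetical registry: dataset key -> (torchvision.datasets class name, constructor kwargs).
# The keys are illustrative; DLP's real dataset identifiers may differ.
TORCHVISION_DATASETS: Dict[str, Tuple[str, Dict[str, Any]]] = {
    "CIFAR100": ("CIFAR100", {"train": True, "download": True}),
    "SVHN": ("SVHN", {"split": "train", "download": True}),
    "CELEBA": ("CelebA", {"split": "train", "target_type": "attr", "download": True}),
    "OMNIGLOT": ("Omniglot", {"background": True, "download": True}),
    "STL10": ("STL10", {"split": "train", "download": True}),
    "EMNIST": ("EMNIST", {"split": "letters", "train": True, "download": True}),
}


def load_default_dataset(name: str, root: str = "./data"):
    """Return the training split of a registered dataset, downloading it if needed."""
    key = name.upper()
    if key not in TORCHVISION_DATASETS:
        raise ValueError(f"Unsupported dataset: {name!r}")

    # Imported lazily so the registry can be inspected without torchvision installed.
    import torchvision.datasets as tv_datasets

    class_name, kwargs = TORCHVISION_DATASETS[key]
    dataset_cls = getattr(tv_datasets, class_name)
    return dataset_cls(root=root, **kwargs)
```

Calling `load_default_dataset("cifar100")` would download CIFAR100 into `./data` on first use. Keeping the constructor arguments in one table means each dataset's quirks (`train` versus `split`, Omniglot's `background` flag) stay in a single place.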

Task Breakdown

Integration of Datasets: Implement the integration of these datasets into the DLP system, ensuring they are easily accessible and usable for users.

Architecture Optimization: For each dataset, research and determine the most effective neural network architectures that are suitable for testing. This involves understanding the specific characteristics and challenges posed by each dataset.

Documentation and Examples: Provide detailed documentation and example use cases for each dataset, guiding users on how to leverage these datasets effectively.

Testing and Validation: Conduct thorough testing (using Postman with a default dataset) to ensure these datasets integrate seamlessly into the DLP. Validate the performance of the suggested architectures for each dataset. More info on how to do this is in Notion.

github-actions[bot] commented 1 year ago

Hello @codingwithsurya! Thank you for submitting the Feature Request Form. We appreciate your contribution. :wave:

We will look into it and provide a response as soon as possible.

To work on this feature request, you can follow these branch setup instructions:

Check out the nextjs branch:
```
git checkout nextjs
```
Pull the latest changes from the remote nextjs branch:
```
git pull origin nextjs
```
Create a new branch specific to this feature request using the issue number:
```
git checkout -b feature-1058
```
Feel free to make the necessary changes in this branch and submit a pull request when you're ready.

Best regards, Deep Learning Playground (DLP) Team

LuHG18 commented 9 months ago

Found a small bug in tabularConstants.ts: a typo in the California Housing dataset name caused it to be referenced incorrectly. Adding an underscore between "california" and "housing" fixed the problem.

LuHG18 commented 9 months ago

Another small bug: the DIGITS dataset was not working when selected because it had never been loaded from scikit-learn. I loaded it and added it to dataset.py, and that seems to have fixed the problem.
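
For readers following along, a minimal sketch of what this kind of fix might look like, assuming the backend maps dataset names to scikit-learn loaders. The mapping `SKLEARN_LOADERS`, the function `load_tabular_dataset`, and the exact key spellings are hypothetical; the real contents of dataset.py and tabularConstants.ts may differ.

```python
# Hypothetical name -> scikit-learn loader mapping. The frontend constant must match
# these keys exactly, which is why a missing underscore breaks the lookup.
SKLEARN_LOADERS = {
    "IRIS": "load_iris",
    "DIGITS": "load_digits",  # the kind of entry that was missing
    "CALIFORNIA_HOUSING": "fetch_california_housing",
}


def load_tabular_dataset(name: str):
    """Return the named dataset as a pandas DataFrame (features plus target)."""
    if name not in SKLEARN_LOADERS:
        raise KeyError(f"Unknown default dataset: {name!r}")

    # Imported lazily so the mapping can be checked without scikit-learn installed.
    import sklearn.datasets as sk_datasets

    loader = getattr(sk_datasets, SKLEARN_LOADERS[name])
    return loader(as_frame=True).frame
```

With a mapping like this, `load_tabular_dataset("CALIFORNIAHOUSING")` fails with a `KeyError`, while `load_tabular_dataset("CALIFORNIA_HOUSING")` resolves to `fetch_california_housing`.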

karkir0003 commented 9 months ago

Hey @LuHG18 ETA on the PR?
