karkir0003 opened this issue 2 years ago
Features to be added:
Option | Values |
---|---|
Problem Type | Image Classification |
Criterion | Cross Entropy, MSE, Weighted Cross Entropy Loss |
Default Datasets | MNIST (Digits), FashionMNIST, CIFAR-10, Blood Cell Images |
Train transform | Any PyTorch transform |
Test transform | Any PyTorch transform |
Layers | Conv2d, MaxPool2d, AdaptiveAvgPool2d, Dropout, BatchNorm2d, plus the layers already available for tabular data |
Optimizer | SGD, Adam |
Epochs | User input (numerical) |
Shuffle | True / False |
Batch size | User input (numerical) |
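A minimal sketch of how the backend might validate these options before training, assuming the frontend sends them as plain strings and numbers. The identifiers (`ImageTrainConfig`, the lowercase criterion/optimizer keys, the dataset keys) are hypothetical and not part of the existing codebase.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical identifiers for the options in the table above.
CRITERIA = {"cross_entropy", "mse", "weighted_cross_entropy"}
OPTIMIZERS = {"sgd", "adam"}
DEFAULT_DATASETS = {"MNIST", "FashionMNIST", "CIFAR10", "BloodCellImages"}


def _is_positive_int(value: object) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


@dataclass
class ImageTrainConfig:
    """Hypothetical container for a user's image-classification settings."""
    criterion: str
    optimizer: str
    epochs: int
    batch_size: int
    shuffle: bool = True
    default_dataset: Optional[str] = None  # None when the user uploads their own data

    def validate(self) -> List[str]:
        """Return human-readable problems; an empty list means the config is usable."""
        errors = []
        if self.criterion not in CRITERIA:
            errors.append(f"unknown criterion: {self.criterion}")
        if self.optimizer not in OPTIMIZERS:
            errors.append(f"unknown optimizer: {self.optimizer}")
        if self.default_dataset is not None and self.default_dataset not in DEFAULT_DATASETS:
            errors.append(f"unknown dataset: {self.default_dataset}")
        if not _is_positive_int(self.epochs):
            errors.append("epochs must be a positive integer")
        if not _is_positive_int(self.batch_size):
            errors.append("batch size must be a positive integer")
        if not isinstance(self.shuffle, bool):
            errors.append("shuffle must be True or False")
        return errors
```

Returning a list of problems instead of raising on the first one lets the frontend show every invalid field at once.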
If the user wants to upload their own data, it could be a zip file containing either a train folder and a valid folder, each with one subfolder per class name, OR a train folder and a valid folder with the images directly inside them, plus a CSV file mapping each file name to its class.
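A minimal sketch of reading either layout from the uploaded zip with the standard library. It assumes the CSV sits at the archive root as `labels.csv` with `filename` and `label` columns (all hypothetical names) and that file names are unique across splits. Once extracted, the class-per-subfolder layout is also what `torchvision.datasets.ImageFolder` expects.

```python
import csv
import io
import zipfile
from pathlib import PurePosixPath
from typing import Dict, List, Optional, Tuple

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".bmp", ".gif"}


def _load_csv_labels(zf: zipfile.ZipFile, csv_name: str) -> Optional[Dict[str, str]]:
    """Map file name -> class from the CSV, or return None if the archive has no CSV."""
    if csv_name not in zf.namelist():
        return None
    with zf.open(csv_name) as raw:
        reader = csv.DictReader(io.TextIOWrapper(raw, encoding="utf-8", newline=""))
        return {row["filename"]: row["label"] for row in reader}


def read_split(zf: zipfile.ZipFile, split: str, csv_name: str = "labels.csv") -> List[Tuple[str, str]]:
    """Return sorted (image_path, class_name) pairs for one split, e.g. "train" or "valid"."""
    labels = _load_csv_labels(zf, csv_name)
    pairs = []
    for name in zf.namelist():
        path = PurePosixPath(name)
        # Skip directory entries, non-images, and files from other splits.
        if path.suffix.lower() not in IMAGE_EXTS or path.parts[0] != split:
            continue
        if len(path.parts) == 3:
            # Layout 1: split/<class_name>/<image>
            pairs.append((name, path.parts[1]))
        elif len(path.parts) == 2:
            # Layout 2: split/<image>, with the class looked up in the CSV by file name
            if labels is None or path.name not in labels:
                raise ValueError(f"no class found for {name}")
            pairs.append((name, labels[path.name]))
    return sorted(pairs)
```

Archives that wrap everything in an extra top-level folder (e.g. `dataset/train/...`) would need that prefix stripped first.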
For pretrained models:
Option | Values |
---|---|
Name | Name of the model |
Default dataset | Similar to above |
Criterion | Similar to above |
Cut | User input (numerical) |
Train transform | Similar to above |
Test transform | Similar to above |
Optimizer | Similar to above |
Epochs | Similar to above |
Shuffle | Similar to above |
Batch size | Similar to above |
Taking inspiration from the tabular data page (Home page), we can place the train and valid transforms below the layers inventory. Transforms could be drag and drop, with the transform inventory displaying the most-used transforms; or the user could search a list of transforms and, on clicking an option, a block with that transform's name would appear; or we could do both. The reason for the list is that there are a lot of transforms one can apply, and showing them all might look cluttered.
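A minimal sketch of the search-list idea, assuming the app keeps a catalog of transform names. The entries are real `torchvision.transforms` class names, but the catalog itself, the "most used" shortlist, and `search_transforms` are illustrative choices, not a decided list.

```python
from typing import List

# Illustrative catalog of torchvision.transforms class names the UI could offer.
ALL_TRANSFORMS = [
    "CenterCrop", "ColorJitter", "Grayscale", "Normalize", "Pad",
    "RandomAffine", "RandomCrop", "RandomHorizontalFlip", "RandomResizedCrop",
    "RandomRotation", "RandomVerticalFlip", "Resize", "ToTensor",
]
# Assumed shortlist shown in the inventory before the user types anything.
MOST_USED = ["Resize", "ToTensor", "Normalize", "RandomHorizontalFlip", "RandomCrop"]


def search_transforms(query: str, limit: int = 10) -> List[str]:
    """Return transform names containing the query (case-insensitive).

    An empty query returns the most-used shortlist. Names that start with the
    query are ranked first, then the rest alphabetically.
    """
    q = query.strip().lower()
    if not q:
        return MOST_USED[:limit]
    matches = [name for name in ALL_TRANSFORMS if q in name.lower()]
    matches.sort(key=lambda name: (not name.lower().startswith(q), name))
    return matches[:limit]
```

For example, typing "flip" would surface `RandomHorizontalFlip` and `RandomVerticalFlip`, and clicking one would add that block to the train or valid transform list.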
We can have a toggle button next to the heading (Implemented layers) to switch from drag and drop to pretrained. A similar button could be added to the pretrained view once its layout is determined.
Feel free to edit this comment. We need more layers as well. Once the entire implementation is complete, we can add more features like
When people upload large image datasets, we may have to look into chunking with Flask, which lets us send the image data to the server in multiple requests rather than one massive request that may cause issues (for example, timeouts or request-size limits).
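A minimal sketch of the splitting side of chunking, written in Python for illustration (in the browser the same step would use `Blob.slice`). Each yielded chunk would be sent as its own request together with its index and the total chunk count. The 5 MiB chunk size and the name `iter_chunks` are assumptions.

```python
import math
import os
from typing import Iterator, Tuple

CHUNK_SIZE = 5 * 1024 * 1024  # 5 MiB per request; an arbitrary choice for this sketch


def iter_chunks(path: str, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (chunk_index, total_chunks, data) for the file at path."""
    total_chunks = math.ceil(os.path.getsize(path) / chunk_size)
    with open(path, "rb") as fh:
        for index in range(total_chunks):
            # Every chunk is chunk_size bytes except possibly the last one.
            yield index, total_chunks, fh.read(chunk_size)
```

Sending the index and total with every request lets the server detect when the upload is complete, even if chunks arrive out of order.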
@avayedawadi, can you give an example of an approach that uses chunking, or can we take advantage of a library to do this?
For Kaggle dataset support: https://www.kaggle.com/docs/api
You can create a dummy account, generate an API key, and configure AWS Secrets Manager to store the dummy Kaggle account's username and API key. Your code then pulls those credentials from AWS Secrets Manager and runs the Kaggle dataset download command (`kaggle datasets download`).
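A minimal sketch of that flow. It assumes the secret is stored as a JSON string with hypothetical `username` and `key` fields and that `secrets_client` is `boto3.client("secretsmanager")`; the function names and the secret id are illustrative. The Kaggle client can read `KAGGLE_USERNAME` and `KAGGLE_KEY` from the environment, so no `kaggle.json` needs to be baked into the container.

```python
import json
import os
from typing import Dict


def load_kaggle_credentials(secrets_client, secret_id: str) -> Dict[str, str]:
    """Fetch the dummy account's Kaggle credentials from AWS Secrets Manager.

    secrets_client is expected to be boto3.client("secretsmanager"); the secret
    is assumed to be JSON with hypothetical "username" and "key" fields.
    """
    response = secrets_client.get_secret_value(SecretId=secret_id)
    secret = json.loads(response["SecretString"])
    return {"username": secret["username"], "key": secret["key"]}


def download_kaggle_dataset(dataset: str, dest_dir: str, creds: Dict[str, str]) -> None:
    """Download and unzip a Kaggle dataset given as "owner/dataset-name"."""
    # Expose the credentials to the Kaggle client via environment variables.
    os.environ["KAGGLE_USERNAME"] = creds["username"]
    os.environ["KAGGLE_KEY"] = creds["key"]
    # Imported late because importing the kaggle package tries to authenticate immediately.
    from kaggle.api.kaggle_api_extended import KaggleApi

    api = KaggleApi()
    api.authenticate()
    api.dataset_download_files(dataset, path=dest_dir, unzip=True)
```

Usage would look like `download_kaggle_dataset("owner/dataset-name", "/tmp/data", load_kaggle_credentials(boto3.client("secretsmanager"), "kaggle-dummy-creds"))`, where the secret id is a placeholder. Setting `os.environ` affects the whole process, which is fine here since every request uses the same dummy account.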
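Make sure the ecsTaskExecutionRole in our deployment has full access to AWS Secrets Manager. If our code calls Secrets Manager itself at runtime (rather than ECS injecting the secret into the container), the permission (at minimum `secretsmanager:GetSecretValue` on that secret) needs to be on the ECS task role instead.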
I'll work on chunking with Flask. @farisdurrani Are you good with using chunking for large file upload?
Never needed to do chunking before, so this is new territory for me as well
I think it's the best solution. https://stackoverflow.com/questions/44727052/handling-large-file-uploads-with-flask is a Stack Overflow post that shows how to implement chunking. The alternative is streaming the data, but there isn't much documentation or help around the streaming approach, so personally I think chunking is the way to go.
https://stackoverflow.com/questions/32898082/splitting-a-file-into-chunks-with-javascript is a Stack Overflow post on implementing chunking in plain JavaScript (no extra libraries needed), and it is compatible with React as well. We still need to figure out how to send all the chunks to the backend AND merge them there.
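A minimal sketch of the merging side, assuming each request carries a client-generated upload id, the chunk index, and the total chunk count. In a Flask route these values would come from `request.form` and `request.files` (e.g. `request.files["chunk"].read()`); the field names and the helpers `save_chunk` and `merge_chunks` are hypothetical.

```python
import os
import re
import shutil

_SAFE_ID = re.compile(r"[A-Za-z0-9_-]+")


def _chunk_dir(upload_dir: str, upload_id: str) -> str:
    # upload_id comes from the client, so reject anything that could escape upload_dir.
    if not _SAFE_ID.fullmatch(upload_id):
        raise ValueError(f"invalid upload id: {upload_id!r}")
    return os.path.join(upload_dir, upload_id)


def save_chunk(upload_dir: str, upload_id: str, index: int, data: bytes) -> None:
    """Store one chunk in its own file so chunks may arrive in any order."""
    if index < 0:
        raise ValueError("chunk index must be non-negative")
    chunk_dir = _chunk_dir(upload_dir, upload_id)
    os.makedirs(chunk_dir, exist_ok=True)
    with open(os.path.join(chunk_dir, f"{index:06d}.part"), "wb") as fh:
        fh.write(data)


def merge_chunks(upload_dir: str, upload_id: str, total_chunks: int, dest_path: str) -> bool:
    """Concatenate chunks 0..total_chunks-1 into dest_path.

    Returns False without writing anything if any chunk has not arrived yet.
    """
    chunk_dir = _chunk_dir(upload_dir, upload_id)
    parts = [os.path.join(chunk_dir, f"{i:06d}.part") for i in range(total_chunks)]
    if not all(os.path.exists(p) for p in parts):
        return False
    with open(dest_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as fh:
                shutil.copyfileobj(fh, out)  # streams; never holds the whole file in memory
    shutil.rmtree(chunk_dir)  # clean up the per-upload chunk folder
    return True
```

Each chunk request would call `save_chunk` and then `merge_chunks`; the first call that finds every part present assembles the file. If the last chunks can arrive concurrently, the merge step needs a lock so it does not run twice.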
I pushed a potential solution for large file uploads to img-cnn-frontend. The idea is that sending the upload as multipart/form-data should automatically divide the data into multiple parts. Then the backend uses threading so that no process freezes while everything is being saved. This covers efficient large-file uploading without freezing anything. It works efficiently on my machine, but I don't know whether it is a valid solution in deployment or even on other people's machines. Let me know if you have any advice!
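One detail worth checking: with multipart/form-data, each form field (including each file) becomes one part of a single HTTP request, so one large file is not split across several requests the way chunking does. The threading idea still helps on the server side. A minimal sketch, assuming the upload has already been saved to disk (for example with Werkzeug's `FileStorage.save`) and that extracting the zip is the slow step; `extract_in_background` is a hypothetical helper.

```python
import zipfile
from concurrent.futures import Future, ThreadPoolExecutor

# One shared pool per process; two workers is an arbitrary choice for this sketch.
_extract_pool = ThreadPoolExecutor(max_workers=2)


def extract_archive(zip_path: str, dest_dir: str) -> int:
    """Extract every member of zip_path into dest_dir and return the member count."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
        return len(zf.namelist())


def extract_in_background(zip_path: str, dest_dir: str) -> Future:
    """Start extraction on a worker thread so the request can return immediately."""
    return _extract_pool.submit(extract_archive, zip_path, dest_dir)
```

The route can respond right away, and the training step can call `future.result()` (or poll `future.done()`) before reading the images. A thread pool only helps within one server process; with several workers or containers in deployment, a task queue may be more robust.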
How do I upload a file in Image Model? And do you have a preferred test case file we can all use for reproducibility?
The upload button is further down the page: I put it below the email input on the main landing page, somewhat arbitrarily, just for testing. I'll move it once we're happy with it. Any dataset from https://www.kaggle.com/search?q=image+datasetSize%3Alarge would be good, because they are large enough for testing. @farisdurrani @vidushiMaheshwari
Describe the solution you'd like
Frontend portion of image classification
Similar in look to the home page, but for image classification:
- User should be able to upload zipped image data or enter a URL to the zip
- User should be able to drag and drop the preprocessing transforms to be applied to the train and validation sets
- User should be able to drag and drop the model architecture for their image classifier
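For the "enter a URL to the zip" option, a minimal backend sketch using only the standard library. It streams the download to disk and checks that the result really is a zip. Restricting the URL scheme matters because `urllib` also opens `file://` URLs, which would let a user read files on the server. `fetch_zip` and the `upload.zip` file name are hypothetical.

```python
import os
import shutil
import urllib.parse
import urllib.request
import zipfile
from typing import Tuple


def fetch_zip(url: str, dest_dir: str, allowed_schemes: Tuple[str, ...] = ("http", "https")) -> str:
    """Download url into dest_dir/upload.zip and return the saved path."""
    scheme = urllib.parse.urlparse(url).scheme
    if scheme not in allowed_schemes:
        raise ValueError(f"unsupported URL scheme: {scheme!r}")
    os.makedirs(dest_dir, exist_ok=True)
    path = os.path.join(dest_dir, "upload.zip")
    with urllib.request.urlopen(url, timeout=60) as resp, open(path, "wb") as out:
        shutil.copyfileobj(resp, out)  # stream to disk instead of reading into memory
    if not zipfile.is_zipfile(path):
        os.remove(path)
        raise ValueError(f"{url} did not return a zip archive")
    return path
```

The saved archive can then go through the same path as a directly uploaded zip.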
Layers to be added: Conv2d, MaxPool2d, AdaptiveAvgPool2d, Dropout, BatchNorm2d, plus the layers already available on the existing home page (any other important image-classification layers can be added as well).
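Since the drag-and-drop builder lets users stack these layers in any order, it may help to check shapes before training. A minimal sketch, assuming layers arrive as dicts with a hypothetical `type` field plus PyTorch-style parameter names, square integer kernels, and default dilation. It applies the output-size formula from the PyTorch `Conv2d`/`MaxPool2d` docs; Dropout leaves the shape unchanged, and BatchNorm2d only needs `num_features` to match the channel count.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)


def _out_size(size: int, kernel: int, stride: int, padding: int, dilation: int = 1) -> int:
    # Output-size formula from the PyTorch Conv2d/MaxPool2d docs (ceil_mode=False).
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def infer_shapes(input_shape: Shape, layers: List[Dict]) -> List[Shape]:
    """Return the (C, H, W) output of each layer, or raise if the stack is invalid."""
    c, h, w = input_shape
    shapes = []
    for layer in layers:
        kind = layer["type"]
        if kind == "Conv2d":
            if layer["in_channels"] != c:
                raise ValueError(f"Conv2d expects {layer['in_channels']} channels, got {c}")
            k, s, p = layer["kernel_size"], layer.get("stride", 1), layer.get("padding", 0)
            c = layer["out_channels"]
            h, w = _out_size(h, k, s, p), _out_size(w, k, s, p)
        elif kind == "MaxPool2d":
            k = layer["kernel_size"]
            # MaxPool2d's stride defaults to its kernel size.
            s, p = layer.get("stride", k), layer.get("padding", 0)
            h, w = _out_size(h, k, s, p), _out_size(w, k, s, p)
        elif kind == "AdaptiveAvgPool2d":
            out = layer["output_size"]
            h, w = (out, out) if isinstance(out, int) else out
        elif kind == "BatchNorm2d":
            if layer["num_features"] != c:
                raise ValueError(f"BatchNorm2d expects {layer['num_features']} channels, got {c}")
        elif kind != "Dropout":
            raise ValueError(f"unsupported layer: {kind}")
        if h < 1 or w < 1:
            raise ValueError(f"{kind} shrinks the feature map to {h}x{w}")
        shapes.append((c, h, w))
    return shapes
```

For a 1x28x28 MNIST input, a stack of Conv2d(1, 16, 3, padding=1), BatchNorm2d(16), MaxPool2d(2), Dropout, Conv2d(16, 32, 3), AdaptiveAvgPool2d(1) ends at 32x1x1, which a linear classifier head with 32 inputs could consume.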