gimseng / 99-ML-Learning-Projects

A list of 99 machine learning projects for anyone interested to learn from coding and building projects
MIT License

Extended MNIST handwriting character recognition #102

Open t3bol90 opened 3 years ago

t3bol90 commented 3 years ago

I've added an Extended MNIST (EMNIST) handwritten character recognition project.

This work comes from a university project I did with a friend, and I think it is a good fit for beginners in ML/DL.

The dataset is fetched through the Kaggle API, so learners can also pick up how to use the Kaggle API to download a dataset to their local machine.

The solution presents 3 models (Vanilla CNN, a modified LeNet-5, and ResNet) for handwritten character classification on the EMNIST dataset. Drawing on research papers and our own experience, we tried to improve the models' performance with a modified preprocessing pipeline and additional layers in the network architecture. Basic methods such as a confusion matrix and plots of accuracy over time were used to assess the models; the accuracies of Vanilla CNN, LeNet-5, and ResNet were 88.25%, 89.61%, and 82.60% respectively. While this is not (and cannot be) a state-of-the-art solution in its domain, it gives the learner an initial, cursory understanding and a prompt to research deeper in the future.
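For readers who have not built these metrics before, here is a minimal sketch of a confusion matrix and the accuracy derived from it. It is not the code from the notebooks: the function names `confusion_matrix` and `accuracy` and the toy labels are assumptions made for illustration, and it uses plain Python so it runs without any ML library.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count predictions per class: rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[predicted_label]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. predicted correctly."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Toy example (hypothetical labels, not EMNIST results).
labels = ["A", "B", "C"]
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "A"]

cm = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, cm):
    print(label, row)
print("accuracy:", round(accuracy(cm), 4))
```

Off-diagonal cells show which characters get mistaken for which, something a single accuracy number hides.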

gimseng commented 3 years ago

Hi @t3bol90, thanks for the contribution and the comprehensive code. I'll look at it more carefully later this week. For now, there are a few things to keep in mind:

  1. Could you rename your folder to something like 009 or 010? This is a placeholder name so that we won't have to deal with conflicts with recently added projects.

  2. Could you comment on where the data source (EMNIST) is from? Should we copy/download it to a data folder, or is it available through a library?

  3. There are two files without file extensions: LeNet-5 version 2 and Vanilla CNN. Are these folders or text files or perhaps some other type of files?

t3bol90 commented 3 years ago

So,

  1. I have renamed the folder to 010 :3.
  2. The dataset is downloaded from Kaggle. I've added a quick script to fetch the dataset to your local device or Colab environment.
  3. The 2 files are IPython notebooks, but I forgot to rename them on my local device. I've renamed them now.

Thanks for your feedback <3
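For anyone following along, a download script of this kind could look roughly like the sketch below. It is not the script from the project: the helper names `build_download_command` and `download_dataset` and the slug `owner/dataset-name` are placeholders made up for illustration. It assumes the `kaggle` command-line tool is installed and an API token is configured (typically `~/.kaggle/kaggle.json`).

```python
import subprocess
from pathlib import Path


def build_download_command(dataset_slug, dest_dir="data"):
    """Build the Kaggle CLI call that downloads and unzips a dataset into dest_dir."""
    return [
        "kaggle", "datasets", "download",
        "-d", dataset_slug,   # dataset identifier in "owner/dataset-name" form
        "-p", str(dest_dir),  # destination folder
        "--unzip",            # extract the archive after downloading
    ]


def download_dataset(dataset_slug, dest_dir="data"):
    """Create dest_dir if needed and run the Kaggle CLI; raises if the CLI fails."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_download_command(dataset_slug, dest_dir), check=True)


# Usage (placeholder slug, replace with the real dataset identifier):
# download_dataset("owner/dataset-name", dest_dir="data")
```

Keeping the command construction separate from the call that runs it makes the script easy to inspect without a Kaggle token.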

gimseng commented 3 years ago

Hi @t3bol90, thanks for the updates.

  1. Thanks!

  2. I am still very confused about the data. Could you just put them in a data folder? See other projects.

  3. Thanks for the clarifications.

Suggestions:

  1. The readme.md for the solution is too long, please summarize it. I'd suggest moving the long and detailed explanations into a separate .md file.

  2. I'm still not sure about this Kaggle token thing. Is it possible to structure everything (see above for data source) such that we have a self-contained project folder with data and notebooks? I envision this will be helpful for beginners who have not dealt with Kaggle before.
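To make the self-contained idea concrete, a notebook could look for its data inside the project folder and fail with a clear message if nothing is there. This is only a sketch: the helper name `find_data_files`, the `data` folder name, and the accepted file suffixes are assumptions, not the project's actual layout.

```python
from pathlib import Path


def find_data_files(project_dir, data_folder="data", suffixes=(".csv", ".zip")):
    """Return the sorted data files in <project_dir>/<data_folder>, or raise a clear error."""
    data_dir = Path(project_dir) / data_folder
    if not data_dir.is_dir():
        raise FileNotFoundError(f"Expected a data folder at {data_dir}")
    files = sorted(p for p in data_dir.iterdir() if p.suffix.lower() in suffixes)
    if not files:
        raise FileNotFoundError(f"No data files with suffixes {suffixes} in {data_dir}")
    return files


# Usage from a notebook located in the project folder:
# for path in find_data_files("."):
#     print(path.name)
```

With a check like this, a beginner who clones the repository either gets the data immediately or sees exactly which folder is missing.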

Finally, I think it's great that you are being comprehensive about the explanations (in the solution readme) and have pretty nice notebooks. I'll need some time (maybe early next week) to run through things once the data source has been properly incorporated. Again, thanks!