gimseng / 99-ML-Learning-Projects

A list of 99 machine learning projects for anyone interested in learning by coding and building projects
MIT License

[EXE] A simple exercise to understand stochastic gradient descent #43

Open gimseng opened 3 years ago

gimseng commented 3 years ago

In particular, help a learner to learn full gradient descent, mini-batch stochastic gradient descent, and so on.

It could be based on linear regression, a simple neural network, or something similar.

But it should focus on understanding the differences between the gradient descent approaches: their pros and cons, how quickly or slowly they learn, and how they behave as the dataset size changes.
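As a rough illustration of what such an exercise could compare (this is only a sketch; the function name `fit_linear` and its defaults are placeholders, not anything that exists in this repo), the three variants differ only in how many samples feed each parameter update:

```python
import numpy as np

def fit_linear(X, y, lr=0.01, epochs=100, batch_size=None, seed=0):
    """Gradient descent on mean squared error for y ~ X @ w.

    batch_size=None      -> full (batch) gradient descent
    batch_size=1         -> pure stochastic gradient descent
    1 < batch_size < n   -> mini-batch SGD
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    w = np.zeros(X.shape[1])
    step = n if batch_size is None else batch_size
    for _ in range(epochs):
        idx = rng.permutation(n)                 # reshuffle each epoch
        for start in range(0, n, step):
            batch = idx[start:start + step]
            residual = X[batch] @ w - y[batch]
            w -= lr * 2 * X[batch].T @ residual / len(batch)
    return w
```

With `batch_size=None` every update uses the whole dataset (stable but expensive steps); `batch_size=1` gives pure SGD (cheap but noisy steps); values in between give mini-batch SGD.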

anandravishankar12 commented 3 years ago

@gimseng Can I start working on this? Would be glad to assist in any way possible.

gimseng commented 3 years ago

@anandravishankar12 absolutely, go ahead! If you have any questions or need any help, let us know here or in the Discord chat. Many thanks!

ashishthanki commented 3 years ago

Hi @anandravishankar12 @gimseng

I would like to help with this too. Perhaps we can share the workload.

gimseng commented 3 years ago

Sure @anandravishankar12 and @ashishthanki, I'll let both of you coordinate/collaborate on this. You can use the Discord channel if you want. Thanks for contributing!

Rajwrita commented 3 years ago

https://github.com/Rajwrita/Grokking-Machine-Learning/blob/master/Gradient%20descent%20method%20for%20linear%20regression.ipynb

Do you feel this will help? I could make a PR adding this notebook then. Thanks :)

angelinjohn commented 3 years ago

Is this still open for contribution? I would like to work on it.

gimseng commented 3 years ago

Hi @anandravishankar12 @ashishthanki @Rajwrita and @angelinjohn:

Thanks all for the interest. Could you elaborate on how / which part of SGD you want to work on? I could envision different datasets, different models (standard ML or deep models), and different SGD variants (full GD, mini-batch SGD, etc.), and maybe some discussion of learning rates. I think there could be 1-3 projects delving deeper into this.

DaggerAJ commented 2 years ago

@gimseng I wanted to contribute to 'A simple exercise to understand stochastic gradient descent'. My approach is to build a dataset with make_classification, then apply logistic regression with L2 regularization trained via SGD from sklearn on that dataset, and then apply SGD from scratch on the same dataset. Then I will compare the loss of both models.
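A minimal sketch of that comparison could look like the snippet below. This is only an assumed outline: the hyperparameters are arbitrary, and the SGDClassifier loss string is "log_loss" in recent scikit-learn versions ("log" in older ones).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Synthetic binary classification problem
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Logistic regression with L2 regularization, trained by SGD (scikit-learn)
sk_model = SGDClassifier(loss="log_loss", penalty="l2", alpha=1e-4,
                         max_iter=1000, random_state=42)
sk_model.fit(X_train, y_train)
print("sklearn log-loss:", log_loss(y_test, sk_model.predict_proba(X_test)))

# Same objective, SGD from scratch
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b, lr, alpha = np.zeros(X.shape[1]), 0.0, 0.01, 1e-4
rng = np.random.default_rng(42)
for epoch in range(50):
    for i in rng.permutation(len(X_train)):      # one sample per update
        p = sigmoid(X_train[i] @ w + b)
        w -= lr * ((p - y_train[i]) * X_train[i] + alpha * w)  # L2 term
        b -= lr * (p - y_train[i])
print("scratch log-loss:", log_loss(y_test, sigmoid(X_test @ w + b)))
```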

TerMinator-spec commented 2 years ago

Hey, I want to work on this issue. I will implement full gradient descent and batch gradient descent from scratch on a simple regression problem. Shall I go ahead?
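As one possible starting point (the data and settings below are purely illustrative, not a prescribed solution), a from-scratch full gradient descent on a one-feature regression problem can be sanity-checked against the closed-form least-squares fit:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = 3.0 * x + 1.0 + rng.normal(scale=0.1, size=200)

X = np.column_stack([x, np.ones_like(x)])    # add intercept column
w = np.zeros(2)
lr = 0.1
for _ in range(1000):                        # full (batch) gradient descent
    grad = 2 * X.T @ (X @ w - y) / len(y)
    w -= lr * grad

w_closed, *_ = np.linalg.lstsq(X, y, rcond=None)
print("gradient descent:", w)                # should approach [3.0, 1.0]
print("closed form     :", w_closed)
```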

M-S-10 commented 2 years ago

Hey there! Umm, I wonder why this issue isn't closed yet? I could help!

Blaize99 commented 1 year ago

Hey, I am interested in taking this issue. Please assign it to me.

AnuravModak commented 10 months ago

Hello @gimseng ,

I'm excited to contribute to this "good first issue" and help learners understand stochastic gradient descent better. I would be interested in creating an educational resource that focuses on the differences between various gradient descent approaches, their pros and cons, and how they perform under different circumstances. Specifically, I'd like to explore the concepts using a simple neural network in Python.

My plan is to create a step-by-step tutorial that covers:

Introduction to Gradient Descent: Explaining the fundamental concept of gradient descent and its role in optimizing machine learning models.

Full Gradient Descent: Detailing the process of full gradient descent, how it works, its advantages, and its limitations. I will provide code examples to implement this approach on a basic linear regression problem.

Mini-Batch Stochastic Gradient Descent: Exploring the concept of mini-batch SGD, its benefits in terms of computational efficiency, and its trade-offs. I'll walk through the implementation on both a linear regression task and a simple neural network.

Learning Rate and Convergence: Discussing the importance of learning rate in gradient descent, its impact on convergence, and strategies for selecting an appropriate learning rate for different scenarios.

Comparative Analysis: Comparing the performance of full gradient descent and mini-batch SGD on different datasets, including small and large datasets. I'll provide visualizations to help learners understand the differences in convergence speed and accuracy (a rough starting sketch follows after this list).

Discussion: Engaging in a brief discussion about when to use each approach, considering factors like dataset size, model complexity, and computational resources.
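For the learning-rate and comparative-analysis steps above, one possible skeleton is sketched below. Everything in it is an assumption for illustration: the `train` helper, the synthetic data, and the chosen learning rates are placeholders, and plotting the recorded per-epoch history with matplotlib could replace the final print.

```python
import numpy as np

def mse(X, y, w):
    return float(np.mean((X @ w - y) ** 2))

def train(X, y, lr, epochs, batch_size, seed=0):
    """Return the training MSE after each epoch for a given batch size and learning rate."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    history = []
    for _ in range(epochs):
        idx = rng.permutation(len(y))
        for start in range(0, len(y), batch_size):
            b = idx[start:start + batch_size]
            w -= lr * 2 * X[b].T @ (X[b] @ w - y[b]) / len(b)
        history.append(mse(X, y, w))
    return history

# Synthetic regression data (illustrative only)
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
y = X @ np.array([1.0, -2.0, 0.5, 3.0, 0.0]) + rng.normal(scale=0.1, size=1000)

for lr in (0.001, 0.01, 0.1):
    full = train(X, y, lr, epochs=20, batch_size=len(y))   # full gradient descent
    mini = train(X, y, lr, epochs=20, batch_size=32)       # mini-batch SGD
    print(f"lr={lr}: full GD final MSE={full[-1]:.4f}, "
          f"mini-batch final MSE={mini[-1]:.4f}")
```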

For datasets, I propose using:

Simple Linear Regression Dataset: A synthetic dataset with a single feature and target variable, suitable for illustrating gradient descent concepts.

Iris Dataset: A classic dataset for multiclass classification. This will allow us to explore mini-batch SGD on a more complex problem.

MNIST Dataset: A widely-used dataset for handwritten digit recognition. We can use this dataset to showcase the effects of dataset size on different gradient descent approaches.

I'm open to suggestions and would appreciate any feedback on this plan. Looking forward to contributing and helping learners grasp these important concepts.

Best regards, Anurav Modak