GDS-Education-Community-of-Practice / DSECOP

This repository contains data science educational materials developed by DSECOP Fellows.
License: Creative Commons Zero v1.0 Universal

Ashley - Connecting_MonteCarlo_to_ModernAI #22

Closed · soltaniehha closed this issue 1 year ago

soltaniehha commented 1 year ago

The module is located under the folder Connecting_MonteCarlo_to_ModernAI.

cnrrobertson commented 1 year ago

I know that the main focus of the modules is to complete them in Colab, but it would be nice to have a requirements.txt for pip or an environment.yml for Anaconda listing the packages needed for the module.
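For reference, something as simple as the sketch below would do; the package list here is just my guess at what the module uses (numpy, matplotlib, numba, tensorflow), not taken from the notebooks themselves:

```
# Hypothetical requirements.txt sketch -- the package list is an assumption,
# not taken from the module itself.
numpy
matplotlib
numba
tensorflow
```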

cnrrobertson commented 1 year ago

Hey @daleas0120, I've finished my pass through notebook 1. Looks great overall. Very nice intro to Monte Carlo, and I think the exercises are the perfect difficulty. I tried to leave specific comments on places that might need just a bit more explanation or a little adjustment. It took me about an hour, mostly spent on reading.

cnrrobertson commented 1 year ago

Hey @daleas0120, I have gone through notebook 2 and it's awesome. I love that you included a bit of code profiling: it makes the big gains from using numba very easy to see, and profiling is a very useful tool for the students. It took me probably an hour to go through, mostly spent on the reading.
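(For anyone reading along, the kind of comparison I mean looks roughly like the sketch below. It is a generic Monte Carlo pi estimate timed with and without numba, not the notebook's actual code.)

```python
# Hypothetical sketch only -- a generic Monte Carlo pi estimate used to
# illustrate the numba speedup, not the module's actual profiling code.
import time
import numpy as np
from numba import njit


def mc_pi_python(n_samples):
    """Estimate pi by sampling points in the unit square (pure Python loop)."""
    inside = 0
    for _ in range(n_samples):
        x, y = np.random.random(), np.random.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples


@njit
def mc_pi_numba(n_samples):
    """Same estimator, JIT-compiled by numba."""
    inside = 0
    for _ in range(n_samples):
        x = np.random.random()
        y = np.random.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples


if __name__ == "__main__":
    n = 1_000_000
    mc_pi_numba(10)  # trigger JIT compilation before timing

    t0 = time.perf_counter()
    mc_pi_python(n)
    print(f"pure Python: {time.perf_counter() - t0:.2f} s")

    t0 = time.perf_counter()
    mc_pi_numba(n)
    print(f"numba:       {time.perf_counter() - t0:.2f} s")
```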

cnrrobertson commented 1 year ago

Hey @daleas0120, I have gone through notebook 3. It's really awesome. I don't have any specific spots to flag, but I generally really like the high-level overviews you give of the methods and the way you let students just try running things and keep track of the results. I'll be honest, I didn't do the full training for the CNN because it does take a bit!

My only concern is that the CNN may be a bit overwhelming without a better understanding of neural networks, and that the long training time may keep students from finishing. To address that, a couple of ideas:

  1. Shorten the "Build a CNN model" description to just give the bare bones info on the layers of the network and then point them to an appendix with the full details as you have them written.
  2. You are using a very small learning rate in Adam. Is that needed? I tried with a much larger rate (0.05), and after 1 epoch of training got a test MSE of 0.01. Maybe you could start them with a larger rate and then note as an optional extension that they can try a smaller rate and more epochs if they'd like to get really good results.
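For concreteness, a minimal sketch of the kind of bare-bones setup I have in mind is below, assuming a Keras/TensorFlow regression CNN. The layer sizes, input shape, and learning-rate choice are my own illustrative guesses, not the notebook's actual model.

```python
# Hypothetical sketch only -- the notebook's actual architecture, input shape,
# and data are not reproduced here.
import tensorflow as tf


def build_demo_cnn(input_shape=(32, 32, 1)):
    """Bare-bones CNN for a regression target (illustrative, not the module's model)."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1),  # single regression output
    ])
    # Larger Adam learning rate (0.05) as suggested above; students could
    # optionally retry with a smaller rate and more epochs.
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.05),
                  loss="mse")
    return model
```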
daleas0120 commented 1 year ago

> 1. Shorten the "Build a CNN model" description to just give the bare bones info on the layers of the network and then point them to an appendix with the full details as you have them written.

Thank you for your comment. I have moved the Model Intuition information to an appendix as suggested, and created an in-text hyperlink to the content.

> 2. You are using a very small learning rate in Adam. Is that needed? I tried with a much larger rate (0.05), and after 1 epoch of training got a test MSE of 0.01. Maybe you could start them with a larger rate and then note as an optional extension that they can try a smaller rate and more epochs if they'd like to get really good results.

Thank you for your comment. Yes, the small learning rate is required to adequately capture the thermal noise in the data and to keep the gradient from collapsing. With a larger learning rate, the MSE quickly stops improving after a few epochs, and plotting the predictions shows that the model has not learned the data. Learning rates from 1E-2 to 1E-9 were tested with various other hyperparameter combinations over a two-week period to determine the easiest initial configuration for the students; the MSE drops to 1E-2 in under 10 minutes for LR=1E-8. Using the Google Colab GPU, the total training time was about 1 hour 10 minutes.

Since I warn the students at the beginning of the notebook, and again at the beginning of the exercise, that training may take several hours, I feel comfortable leaving the setup as it is. I have also left a note to the instructors in the solutions rubric that adjustments may need to be made depending on the computational resources available. Finally, the end-of-notebook questions focus on asking the students to understand the CNN algorithm rather than optimize it.