carpentries-incubator / deep-learning-intro

Learn Deep Learning with Python
https://carpentries-incubator.github.io/deep-learning-intro/

Thoughts on adding a hyperparameter tuning episode #348

Closed · qualiaMachine closed this 1 year ago

qualiaMachine commented 1 year ago

I recently taught this workshop at UW-Madison and it went very well! We had many requests from learners to dive a bit deeper into a concrete example of hyperparameter tuning. I think a whole episode could probably be devoted to this (maybe right after "monitor the training process"). What do others think?

I don't have time to implement the full episode just yet, but might get around to it in a month when my schedule opens up. Here is a short colab script I whipped up that goes over a quick example of tuning using a for-loop: https://colab.research.google.com/drive/1B_F5ik6P9hfbKT2pzfMgsvFeF99_Uy9V?usp=sharing. If anyone wants to borrow this as a starting point to implement the episode, please feel free.

Note: I usually use GridSearchCV for hyperparameter tuning, but apparently it can be tricky to integrate with Keras neural networks. Either that, or I just need to dig a little deeper.
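
For the curious, the for-loop approach can be as simple as the sketch below (placeholder data and a hypothetical `build_model`; a real episode would reuse the lesson's own dataset and network, and the details in my colab differ a bit):

```python
import numpy as np
from tensorflow import keras

# Placeholder data: 4 input features, 3 classes (hypothetical; the lesson's own
# prepared train/validation split would go here instead)
X_train, y_train = np.random.rand(500, 4), np.random.randint(0, 3, 500)
X_val, y_val = np.random.rand(100, 4), np.random.randint(0, 3, 100)

def build_model(n_hidden):
    """Small dense classifier; n_hidden is the hyperparameter we tune."""
    model = keras.Sequential([
        keras.Input(shape=(4,)),
        keras.layers.Dense(n_hidden, activation="relu"),
        keras.layers.Dense(3, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Sweep one hyperparameter with a plain for-loop and keep the best value
results = {}
for n_hidden in [8, 16, 32, 64]:
    model = build_model(n_hidden)
    model.fit(X_train, y_train, epochs=20, batch_size=32, verbose=0)
    _, val_acc = model.evaluate(X_val, y_val, verbose=0)
    results[n_hidden] = val_acc
    print(f"hidden units: {n_hidden:3d}  validation accuracy: {val_acc:.3f}")

print("best number of hidden units:", max(results, key=results.get))
```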

colinsauze commented 1 year ago

Both this and issue #349 seem like interesting topics which I think many people would find useful. But they open up the very broad question of how long we want this workshop to be.

How long did it take you to teach it? And how much more time could you have found, and would your learners have been interested in spending it?

We could potentially try to minimise the interdependencies between some episodes, so that instructors who are restricted to one day of teaching can skip them.

qualiaMachine commented 1 year ago

You raise a good point. We taught this on a 3-day schedule from 8:30am-12:30pm (roughly 3.5 hrs of actual instruction daily, accounting for breaks).

Here is the breakdown of the schedule...

We ended about an hour early on day 3, so it might be pretty painless to add transfer learning to that section (assuming a 3-day/4-hr schedule).

As for hyperparameter tuning, that section would likely take at least an hour to really dive into the process, which might push things into 4-day-schedule territory. As a middle ground, we could have the learners call a pre-made cross-validation function and briefly explain what it is doing. Alternatively, we could prep a whole hyperparameter tuning episode, more or less as an optional episode for instructors to include if they'd like.
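
To illustrate that middle ground, the pre-made helper could be something roughly like this sketch (here `build_model`, `X`, and `y` are placeholders for whatever the episode already defines):

```python
import numpy as np
from sklearn.model_selection import KFold

def cross_validate(build_model, X, y, n_splits=5, epochs=20, batch_size=32):
    """Train a fresh model on each fold and return the per-fold validation accuracies."""
    kfold = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = []
    for train_idx, val_idx in kfold.split(X):
        model = build_model()  # fresh, untrained model for every fold
        model.fit(X[train_idx], y[train_idx],
                  epochs=epochs, batch_size=batch_size, verbose=0)
        _, acc = model.evaluate(X[val_idx], y[val_idx], verbose=0)
        scores.append(acc)
    return np.array(scores)

# Learners would then only need the one-liner:
# scores = cross_validate(build_model, X_train, y_train)
# print(f"mean CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```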

svenvanderburg commented 1 year ago

I would suggest rephrasing things a little bit in this section: https://carpentries-incubator.github.io/deep-learning-intro/03-monitor-the-model/index.html#counteract-model-overfitting

Because here we are actually doing hyperparameter tuning (we are changing a hyperparameter). It is only a small extra step to do this within a for-loop. We could have an infobox that discusses hyperparameter tuning in a for-loop, and a callout to grid search. (In practice, this is what we often tell students when they ask about hyperparameter tuning.)
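
As a rough sketch of what that callout could show (placeholder data and a hypothetical `create_nn` builder, not the lesson's actual code): sklearn's `ParameterGrid` enumerates the combinations while keeping the training loop a plain for-loop, which sidesteps the GridSearchCV/Keras wrapping issue mentioned above.

```python
import numpy as np
from tensorflow import keras
from sklearn.model_selection import ParameterGrid

# Placeholder data (4 features, 3 classes); the episode's own dataset would go here
X_train, y_train = np.random.rand(500, 4), np.random.randint(0, 3, 500)
X_val, y_val = np.random.rand(100, 4), np.random.randint(0, 3, 100)

def create_nn(n_hidden, learning_rate):
    """Dense classifier whose width and learning rate are the tuned hyperparameters."""
    model = keras.Sequential([
        keras.Input(shape=(4,)),
        keras.layers.Dense(n_hidden, activation="relu"),
        keras.layers.Dense(3, activation="softmax"),
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# ParameterGrid yields every combination, so the body stays an ordinary for-loop
best_acc, best_params = 0.0, None
for params in ParameterGrid({"n_hidden": [16, 64], "learning_rate": [1e-2, 1e-3]}):
    model = create_nn(**params)
    model.fit(X_train, y_train, epochs=20, batch_size=32, verbose=0)
    _, acc = model.evaluate(X_val, y_val, verbose=0)
    if acc > best_acc:
        best_acc, best_params = acc, params

print("best settings:", best_params, "validation accuracy:", round(best_acc, 3))
```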

@qualiaMachine you teach fast! We barely finish in 4 mornings ... Any tips for speeding up?

@colinsauze good point on the length of the material. I think the length is good as it is, but the workshop could benefit from extra optional episodes. In this case, though, it feels better to incorporate the topic into the existing episodes, and I don't think it adds that much extra teaching time.

qualiaMachine commented 1 year ago

@svenvanderburg I like that suggestion. I think ending with a for loop (or gridsearch) in episode 2 is a nice way to tie things back to how they are usually done in practice. My hope is that this would add only around ~15 minutes of additional instruction time.

For timing tips, I can mostly just speculate since I've only taught this once. Here are a few factors that might have come into play.

  1. Having a strong intro section/slides may help reduce questions in later sections. My slides attempted to clarify a few sticking points early on, including:

    • What is a batch?
    • Why do we train on multiple epochs?
    • What is the purpose of having multiple layers vs. having a massive single layer (universal approx. theorem)?
  2. Great instructors and helpers. Any questions that came up were quickly answered. We also had helpers post clarifying messages to the chat (this was an online workshop) throughout, which headed off some questions.

  3. Minimal time spent resolving setup issues. We hit the ground running by sending plenty of reminders about the setup instructions and offering office hours in advance of the workshop for additional help. Even with these safeguards, we had a couple of folks show up unprepared (as usual xD). Those folks were able to transition to Colab quickly.

  4. We may have rushed the Advanced Layers episode a bit (new instructor), which has me thinking that @svenvanderburg's suggestion of making Transfer learning an optional episode is probably the best way to go (see #349).

svenvanderburg commented 1 year ago

Thanks for the tips @qualiaMachine 🙏

I created a new issue for adding hyperparameter tuning: #351. I'm closing this issue since I think we have reached a conclusion (of course, we can always reopen it if someone disagrees haha). Feel free to pick up #351 if you have time, @qualiaMachine.