carpentries-incubator / deep-learning-intro

Learn Deep Learning with Python
https://carpentries-incubator.github.io/deep-learning-intro/

Updated slides #478

Open qualiaMachine opened 5 months ago

qualiaMachine commented 5 months ago

I have some updated slides that I used to teach this lesson last week: https://docs.google.com/presentation/d/1uT4uvfWrpvrrQEFp84PGfAQ2r9Ylqx8tbwiVFuGfEao/edit?usp=sharing

Please feel free to use/repurpose anything in there.

I felt it was important to comment on the double descent phenomenon during the discussion of "how much data is needed?", especially in the age of increasingly large language models. Double descent is not currently mentioned in the lesson. I may make a pull request on the topic if I can find the time... it's something we may want to add to an earlier episode.

svenvanderburg commented 5 months ago

Thanks for sharing @qualiaMachine ! We'll think about what to do with the slides, since it doesn't make much sense for everyone to develop their own. I didn't actually know about double descent, thanks for teaching me! Is it something that you come across frequently in practice?

qualiaMachine commented 5 months ago

Glad I could share! It's something that wasn't really discussed much until a few years ago. Older textbooks still need to be updated, since the classic bias-variance tradeoff is violated by deep neural networks! I have personally never experienced it, but I have worked with fairly small datasets relative to other deep learning applications. Evidently double descent is more frequently observed with larger datasets (at least ~10,000 observations), which I never encountered in my research applications :(. Many other learners may be in a similar boat, but I still think it's worthwhile to point out. I usually talk about it in the context of large language models, which, despite having billions or trillions of weights, can still avoid overfitting. It's also worth mentioning when early stopping is introduced: while in general I would recommend sticking with early stopping, those with large datasets may want to explore "overparameterized" models to see if they can get past the initial overfitting phase.
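
To make the early-stopping point concrete, here's a minimal sketch using Keras (the framework the lesson uses). The data and model are synthetic placeholders, just to show the `EarlyStopping` callback:

```python
# Minimal early-stopping sketch; the dataset and architecture are
# synthetic placeholders, only the callback usage is the point here.
import numpy as np
from tensorflow import keras

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 20))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype("float32")

model = keras.Sequential([
    keras.layers.Input(shape=(20,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Stop once validation loss stops improving, and keep the best weights seen.
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=10, restore_best_weights=True
)
model.fit(X, y, validation_split=0.2, epochs=200,
          callbacks=[early_stop], verbose=0)
```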

The book I recommended has a chapter on it if you're curious to learn more: https://udlbook.github.io/udlbook/.
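
And for anyone who wants to poke at this themselves, here's a rough sketch of the kind of experiment involved: train the same network at increasing widths and compare train vs. test error. The dataset, widths, and epoch budget below are purely illustrative assumptions; whether you actually see a second descent depends a lot on the data, the label noise, and how long you train.

```python
# Rough capacity-sweep sketch: same architecture, increasing width.
# Everything here (data, widths, epochs) is illustrative, not a recipe.
import numpy as np
from tensorflow import keras

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20))
y = X @ rng.normal(size=20) + 0.5 * rng.normal(size=2000)  # noisy linear target
X_train, X_test = X[:1000], X[1000:]
y_train, y_test = y[:1000], y[1000:]

for width in [2, 8, 32, 128, 512]:
    model = keras.Sequential([
        keras.layers.Input(shape=(20,)),
        keras.layers.Dense(width, activation="relu"),
        keras.layers.Dense(width, activation="relu"),
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X_train, y_train, epochs=100, verbose=0)
    train_mse = model.evaluate(X_train, y_train, verbose=0)
    test_mse = model.evaluate(X_test, y_test, verbose=0)
    print(f"width={width:4d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```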

Here are a couple other references that are worth checking out:

svenvanderburg commented 5 months ago

Cool, thanks for the clear explanations! 🙏