LxMLS / lxmls-toolkit

Machine Learning applied to Natural Language Processing Toolkit used in the Lisbon Machine Learning Summer School

Transformer Day: Create Transformer Exercises #178

Open ramon-astudillo opened 1 year ago

ramon-astudillo commented 1 year ago

Objective: Create exercises where students have to complete minGPT code

We can start the discussion here, @israfelsr @robertodessi, based on https://github.com/karpathy/minGPT/blob/master/mingpt/model.py

Branch: https://github.com/LxMLS/lxmls-toolkit/tree/transformer-day-student

NOTE: ⬆️ Since we are going to delete parts, this goes in a separate branch that is merged into `student`. You can pull updates from `transformer-day`.

Expected Finishing date:

israfelsr commented 1 year ago

Hi guys! I've been exploring some resources and brainstorming exercise ideas. I like the concept of having three levels. Here are my thoughts:

  1. Warm-Up exercise: I suggest plotting trained attention weights on a sentence. This exercise can involve trying different sentences or explaining the behavior of specific attention heads on different tokens. I've written some starter code using BERT and Transformers (HF) to plot the heatmaps. We can improve the plot to make it more intuitive!
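This is not the starter code mentioned above, just a minimal NumPy sketch of what the heatmap exercise could look like. It uses random vectors in place of BERT embeddings; in the actual exercise the weights would come from a pretrained HF model instead.

```python
import numpy as np

rng = np.random.default_rng(0)
tokens = ["the", "cat", "sat", "on", "the", "mat"]
d = 16  # toy embedding size; BERT-base uses 768

# Stand-in embeddings: in the real exercise these come from a pretrained model
X = rng.normal(size=(len(tokens), d))
Wq, Wk = rng.normal(size=(d, d)), rng.normal(size=(d, d))
Q, K = X @ Wq, X @ Wk

# Scaled dot-product attention weights: softmax over each row
scores = Q @ K.T / np.sqrt(d)                      # (seq, seq)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

# Plot: row i shows how much token i attends to every other token
try:
    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()
    ax.imshow(weights, cmap="viridis")
    ax.set_xticks(range(len(tokens)), tokens)
    ax.set_yticks(range(len(tokens)), tokens)
    fig.savefig("attention_heatmap.png")
except ImportError:
    pass  # plotting is optional for the sanity checks

print(weights.shape)
```

A natural follow-up question for students: why must every row of the heatmap sum to 1?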

  2. Core Exercise: I propose coding the complete Attention module. We can begin with some theoretical exercises, such as determining the dimensions of Q, K, and V for different input scenarios, or discussing the importance of this projection. For the coding part, we can provide dimension hints to guide the implementation. Additionally, we can ask students to calculate the number of parameters for a given configuration.
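As a sketch of what the completed solution might look like, here is a multi-head self-attention module shaped like minGPT's `CausalSelfAttention` (causal masking omitted for brevity), together with the parameter-count question: for embedding size d, the fused Q/K/V projection has 3·d² weights plus 3·d biases and the output projection d² + d, i.e. 4d² + 4d in total.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """Multi-head self-attention in the style of minGPT's CausalSelfAttention.
    In the student branch, the marked lines could be blanked out for completion."""

    def __init__(self, n_embd: int, n_head: int):
        super().__init__()
        assert n_embd % n_head == 0
        # Q, K, V projections for all heads, fused into one linear layer
        self.c_attn = nn.Linear(n_embd, 3 * n_embd)
        self.c_proj = nn.Linear(n_embd, n_embd)  # output projection
        self.n_head = n_head
        self.n_embd = n_embd

    def forward(self, x):
        B, T, C = x.shape                                 # batch, seq len, embed dim
        q, k, v = self.c_attn(x).split(self.n_embd, dim=2)
        hd = C // self.n_head                             # per-head dimension
        # reshape to (B, n_head, T, hd) so each head attends independently
        q = q.view(B, T, self.n_head, hd).transpose(1, 2)
        k = k.view(B, T, self.n_head, hd).transpose(1, 2)
        v = v.view(B, T, self.n_head, hd).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(hd)   # (B, nh, T, T)
        att = F.softmax(att, dim=-1)
        y = att @ v                                       # (B, nh, T, hd)
        y = y.transpose(1, 2).contiguous().view(B, T, C)  # re-merge heads
        return self.c_proj(y)

# Parameter count check for n_embd=64: 4*64*64 + 4*64 = 16640
attn = SelfAttention(n_embd=64, n_head=8)
n_params = sum(p.numel() for p in attn.parameters())
print(n_params)
```

Working out 4d² + 4d by hand and then verifying it against `p.numel()` is exactly the kind of check the exercise could ask for.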

  3. Training and Prompting: Rather than solely loading a pre-existing architecture, maybe we could start by creating a small network and training it. We can give instructions to build a network with specific layers and parameters, and then train it using provided data. We can structure this exercise step-by-step with open instructions: loading the data, creating the model, and training it. We can compare this small model with a pretrained one and even ask about the relation between performance and number of weights.
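A possible scaffold for this step, under the assumption that the "provided data" is next-token sequences: a deliberately tiny stand-in model (embedding + MLP rather than a full Transformer) trained for a few steps on random toy tokens, ending with the parameter count that the performance/size comparison would use.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, n_embd, block_size = 16, 32, 8

# Tiny next-token model: embedding -> MLP -> logits (a small stand-in for
# the minGPT architecture the exercise would actually build)
model = nn.Sequential(
    nn.Embedding(vocab_size, n_embd),
    nn.Linear(n_embd, n_embd),
    nn.ReLU(),
    nn.Linear(n_embd, vocab_size),
)
opt = torch.optim.AdamW(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Toy "provided data": random token sequences; targets are inputs shifted by one
data = torch.randint(0, vocab_size, (64, block_size + 1))
x, y = data[:, :-1], data[:, 1:]

losses = []
for step in range(50):
    logits = model(x)                                   # (batch, T, vocab)
    loss = loss_fn(logits.reshape(-1, vocab_size), y.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    losses.append(loss.item())

n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params} parameters, loss {losses[0]:.2f} -> {losses[-1]:.2f}")
```

The comparison with a pretrained model could then be phrased as: repeat the loop with a larger configuration and plot loss against `n_params`.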

Let me know what you think! I have starter code for the first one that we can improve; for the second one I think we can use the minGPT module; and for the last one I can start checking some small architectures if you think it's a good idea.

pedrobalage commented 1 year ago

Here are some suggestions from my view on the topic:

Context

I'd like to step back to what we want the students to learn from the exercises. My suggestions are:

Suggestions for the exercises:

What we are not going to cover:

robertodessi commented 1 year ago

Can we close this?