EleutherAI / project-menu

See the issue board for the current status of active and prospective projects!

Plot training history of hidden representations in GPT #7

Closed Gurkenglas closed 1 year ago

Gurkenglas commented 3 years ago

How quickly do different layers of GPT learn what they learn? Fix a test input and a point in training time. The network computes one activation vector per layer. (That vector is then added onto the layer's input via the skip connection.) Now fix a layer and vary the point in time. Plot the path that this vector takes over training, perhaps as a heatmap of the dot products between every pair of snapshots. Do talk to Gurkenglas for rambling. This project should take a master developer about as much time as it took me to write this issue.
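A minimal sketch of the heatmap idea, assuming the per-checkpoint activation vectors for one layer and one test input have already been extracted. The names `gram_matrix`, `cosine_matrix`, and `snapshots` are hypothetical, and the random-walk data only stands in for real activations. The cosine-normalised variant is an optional extra, not something the issue asks for.

```python
import math
import random


def gram_matrix(snapshots):
    """Return the matrix of dot products between every pair of snapshots.

    snapshots: list of equal-length lists of floats, one per checkpoint.
    """
    return [[sum(a * b for a, b in zip(u, v)) for v in snapshots]
            for u in snapshots]


def cosine_matrix(snapshots):
    """Same as gram_matrix, but each entry is divided by both vector norms.

    Assumes no snapshot is the zero vector.
    """
    gram = gram_matrix(snapshots)
    n = len(snapshots)
    norms = [math.sqrt(gram[i][i]) for i in range(n)]
    return [[gram[i][j] / (norms[i] * norms[j]) for j in range(n)]
            for i in range(n)]


# Stand-in data (an assumption): a random walk imitating how one layer's
# activation vector might drift from checkpoint to checkpoint.
rng = random.Random(0)
dim, n_checkpoints = 16, 5
vec = [rng.gauss(0, 1) for _ in range(dim)]
snapshots = []
for _ in range(n_checkpoints):
    snapshots.append(list(vec))
    vec = [x + 0.1 * rng.gauss(0, 1) for x in vec]

heatmap = gram_matrix(snapshots)        # entry [i][j] = <snapshot_i, snapshot_j>
similarity = cosine_matrix(snapshots)   # entry [i][j] is in [-1, 1]
```

Either matrix can be handed to any plotting library's image or heatmap function, with both axes indexed by checkpoint. Blocks of high similarity along the diagonal would suggest stretches of training where the layer's output for this input barely moved.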