alan-turing-institute / minGPT

A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training

Add different activation functions #8

Open rchan26 opened 2 months ago

rchan26 commented 2 months ago

The codebase currently uses Gaussian Error Linear Units (GELU) as the activation function (as opposed to ReLU).

We could implement different activation functions for comparison, including newer ones such as SwiGLU, which is used in models like Llama 3.1. See GLU Variants Improve Transformer (Shazeer, 2020).
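One possible direction (a rough sketch, not code from this repo) is to make the transformer's MLP block swappable and add a SwiGLU variant along the lines of the paper. The class name, hidden-width multiplier, and dropout value below are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUMLP(nn.Module):
    """Sketch of a SwiGLU feed-forward block that could replace a GELU MLP.

    SwiGLU(x) = (SiLU(x W_gate) * (x W_up)) W_down, following
    'GLU Variants Improve Transformer' (Shazeer, 2020).
    """

    def __init__(self, n_embd: int, hidden_mult: int = 4, dropout: float = 0.1):
        super().__init__()
        hidden = hidden_mult * n_embd  # illustrative choice of hidden width
        # two parallel projections: one passed through SiLU (Swish) as the gate,
        # one kept linear, multiplied elementwise, then projected back down
        self.w_gate = nn.Linear(n_embd, hidden, bias=False)
        self.w_up = nn.Linear(n_embd, hidden, bias=False)
        self.w_down = nn.Linear(hidden, n_embd, bias=False)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.dropout(self.w_down(F.silu(self.w_gate(x)) * self.w_up(x)))


if __name__ == "__main__":
    mlp = SwiGLUMLP(n_embd=64)
    out = mlp(torch.randn(2, 10, 64))
    print(out.shape)  # torch.Size([2, 10, 64])
```

Note that SwiGLU uses three weight matrices instead of two, so Llama-style implementations typically shrink the hidden dimension (e.g. to roughly 2/3 of 4 * n_embd) to keep the parameter count comparable to the standard GELU MLP; that trade-off would be worth deciding here too.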