dingo-actual / infini-transformer

PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" (https://arxiv.org/abs/2404.07143)
MIT License

Add multiple options for nonlinear activation #11

Closed · rtaylor-rx-m closed this issue 4 months ago

rtaylor-rx-m commented 4 months ago

The MLPs in both transformer modules currently have ReLU hard-coded as the activation function. It would help to have options for nonlinear activations commonly used in recent LLMs (GeLU, SwiGLU, GeGLU, etc.).
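
A minimal sketch of what a configurable activation could look like, assuming the gated variants are handled by widening the MLP's first projection. This is not the repository's actual implementation; the names `ACTIVATIONS`, `MLP`, and the `activation` argument are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwiGLU(nn.Module):
    """Gated activation: SiLU(x_a) * x_b, where the input is split in half."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a, b = x.chunk(2, dim=-1)
        return F.silu(a) * b


class GeGLU(nn.Module):
    """Gated activation: GELU(x_a) * x_b, where the input is split in half."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a, b = x.chunk(2, dim=-1)
        return F.gelu(a) * b


# Hypothetical registry mapping option names to activation modules.
ACTIVATIONS = {
    "relu": nn.ReLU,
    "gelu": nn.GELU,
    "silu": nn.SiLU,
    "swiglu": SwiGLU,
    "geglu": GeGLU,
}


class MLP(nn.Module):
    """Position-wise feed-forward block with a selectable activation.

    Gated activations (SwiGLU/GeGLU) split their input in half, so the
    first projection is widened to 2 * dim_hidden in that case.
    """
    def __init__(self, dim_input: int, dim_hidden: int, activation: str = "relu"):
        super().__init__()
        gated = activation in ("swiglu", "geglu")
        self.proj_in = nn.Linear(dim_input, dim_hidden * (2 if gated else 1))
        self.act = ACTIVATIONS[activation]()
        self.proj_out = nn.Linear(dim_hidden, dim_input)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj_out(self.act(self.proj_in(x)))


# Example usage (hypothetical): mlp = MLP(dim_input=512, dim_hidden=2048, activation="geglu")
```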

dingo-actual commented 4 months ago

Implemented.