bclarkson-code / Tricycle

Autograd to GPT-2 completely from scratch

Rotary Position Embeddings #72

Open bclarkson-code opened 2 months ago

bclarkson-code commented 2 months ago

Modern LLMs basically all use rotary position embeddings. These should be added to Tricycle.
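For reference, here is a rough sketch of what the operation does, written in plain NumPy rather than against Tricycle's own tensor or layer API (which this issue does not specify). The function name `rotary_embed`, the `(seq_len, head_dim)` input shape, the interleaved even/odd channel pairing, and the default `base=10000.0` are all assumptions made for illustration. Each pair of channels is rotated by an angle proportional to the token position, so the dot product between a rotated query and a rotated key depends only on their relative offset.

```python
import numpy as np


def rotary_embed(x, base=10000.0):
    """Rotate channel pairs of x by position-dependent angles.

    x: array of shape (seq_len, head_dim), head_dim must be even.
    Hypothetical helper for illustration, not part of Tricycle.
    """
    x = np.asarray(x, dtype=float)
    seq_len, head_dim = x.shape
    if head_dim % 2 != 0:
        raise ValueError("head_dim must be even")

    # One rotation frequency per pair of channels.
    inv_freq = base ** (-np.arange(0, head_dim, 2) / head_dim)
    # angles[p, i] = position p times frequency i.
    angles = np.outer(np.arange(seq_len), inv_freq)
    cos, sin = np.cos(angles), np.sin(angles)

    # Treat (even, odd) channels as 2D points and rotate each one.
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out
```

In an attention layer this would typically be applied to the queries and keys of each head before computing attention scores, and not to the values. A real implementation would also need a matching backward pass for autograd.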