bclarkson-code / Tricycle

Autograd to GPT-2 completely from scratch

Fused operations #51

Closed bclarkson-code closed 3 months ago

bclarkson-code commented 3 months ago

A lot of time is spent in the MLP and attention blocks. The operations in these blocks can be fused to reduce both memory usage and latency.
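
As a rough sketch of the idea (not Tricycle's actual API; the function and variable names here are hypothetical), fusing the bias-add and activation in an MLP block into a single in-place pass avoids allocating separate intermediate arrays for each step:

```python
import numpy as np

def unfused_mlp(x, w, b):
    # Each step materialises a fresh intermediate array.
    h = x @ w                    # matmul output
    h = h + b                    # new array for the bias-add
    return np.maximum(h, 0.0)    # new array for the ReLU

def fused_mlp(x, w, b):
    # Bias-add and ReLU are applied in place on the matmul output,
    # so no extra temporaries are allocated after the matmul.
    h = x @ w
    h += b                       # in-place bias add
    np.maximum(h, 0.0, out=h)    # in-place ReLU
    return h

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
w = rng.standard_normal((8, 16))
b = rng.standard_normal(16)

# The fused version computes the same result with fewer allocations.
assert np.allclose(unfused_mlp(x, w, b), fused_mlp(x, w, b))
```

On a GPU the same principle applies at the kernel level: launching one fused kernel instead of several elementwise kernels saves both the intermediate memory traffic and the per-launch latency.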