bclarkson-code / Tricycle

Autograd to GPT-2 completely from scratch

19 swiglu implementation is incorrect #20

Closed by bclarkson-code 7 months ago

bclarkson-code commented 7 months ago

Added an optionally tunable bias term to SwiGLU.