bclarkson-code / Tricycle

Autograd to GPT-2 completely from scratch

84 mixed precision support #85

Closed: bclarkson-code closed this pull request 1 month ago

bclarkson-code commented 1 month ago

closes #84

Stabilised mixed precision training with loss scaling. Thanks to @kddubey for pointing out that it was needed.
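Why loss scaling helps: float16 can only represent magnitudes down to about 6e-8 (the smallest subnormal) and up to 65504. Many gradients are smaller than that lower limit, so they round to zero. Multiplying the loss by a scale factor S multiplies every gradient by S (chain rule), which lifts them into the representable range. They are then divided by S in full precision before the optimiser step. The PR does not include code or say whether Tricycle uses a fixed or a dynamic scale. The sketch below is therefore a generic, stdlib-only illustration and not Tricycle's implementation. `to_fp16`, `DynamicLossScaler` and `train_step` are hypothetical names. Float16 is simulated with the `struct` module's `e` format. The sketch also assumes the common dynamic variant: halve the scale and skip the step on overflow, double it after a run of clean steps.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value.

    Values too large for float16 become +/-inf, mimicking an fp16 overflow.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        return math.copysign(math.inf, x)


class DynamicLossScaler:
    """Hypothetical dynamic loss scaler (illustrative, not Tricycle's API)."""

    def __init__(self, init_scale=2.0**10, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=100):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def unscale(self, scaled_grads):
        """Return full-precision gradients, or None if any scaled gradient overflowed."""
        if any(not math.isfinite(g) for g in scaled_grads):
            return None
        return [g / self.scale for g in scaled_grads]

    def update(self, found_overflow):
        """Shrink the scale after an overflow; grow it after a run of clean steps."""
        if found_overflow:
            self.scale *= self.backoff_factor
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps >= self.growth_interval:
                self.scale *= self.growth_factor
                self._good_steps = 0


def train_step(params, true_grads, scaler, lr=0.1):
    """One SGD step whose backward pass is simulated in fp16.

    `true_grads` stands in for dL/dparam. Scaling the loss by `scaler.scale`
    scales each gradient by the same factor, and the result is rounded to fp16.
    """
    scaled = [to_fp16(g * scaler.scale) for g in true_grads]
    grads = scaler.unscale(scaled)
    scaler.update(found_overflow=grads is None)
    if grads is None:
        return params  # skip this update; the scale has already been reduced
    return [p - lr * g for p, g in zip(params, grads)]


if __name__ == "__main__":
    print(to_fp16(1e-8))  # 0.0: without scaling, this gradient underflows
    scaler = DynamicLossScaler(init_scale=1024.0)
    params = train_step([1.0], [1e-8], scaler)
    print(params[0] < 1.0)  # True: the scaled gradient survived fp16
```

This sketch skips a step whose scaled gradients overflow instead of clipping them. An `inf` gradient carries no usable direction, so skipping costs one step, and the reduced scale makes a repeat overflow less likely. Periodically growing the scale keeps it close to the largest value that does not overflow, which preserves as many small gradients as possible.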