bclarkson-code / Tricycle

Autograd to GPT-2 completely from scratch
104 stars 7 forks

Optimise embedding #58

Closed bclarkson-code closed 3 months ago

bclarkson-code commented 3 months ago

Currently, the embedding layer uses more memory and runs slower than I would expect. It should be optimised.

bclarkson-code commented 3 months ago

I ran a benchmark on the new code and the speedup is huge: ~8.5x faster than the original (mean 0.692 down to 0.080).

┏━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Benchmark ┃ Min     ┃ Max     ┃ Mean    ┃ Min (+)         ┃ Max (+)         ┃ Mean (+)        ┃
┡━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ original  │ 0.677   │ 0.744   │ 0.692   │ 0.080 (8.5x)    │ 0.081 (9.2x)    │ 0.080 (8.6x)    │
└───────────┴─────────┴─────────┴─────────┴─────────────────┴─────────────────┴─────────────────┘
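
The thread does not say what was changed, so the following is only an illustrative sketch, not Tricycle's actual code. It assumes one common cause of a slow, memory-hungry embedding layer: building a one-hot matrix and multiplying it by the weights, rather than indexing the weight rows directly. The function names (`embed_one_hot`, `embed_lookup`, `embed_lookup_backward`) and the sizes are hypothetical, and plain NumPy is used for simplicity.

```python
import numpy as np


def embed_one_hot(weights, token_ids):
    """Baseline: materialise a (n_tokens, vocab_size) one-hot matrix, then matmul."""
    vocab_size = weights.shape[0]
    one_hot = np.zeros((token_ids.size, vocab_size), dtype=weights.dtype)
    one_hot[np.arange(token_ids.size), token_ids] = 1.0
    return one_hot @ weights


def embed_lookup(weights, token_ids):
    """Same result via row indexing; no one-hot matrix is allocated."""
    return weights[token_ids]


def embed_lookup_backward(grad_out, token_ids, vocab_size):
    """Gradient w.r.t. the weights: scatter-add each output gradient into its row."""
    grad_weights = np.zeros((vocab_size, grad_out.shape[1]), dtype=grad_out.dtype)
    # np.add.at accumulates correctly when the same token id appears more than once
    np.add.at(grad_weights, token_ids, grad_out)
    return grad_weights


# Hypothetical sizes, chosen only for illustration
rng = np.random.default_rng(0)
weights = rng.normal(size=(1000, 16)).astype(np.float32)
token_ids = rng.integers(0, 1000, size=64)

forward_baseline = embed_one_hot(weights, token_ids)
forward_lookup = embed_lookup(weights, token_ids)
grad_weights = embed_lookup_backward(
    np.ones((64, 16), dtype=np.float32), token_ids, vocab_size=1000
)
```

In this sketch the lookup avoids allocating an array that grows with the vocabulary size, which is where both the memory and time savings would come from if the original layer followed the one-hot pattern.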