ReaLLMASIC / nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.

MIT License

Add softmax sweep to benchmark softmaxes vs context #245

Closed klei22 closed 3 weeks ago

klei22 commented 3 weeks ago

Modified bench.py to profile softmaxes vs. context length. Sharing some preliminary traces:

[Figures: forward_pass_timing_plot, ln1_ln2_timing_plot]
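For readers who want the shape of such a sweep without opening bench.py, here is a minimal stdlib-only sketch. It is not the code from this PR: `softmax`, `time_softmax`, and `sweep` are hypothetical names, the scores are synthetic, and pure Python stands in for the GPU kernels, so only the trend across context lengths is meaningful.

```python
import math
import time


def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def time_softmax(context_len, repeats=3):
    """Best wall-clock time (seconds) to softmax a T x T score matrix."""
    # Synthetic scores (assumption); a real benchmark would use model logits.
    scores = [[((i * j) % 7) / 7.0 for j in range(context_len)]
              for i in range(context_len)]
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for row in scores:
            softmax(row)
        best = min(best, time.perf_counter() - start)
    return best


def sweep(context_lengths):
    """Map each context length to its measured softmax time."""
    return {t: time_softmax(t) for t in context_lengths}


if __name__ == "__main__":
    for t, seconds in sweep([64, 128, 256]).items():
        print(f"context={t:4d}  softmax={seconds * 1e3:.2f} ms")
```

Taking the best of a few repeats reduces noise from other processes; a GPU benchmark would also need to synchronize the device before reading the clock.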

This also supports exporting a trace for chrome://tracing.

Loading the JSON file into Chrome's trace viewer shows a more granular layer-by-layer comparison: [Screenshot from 2024-08-25 02-05-09]
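How bench.py writes its trace is not shown in this thread. As a rough illustration of what the viewer consumes, the sketch below emits timings in the JSON object form of the Trace Event Format, using complete ("X") events with microsecond timestamps. The `record` and `to_trace_json` helpers and the layer names are hypothetical.

```python
import json
import time


def record(events, name, fn):
    """Run fn() and append a complete ("X") trace event with its duration."""
    start = time.perf_counter()
    result = fn()
    end = time.perf_counter()
    events.append({
        "name": name,
        "ph": "X",                    # complete event: start time plus duration
        "ts": start * 1e6,            # timestamps are in microseconds
        "dur": (end - start) * 1e6,
        "pid": 0,
        "tid": 0,
    })
    return result


def to_trace_json(events):
    """Serialize events as a Trace Event Format JSON object."""
    return json.dumps({"traceEvents": events})


if __name__ == "__main__":
    events = []
    # Placeholder workloads standing in for layers (hypothetical names).
    record(events, "ln_1", lambda: sum(range(10_000)))
    record(events, "softmax", lambda: sum(range(50_000)))
    print(to_trace_json(events))
```

Redirecting the printed output to a `.json` file gives something the trace viewer can load, with one bar per recorded event.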

klei22 commented 3 weeks ago

Added an option to benchmark only the forward pass. This saved considerable memory and raised the longest context length we could test on 24GB of VRAM from 4096 to 8192, capturing the quadratic increase in latency.

[Figure: forward_pass_timing_plot]
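The quadratic trend is what you would expect if the full T x T attention score matrix is materialized, since softmax then touches T² elements per head. A back-of-the-envelope sketch, where `attention_score_bytes` is a hypothetical helper and the 12 heads, batch size of 1, and 2-byte elements are assumptions rather than the settings used in this PR:

```python
def attention_score_bytes(context_len, n_head=12, batch_size=1, bytes_per_elem=2):
    """Bytes for one layer's attention score matrix of shape (B, H, T, T)."""
    return batch_size * n_head * context_len * context_len * bytes_per_elem


if __name__ == "__main__":
    for t in (2048, 4096, 8192):
        mib = attention_score_bytes(t) / 2**20
        print(f"context={t:5d}  scores per layer = {mib:.0f} MiB")
```

Each doubling of the context length quadruples both the element count and the bytes. A fused attention kernel avoids storing this matrix, so the memory side of the estimate applies only to the unfused path.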