klei22 closed this 3 weeks ago.
Modified bench.py to profile the softmax variants against context length. Here are some preliminary traces:
This also supports Chrome's trace viewer (chrome://tracing).
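For reference, the sketch below shows what the Chrome trace viewer consumes. It is a minimal, standard-library-only illustration and not the actual bench.py changes. It times a naive softmax over rows of length `T` for several context lengths and records each timing as a Trace Event Format "complete" event (`"ph": "X"`, timestamps and durations in microseconds). The helper names, the row count, and the context lengths are hypothetical. In PyTorch, `torch.profiler.profile(...).export_chrome_trace(path)` writes a file in this same JSON format.

```python
import json
import math
import random
import time


def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def profile_softmax(context_lengths, rows=8, seed=0):
    """Time `rows` softmaxes of length T for each T; return trace events.

    Hypothetical helper: a stand-in for a real attention kernel, which would
    apply softmax to T rows of length T (one row per query position).
    """
    rng = random.Random(seed)
    events = []
    origin = time.perf_counter()
    for T in context_lengths:
        scores = [[rng.gauss(0.0, 1.0) for _ in range(T)] for _ in range(rows)]
        start = time.perf_counter()
        for row in scores:
            softmax(row)
        end = time.perf_counter()
        events.append({
            "name": f"softmax T={T}",
            "cat": "softmax",
            "ph": "X",                     # "complete" event: carries ts and dur
            "ts": (start - origin) * 1e6,  # microseconds since the sweep began
            "dur": (end - start) * 1e6,    # microseconds
            "pid": 0,
            "tid": 0,
            "args": {"context_length": T},
        })
    return events


def write_trace(events, path):
    """Write events in the JSON object form that chrome://tracing can load."""
    with open(path, "w") as f:
        json.dump({"traceEvents": events, "displayTimeUnit": "ms"}, f)
```

Calling `write_trace(profile_softmax([256, 512, 1024, 2048]), "softmax_trace.json")` and loading the file in chrome://tracing shows one bar per context length, so the growth in duration can be read directly off the timeline.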
Loading the JSON trace file into Chrome's profile viewer gives a more granular layer-by-layer comparison:
Added an option to benchmark only the forward pass. This saves considerable memory and raised the longest context length we could test on 24 GB of VRAM from 4096 to 8192, which is enough to capture the quadratic increase in latency.
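The quadratic trend comes from the attention score matrix: each head computes a T × T matrix of scores and applies softmax to every row, so doubling the context length quadruples both the softmax work and the memory for that matrix. The back-of-the-envelope sketch below assumes a materialized (non-fused) score matrix, fp16 scores (2 bytes each), and batch size 1. These are illustrative assumptions, not the configuration used in bench.py. A forward-only run also avoids keeping activations such as these for a backward pass, which is a plausible reason it fits longer contexts in the same memory.

```python
def attention_score_bytes(context_length, n_heads=1, batch_size=1, bytes_per_elem=2):
    """Bytes for one layer's materialized T x T attention score matrix.

    Assumes fp16 scores (2 bytes) and no fused/flash attention kernel;
    both are illustrative assumptions.
    """
    return batch_size * n_heads * context_length * context_length * bytes_per_elem


for T in (4096, 8192):
    mib = attention_score_bytes(T) / 2**20
    print(f"T={T}: {mib:.0f} MiB per head per layer")
# T=4096: 32 MiB per head per layer
# T=8192: 128 MiB per head per layer

ratio = attention_score_bytes(8192) / attention_score_bytes(4096)
print(f"growth from 4096 to 8192: {ratio:.0f}x")  # growth from 4096 to 8192: 4x
```

Multiplying by the number of heads, layers, and batch entries gives a rough upper bound on score-matrix memory for an unfused implementation.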