fxnnxc opened 10 months ago
Tensor size: $L \times N \times T \times D$ (layers, samples, tokens, hidden dimensions)
Example: $5 \times 6330 \times 40 \times 512$
Across all models, only $D$ differs; $L$, $N$, and $T$ stay the same.
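As a rough check on storage cost, the element count is the product of the four dimensions, and the byte size multiplies that by the bytes per element. This sketch assumes dense storage in float32 (4 bytes) by default, since the issue does not state the dtype; `tensor_elems` and `tensor_bytes` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: size of a dense L x N x T x D tensor.

# Number of elements = L * N * T * D
tensor_elems() {
  local L=$1 N=$2 T=$3 D=$4
  echo $(( L * N * T * D ))
}

# Bytes = elements * bytes per element.
# Assumption: 4 bytes (float32) unless a 5th argument is given (e.g. 2 for float16).
tensor_bytes() {
  local L=$1 N=$2 T=$3 D=$4 b=${5:-4}
  echo $(( L * N * T * D * b ))
}

# Example shape from above
tensor_elems 5 6330 40 512    # 648192000
tensor_bytes 5 6330 40 512    # 2592768000 (about 2.6 GB as float32)
tensor_bytes 5 6330 40 512 2  # 1296384000 (about 1.3 GB as float16)
```

Because $L$, $N$, and $T$ are fixed across models, the size grows linearly with $D$.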
```bash
llama1=('llama2' '7b' '[0,8,16,24,32]')
llama2=('llama2' '13b' '[0,8,16,24,40]')
llama3=('llama2_chat' '7b' '[0,8,16,24,32]')
llama4=('llama2_chat' '13b' '[0,8,16,24,40]')
pythia1=('70m' '[0,1,3,5,6]')
pythia2=('160m' '[0,2,4,8,12]')
pythia3=('410m' '[0,4,8,16,24]')
pythia4=('1b' '[0,2,4,8,16]')
pythia5=('1.4b' '[0,4,8,16,24]')
pythia6=('2.8b' '[0,4,8,16,32]')
pythia7=('6.9b' '[0,4,8,16,32]')
pythia8=('12b' '[0,4,8,16,36]')
```
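Combining these configurations with the size formula gives a rough per-model estimate, where $L$ is the number of entries in each layer list (5 for every model). This sketch assumes $N = 6330$ and $T = 40$ for every model (as in the example), float32 storage, and hidden sizes taken from the public Llama 2 and Pythia model configs rather than from this issue; `num_layers`, `estimate`, and `HIDDEN` are hypothetical names. The associative array requires bash 4 or newer.

```bash
#!/usr/bin/env bash
# Sketch: estimated float32 size of the L x N x T x D tensor for each model.
# Assumptions (not from the issue): N and T match the example above,
# 4 bytes per element, hidden sizes from the public model configs.
N=6330
T=40
BYTES=4

# Hidden size D per model size (Llama 2 base and chat share D).
declare -A HIDDEN=(
  [7b]=4096 [13b]=5120
  [70m]=512 [160m]=768 [410m]=1024 [1b]=2048
  [1.4b]=2048 [2.8b]=2560 [6.9b]=4096 [12b]=5120
)

# L = number of entries in a layer list such as '[0,8,16,24,32]'.
num_layers() {
  local commas=${1//[^,]/}   # keep only the commas
  echo $(( ${#commas} + 1 ))
}

# Usage: estimate <size> <layer list>
estimate() {
  local L D
  L=$(num_layers "$2")
  D=${HIDDEN[$1]}
  echo "$1 L=$L D=$D bytes=$(( L * N * T * D * BYTES ))"
}

estimate 70m '[0,1,3,5,6]'     # 70m L=5 D=512 bytes=2592768000
estimate 7b  '[0,8,16,24,32]'  # 7b L=5 D=4096 bytes=20742144000
estimate 12b '[0,4,8,16,36]'   # 12b L=5 D=5120 bytes=25927680000
```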
Directory Structure
Memory Size