
fxnnxc / probe_lm

GNU General Public License v3.0

Implementation of A-pipeline for True-False Dataset (gather hiddens) #2

Status: Open. Opened by fxnnxc 10 months ago.

fxnnxc commented 10 months ago

Directory Structure

- outputs
  - a_pipeline
    - llama2_13b
    - llama2_7b
    - ...
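A minimal sketch of how a per-model output directory could be derived and created. It assumes the `<family>_<size>` naming that the listing suggests (for example `llama2_13b`, `llama2_chat_7b`, `pythia_1.4b`). The helper `model_dir` and the `OUT_ROOT` variable are hypothetical and do not come from the repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the output directory path for one model.
# ASSUMPTION: directories are named "<family>_<size>" under outputs/a_pipeline,
# inferred from the listing (llama2_13b, llama2_chat_7b, pythia_1.4b, ...).
OUT_ROOT="${OUT_ROOT:-outputs/a_pipeline}"

model_dir() {
  local family="$1" size="$2"
  printf '%s/%s_%s\n' "$OUT_ROOT" "$family" "$size"
}

# Example: create the directory for Llama-2-13B before writing its hiddens.
mkdir -p "$(model_dir llama2 13b)"
```

Keeping the path logic in one function means the gathering step and any later probing step would resolve the same directory for a given model.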

Disk Usage

85G     a_pipeline/

13G     llama2_13b
9.7G    llama2_7b
13G     llama2_chat_13b
9.7G    llama2_chat_7b
13G     pythia_12b
4.9G    pythia_1.4b
1.9G    pythia_160m
4.9G    pythia_1b
6.1G    pythia_2.8b
2.5G    pythia_410m
9.7G    pythia_6.9b
1.3G    pythia_70m
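These figures look like `du -h` output. Assuming that is how they were produced, a listing like this can be regenerated with GNU coreutils. The function name `report_sizes` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the on-disk size of each model directory under
# the given root (one line per model), then the total for the root itself.
report_sizes() {
  local root="$1"
  du -sh "$root"/*   # per-model sizes, e.g. "9.7G  .../llama2_7b"
  du -sh "$root"     # total, e.g. "85G  outputs/a_pipeline"
}
```

For example, `report_sizes outputs/a_pipeline` would print one line per model directory followed by the total.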

fxnnxc commented 10 months ago

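Hidden Tensor Shape

Tensor shape: $L \times N \times T \times D$ (layers × samples × tokens × hidden dimension)

Example: $(5, 6330, 40, 512)$, i.e., 5 layers, 6330 samples, 40 tokens, and hidden size 512 (the hidden size of pythia_70m).

$L$, $N$, and $T$ are the same for all models; only $D$ differs.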
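Only $D$ varies between models, so the size of one model's hidden tensor can be estimated as $L \cdot N \cdot T \cdot D \cdot b$ bytes, where $b$ is the number of bytes per element. The issue does not state the dtype. The sketch below assumes $b = 2$ (e.g. float16), which is consistent with the per-model figures above. For example, $D = 512$ gives about 1.21 GiB, which `du -h` would display as 1.3G because it rounds up. The helper names `hidden_bytes` and `to_gib` are hypothetical. The hidden sizes in the loop come from the listed models' public configs, not from the issue.

```bash
#!/usr/bin/env bash
# Estimate the size in bytes of one L x N x T x D hidden tensor.
# ASSUMPTION: 2 bytes per element (e.g. float16); the issue does not state the dtype.
hidden_bytes() {
  local L="$1" N="$2" T="$3" D="$4" bytes_per_elem="${5:-2}"
  echo $(( L * N * T * D * bytes_per_elem ))
}

# Convert a byte count to GiB (2^30 bytes), printed with two decimals.
to_gib() {
  awk -v b="$1" 'BEGIN { printf "%.2f\n", b / (1024 ^ 3) }'
}

# Example from the issue: L=5, N=6330, T=40, D=512.
bytes=$(hidden_bytes 5 6330 40 512)
echo "$bytes bytes = $(to_gib "$bytes") GiB"

# Same estimate for each hidden size among the listed models
# (512 = pythia_70m, 4096 = llama2_7b, 5120 = llama2_13b, ...).
for D in 512 768 1024 2048 2560 4096 5120; do
  printf 'D=%-5s %s GiB\n' "$D" "$(to_gib "$(hidden_bytes 5 6330 40 "$D")")"
done
```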

Layer Information

- `0`: output of the token embedding (before the first layer)
- `1`: output of the first layer
- last index: output of the last layer
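Under this convention, a model with $K$ layers has $K + 1$ valid indices, $0, \dots, K$. This matches the layout of the `hidden_states` tuple that Hugging Face `transformers` returns with `output_hidden_states=True`, but the issue does not say that the hiddens are gathered that way. Here is a minimal sketch with a hypothetical `layer_label` helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: describe a gathered layer index for a model with K layers.
# Index 0 is the embedding output, 1..K are layer outputs, and K is the last layer.
layer_label() {
  local idx="$1" K="$2"
  if (( idx < 0 || idx > K )); then
    echo "invalid (expected 0..$K)"
    return 1
  elif (( idx == 0 )); then
    echo "after token embedding"
  elif (( idx == K )); then
    echo "last layer output"
  else
    echo "output of layer $idx"
  fi
}

# Example: Llama-2-7B has 32 layers, so index 32 is its last layer.
layer_label 0 32    # after token embedding
layer_label 8 32    # output of layer 8
layer_label 32 32   # last layer output
```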

Layer indices gathered for each model (bash arrays):

llama1=('llama2' '7b' '[0,8,16,24,32]')
llama2=('llama2' '13b' '[0,8,16,24,40]')
llama3=('llama2_chat' '7b' '[0,8,16,24,32]')
llama4=('llama2_chat' '13b' '[0,8,16,24,40]')

pythia1=('70m' '[0,1,3,5,6]')
pythia2=('160m' '[0,2,4,8,12]')
pythia3=('410m' '[0,4,8,16,24]')
pythia4=('1b' '[0,2,4,8,16]')
pythia5=('1.4b' '[0,4,8,16,24]')
pythia6=('2.8b' '[0,4,8,16,32]')
pythia7=('6.9b' '[0,4,8,16,32]')
pythia8=('12b' '[0,4,8,16,36]')
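The Llama-2 arrays hold `(family size layers)`, and the Pythia arrays hold `(size layers)`. A minimal sketch of how a driver script might unpack them: it turns a string such as `'[0,8,16,24,32]'` into separate indices and checks that the last index equals the model's layer count. The helpers `parse_layers`, `num_layers`, and `check_model` are hypothetical. The layer counts come from the models' public configs, not from the issue.

```bash
#!/usr/bin/env bash
# A few of the issue's arrays, copied as-is.
llama1=('llama2' '7b' '[0,8,16,24,32]')
llama2=('llama2' '13b' '[0,8,16,24,40]')
pythia1=('70m' '[0,1,3,5,6]')
pythia8=('12b' '[0,4,8,16,36]')

# Turn '[0,8,16,24,32]' into '0 8 16 24 32'.
parse_layers() {
  local s
  s=${1#\[}          # drop leading '['
  s=${s%\]}          # drop trailing ']'
  echo "${s//,/ }"   # commas -> spaces
}

# Number of layers per model (from public configs; an assumption here).
num_layers() {
  case "$1" in
    llama2_7b|llama2_chat_7b)   echo 32 ;;
    llama2_13b|llama2_chat_13b) echo 40 ;;
    pythia_70m)  echo 6 ;;   pythia_160m) echo 12 ;;
    pythia_410m) echo 24 ;;  pythia_1b)   echo 16 ;;
    pythia_1.4b) echo 24 ;;  pythia_2.8b) echo 32 ;;
    pythia_6.9b) echo 32 ;;  pythia_12b)  echo 36 ;;
    *) echo "" ;;
  esac
}

# Print the parsed indices and whether the last one is the model's last layer.
check_model() {
  local name="$1" layers last
  layers=$(parse_layers "$2")
  last="${layers##* }"   # last space-separated index
  if [[ "$last" == "$(num_layers "$name")" ]]; then
    echo "$name: layers $layers (ok)"
  else
    echo "$name: layers $layers (last index $last != layer count)"
  fi
}

# Llama-2 entries carry the family; Pythia entries imply "pythia".
check_model "${llama1[0]}_${llama1[1]}" "${llama1[2]}"
check_model "${llama2[0]}_${llama2[1]}" "${llama2[2]}"
check_model "pythia_${pythia1[0]}" "${pythia1[1]}"
check_model "pythia_${pythia8[0]}" "${pythia8[1]}"
```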