f-dangel / backpack

BackPACK - a backpropagation package built on top of PyTorch which efficiently computes quantities other than the gradient.
https://backpack.pt/
MIT License

Compatibility with timm models? #329

Closed: prafful-kumar closed this issue 3 months ago

prafful-kumar commented 3 months ago

When I try to compute GSNR values with the BackPACK library on models built with timm, I get the warnings below. How can I resolve them?

GOAL: I aim to calculate GSNR values for adapter parameters (used in the foundation model) over epochs.

```
/scratch/itee/uqpkuma6/miniconda3/envs/NOAH/lib/python3.8/site-packages/backpack/extensions/backprop_extension.py:106: UserWarning: Extension saving to variance does not have an extension for Module <class 'timm.models.layers.mlp.Mlp'> although the module has parameters
/scratch/itee/uqpkuma6/miniconda3/envs/NOAH/lib/python3.8/site-packages/backpack/extensions/backprop_extension.py:106: UserWarning: Extension saving to sum_grad_squared does not have an extension for Module <class 'timm.models.layers.mlp.Mlp'> although the module has parameters
/scratch/itee/uqpkuma6/miniconda3/envs/NOAH/lib/python3.8/site-packages/backpack/extensions/backprop_extension.py:106: UserWarning: Extension saving to grad_batch does not have an extension for Module <class 'model.supernet_vision_transformer_timm.Attention'> although the module has parameters
/scratch/itee/uqpkuma6/miniconda3/envs/NOAH/lib/python3.8/site-packages/backpack/extensions/backprop_extension.py:106: UserWarning: Extension saving to variance does not have an extension for Module <class 'model.supernet_vision_transformer_timm.Attention'> although the module has parameters
/scratch/itee/uqpkuma6/miniconda3/envs/NOAH/lib/python3.8/site-packages/backpack/extensions/backprop_extension.py:106: UserWarning: Extension saving to sum_grad_squared does not have an extension for Module <class 'model.supernet_vision_transformer_timm.Attention'> although the module has parameters
/scratch/itee/uqpkuma6/miniconda3/envs/NOAH/lib/python3.8/site-packages/backpack/extensions/backprop_extension.py:106: UserWarning: Extension saving to grad_batch does not have an extension for Module <class 'model.supernet_vision_transformer_timm.Block'> although the module has parameters
/scratch/itee/uqpkuma6/miniconda3/envs/NOAH/lib/python3.8/site-packages/backpack/extensions/backprop_extension.py:106: UserWarning: Extension saving to variance does not have an extension for Module <class 'model.supernet_vision_transformer_timm.Block'> although the module has parameters
/scratch/itee/uqpkuma6/miniconda3/envs/NOAH/lib/python3.8/site-packages/backpack/extensions/backprop_extension.py:106: UserWarning: Extension saving to sum_grad_squared does not have an extension for Module <class 'model.supernet_vision_transformer_timm.Block'> although the module has parameters
```

f-dangel commented 3 months ago

Hi, thanks for your question. It looks like you have called backpack.extend() on modules that contain parameters but are not themselves supported by BackPACK. The warnings suggest these are container modules, which therefore don't need to be extended for the quantities required by GSNR to work.
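
For illustration, a minimal sketch of what that means in practice, assuming `model` is the network in question and its parameterized leaves are supported types such as nn.Linear:

```python
from torch import nn
from backpack import extend

# Sketch: extend only the parameterized leaf layers that BackPACK
# supports, instead of container modules such as Block, Mlp, or
# Attention (`model` is a hypothetical stand-in for the user's network).
for module in model.modules():
    if isinstance(module, nn.Linear):
        extend(module)
```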

Could you post an overview of the layers your network contains?

prafful-kumar commented 3 months ago

Hi, thank you for your quick response. I am attaching the model structure and trainable layers.

model summary:

```
DistributedDataParallel(
  (module): VisionTransformer(
    (patch_embed): PatchEmbed(
      (proj): Conv2d(3, 768, kernel_size=(16, 16), stride=(16, 16))
      (norm): Identity()
    )
    (pos_drop): Dropout(p=0.0, inplace=False)
    (blocks): Sequential(
      (0): Block(
        (norm1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (attn): Attention(
          (qkv): Linear(in_features=768, out_features=2304, bias=True)
          (attn_drop): Dropout(p=0.0, inplace=False)
          (proj): Linear(in_features=768, out_features=768, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (LoRA_a): Linear(in_features=768, out_features=100, bias=False)
          (LoRA_b): Linear(in_features=100, out_features=2304, bias=False)
          (LoRA_drop): Dropout(p=0.1, inplace=False)
          (prefix_drop): Dropout(p=0.1, inplace=False)
        )
        (drop_path): Identity()
        (norm2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (mlp): Mlp(
          (fc1): Linear(in_features=768, out_features=3072, bias=True)
          (act): GELU(approximate='none')
          (drop1): Dropout(p=0.0, inplace=False)
          (fc2): Linear(in_features=3072, out_features=768, bias=True)
          (drop2): Dropout(p=0.0, inplace=False)
        )
        (drop_prompt): Dropout(p=0.1, inplace=False)
        (adapter): AdapterSuper(
          (dropout): Dropout(p=0.1, inplace=False)
          (ln1): Linear(in_features=768, out_features=100, bias=True)
          (activate): QuickGELU()
          (ln2): Linear(in_features=100, out_features=768, bias=True)
        )
      )
      [blocks (1) through (11) are identical to (0), except (drop_path): DropPath()]
    )
    (norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
    (pre_logits): Identity()
    (head): Linear(in_features=768, out_features=100, bias=True)
    (drop_prompt): Dropout(p=0.1, inplace=False)
  )
)
```

trainable parameters (the per-block entries below repeat for every block i = 0, ..., 11):

```
module.visual_prompt_token
module.visual_prompt_token_pos_embed
module.blocks.{i}.visual_prompt_token
module.blocks.{i}.attn.prefix_tokens_key
module.blocks.{i}.attn.prefix_tokens_value
module.blocks.{i}.attn.LoRA_a.weight
module.blocks.{i}.attn.LoRA_b.weight
module.blocks.{i}.adapter.sampled_bias_1
module.blocks.{i}.adapter.ln1.weight
module.blocks.{i}.adapter.ln1.bias
module.blocks.{i}.adapter.ln2.weight
module.head.weight
module.head.bias
```

f-dangel commented 3 months ago

Could you only show the modules that have parameters and no sub-modules? Something like

```python
[mod for mod in net.children() if not list(mod.children()) and list(mod.parameters())]
```

prafful-kumar commented 3 months ago

So, I tried running the command above, but it gives an empty list.

f-dangel commented 3 months ago

maybe

```python
# net.children() only yields direct children (hence the empty list above);
# net.modules() recurses into all sub-modules.
[mod for mod in net.modules() if len(list(mod.modules())) == 1 and list(mod.parameters())]
```

otherwise I'm afraid you'll need to figure out yourself how to list these modules.

prafful-kumar commented 3 months ago

I have executed the code you provided. Can you explain why it is difficult to make my model work with BackPACK's extensions?

```python
print("module list")
for mod in model.modules():
    if len(list(mod.modules())) == 1 and list(mod.parameters()):
        print(mod)
        print()
```

```
module list
Conv2d(3, 768, kernel_size=(16, 16), stride=(16, 16))

LayerNorm((768,), eps=1e-06, elementwise_affine=True)
Linear(in_features=768, out_features=2304, bias=True)
Linear(in_features=768, out_features=768, bias=True)
Linear(in_features=768, out_features=100, bias=False)
Linear(in_features=100, out_features=2304, bias=False)
LayerNorm((768,), eps=1e-06, elementwise_affine=True)
Linear(in_features=768, out_features=3072, bias=True)
Linear(in_features=3072, out_features=768, bias=True)
Linear(in_features=768, out_features=100, bias=True)
Linear(in_features=100, out_features=768, bias=True)

[the ten modules above repeat once per Block, twelve times in total]

LayerNorm((768,), eps=1e-06, elementwise_affine=True)
Linear(in_features=768, out_features=100, bias=True)
```

f-dangel commented 3 months ago

Hi, thanks for posting the output. It looks like your network uses nn.Linear and nn.LayerNorm layers.

To use all of BackPACK's features, your model needs to consist entirely of supported layers. However, you are only interested in first-order extensions, which work in a slightly more general setting: to make them work on an arbitrary network, you only need to extend the supported layers; see e.g. this example in the docs.

Your network contains linear and layer norm layers. BackPACK does not support nn.LayerNorm at the moment, so you will only be able to compute the GSNR for the linear layers. If you would like to add support for nn.LayerNorm to BackPACK, check out this example in the docs, which explains how to support new layers.
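
To make this concrete, here is a minimal sketch along those lines. The names `model`, `inputs`, and `targets` are hypothetical stand-ins for the setup above; only the nn.Linear layers are extended, and the GSNR uses one common definition, squared mean gradient over per-sample gradient variance:

```python
from torch import nn
from backpack import backpack, extend
from backpack.extensions import Variance

# Extend only the supported leaf layers and the loss function
# (model, inputs, targets are hypothetical stand-ins).
for module in model.modules():
    if isinstance(module, nn.Linear):
        extend(module)
loss_func = extend(nn.CrossEntropyLoss())

loss = loss_func(model(inputs), targets)
with backpack(Variance()):
    loss.backward()

# GSNR: squared mean gradient over the variance of the per-sample
# gradients. Only parameters of extended layers carry .variance.
for name, param in model.named_parameters():
    if hasattr(param, "variance"):
        gsnr = param.grad**2 / (param.variance + 1e-12)  # eps avoids division by zero
        print(name, gsnr.mean().item())
```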

prafful-kumar commented 3 months ago

Thank you very much. Is there a way to calculate the Fisher information matrix of the weights using the BackPACK library?
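
For context, BackPACK's second-order extensions (e.g. DiagGGNExact, DiagGGNMC, KFAC) compute generalized Gauss-Newton quantities, and for losses such as cross-entropy the GGN coincides with the Fisher. A minimal sketch of the Fisher diagonal, assuming a fully supported and fully extended model and the hypothetical names from the previous sketch:

```python
from backpack import backpack
from backpack.extensions import DiagGGNExact

# Assumes model and loss_func were both extended beforehand and that
# every parameterized layer is supported by BackPACK.
loss = loss_func(model(inputs), targets)
with backpack(DiagGGNExact()):
    loss.backward()

for name, param in model.named_parameters():
    fisher_diag = param.diag_ggn_exact  # diagonal of the GGN / Fisher
    print(name, fisher_diag.shape)
```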