Closed prafful-kumar closed 3 months ago
Hi, thanks for your question. It looks like you have called backpack.extend() on layers that contain parameters but are not supported by BackPACK. The warnings suggest these are container modules, which therefore don't need to be extended for the quantities required by the GSNR.
Could you post an overview of the layers your network contains?
Hi, thank you for your quick response. I am attaching the model structure and trainable layers.
model summary (blocks (1)–(11) are identical to block (0), except that their (drop_path) is DropPath() instead of Identity()):

    DistributedDataParallel(
      (module): VisionTransformer(
        (patch_embed): PatchEmbed(
          (proj): Conv2d(3, 768, kernel_size=(16, 16), stride=(16, 16))
          (norm): Identity()
        )
        (pos_drop): Dropout(p=0.0, inplace=False)
        (blocks): Sequential(
          (0): Block(
            (norm1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
            (attn): Attention(
              (qkv): Linear(in_features=768, out_features=2304, bias=True)
              (attn_drop): Dropout(p=0.0, inplace=False)
              (proj): Linear(in_features=768, out_features=768, bias=True)
              (proj_drop): Dropout(p=0.0, inplace=False)
              (LoRA_a): Linear(in_features=768, out_features=100, bias=False)
              (LoRA_b): Linear(in_features=100, out_features=2304, bias=False)
              (LoRA_drop): Dropout(p=0.1, inplace=False)
              (prefix_drop): Dropout(p=0.1, inplace=False)
            )
            (drop_path): Identity()
            (norm2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
            (mlp): Mlp(
              (fc1): Linear(in_features=768, out_features=3072, bias=True)
              (act): GELU(approximate='none')
              (drop1): Dropout(p=0.0, inplace=False)
              (fc2): Linear(in_features=3072, out_features=768, bias=True)
              (drop2): Dropout(p=0.0, inplace=False)
            )
            (drop_prompt): Dropout(p=0.1, inplace=False)
            (adapter): AdapterSuper(
              (dropout): Dropout(p=0.1, inplace=False)
              (ln1): Linear(in_features=768, out_features=100, bias=True)
              (activate): QuickGELU()
              (ln2): Linear(in_features=100, out_features=768, bias=True)
            )
          )
          (1)–(11): Block( ... same as (0), with (drop_path): DropPath() ... )
        )
        (norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (pre_logits): Identity()
        (head): Linear(in_features=768, out_features=100, bias=True)
        (drop_prompt): Dropout(p=0.1, inplace=False)
      )
    )
trainable parameters (the per-block entries repeat for every i in 0–11):

    module.visual_prompt_token
    module.visual_prompt_token_pos_embed
    module.blocks.i.visual_prompt_token
    module.blocks.i.attn.prefix_tokens_key
    module.blocks.i.attn.prefix_tokens_value
    module.blocks.i.attn.LoRA_a.weight
    module.blocks.i.attn.LoRA_b.weight
    module.blocks.i.adapter.sampled_bias_1
    module.blocks.i.adapter.ln1.weight
    module.blocks.i.adapter.ln1.bias
    module.blocks.i.adapter.ln2.weight
    module.head.weight
    module.head.bias
Could you only show the modules that have parameters and no sub-modules? Something like
[mod for mod in net.children() if not list(mod.children()) and list(mod.parameters())]
So, I tried to run the above command. It gives an empty list.
maybe
[mod for mod in net.modules() if len(list(mod.modules())) == 1 and list(mod.parameters())]
otherwise I'm afraid you will have to figure out yourself how to show these modules.
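For context, the one-liner above can be tried on a small stand-in network (the stand-in and its layer sizes are placeholders, not the ViT posted earlier): it keeps exactly those modules that are "leaves" (their `modules()` iterator yields only themselves) and that own parameters.

```python
from torch import nn

# Tiny stand-in network with a nested container, mimicking the
# container/leaf structure of the real model.
net = nn.Sequential(
    nn.Linear(4, 4),
    nn.Sequential(nn.Linear(4, 2), nn.ReLU()),
)

# A module is a leaf iff mod.modules() yields only the module itself;
# additionally require it to have parameters of its own.
leaves = [
    mod
    for mod in net.modules()
    if len(list(mod.modules())) == 1 and list(mod.parameters())
]

for mod in leaves:
    print(mod)
```

Here the two `nn.Linear` layers are printed, while the `nn.Sequential` containers and the parameter-free `nn.ReLU` are filtered out.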
I have executed the code you provided. Can you explain why it is difficult to extend my model with BackPACK?
print("module list")
for mod in model.modules():
    if len(list(mod.modules())) == 1 and list(mod.parameters()):
        print(mod)
        print()
module list

    Conv2d(3, 768, kernel_size=(16, 16), stride=(16, 16))

then, repeated once per Block (12 times):

    LayerNorm((768,), eps=1e-06, elementwise_affine=True)
    Linear(in_features=768, out_features=2304, bias=True)
    Linear(in_features=768, out_features=768, bias=True)
    Linear(in_features=768, out_features=100, bias=False)
    Linear(in_features=100, out_features=2304, bias=False)
    LayerNorm((768,), eps=1e-06, elementwise_affine=True)
    Linear(in_features=768, out_features=3072, bias=True)
    Linear(in_features=3072, out_features=768, bias=True)
    Linear(in_features=768, out_features=100, bias=True)
    Linear(in_features=100, out_features=768, bias=True)

and finally:

    LayerNorm((768,), eps=1e-06, elementwise_affine=True)
    Linear(in_features=768, out_features=100, bias=True)
Hi,
thanks for posting the output. It looks like your network uses nn.Linear and nn.LayerNorm layers.
To use all features of BackPACK, your model needs to consist entirely of supported layers. However, since you are only interested in computing first-order extensions, which work in a slightly more general setting, you can make them work on an arbitrary network by extending only the supported layers; see e.g. this example in the docs.
Your network has linear and layer-norm layers. BackPACK does not support nn.LayerNorm at the moment, so you will only be able to compute the GSNR of the linear layers. If you would like to add support for nn.LayerNorm in BackPACK, you can check out this example in the docs, which explains how to support new layers.
Thank you very much. Is there a way to calculate the Fisher information matrix of the weights using the BackPACK library?
When I try to find the GSNR value using the BackPACK library with models built on timm, I am facing these errors. How can I solve them?
GOAL: I aim to calculate GSNR values for the adapter parameters (used in the foundation model) over epochs.
/scratch/itee/uqpkuma6/miniconda3/envs/NOAH/lib/python3.8/site-packages/backpack/extensions/backprop_extension.py:106: UserWarning: Extension saving to variance does not have an extension for Module <class 'timm.models.layers.mlp.Mlp'> although the module has parameters
backprop_extension.py:106: UserWarning: Extension saving to sum_grad_squared does not have an extension for Module <class 'timm.models.layers.mlp.Mlp'> although the module has parameters
backprop_extension.py:106: UserWarning: Extension saving to grad_batch does not have an extension for Module <class 'model.supernet_vision_transformer_timm.Attention'> although the module has parameters
backprop_extension.py:106: UserWarning: Extension saving to variance does not have an extension for Module <class 'model.supernet_vision_transformer_timm.Attention'> although the module has parameters
backprop_extension.py:106: UserWarning: Extension saving to sum_grad_squared does not have an extension for Module <class 'model.supernet_vision_transformer_timm.Attention'> although the module has parameters
backprop_extension.py:106: UserWarning: Extension saving to grad_batch does not have an extension for Module <class 'model.supernet_vision_transformer_timm.Block'> although the module has parameters
backprop_extension.py:106: UserWarning: Extension saving to variance does not have an extension for Module <class 'model.supernet_vision_transformer_timm.Block'> although the module has parameters
backprop_extension.py:106: UserWarning: Extension saving to sum_grad_squared does not have an extension for Module <class 'model.supernet_vision_transformer_timm.Block'> although the module has parameters
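For reference, the quantity being targeted above is per parameter j: GSNR_j = E[g_j]^2 / Var[g_j], taken over the per-sample gradients g. A plain-NumPy sketch on a synthetic per-sample gradient matrix (nothing here comes from the real model; in practice the matrix would hold BackPACK's per-sample gradients for one parameter):

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 128, 5  # N samples, P parameters (synthetic)
G = rng.normal(loc=1.0, scale=0.5, size=(N, P))  # per-sample gradients

mean_g = G.mean(axis=0)          # E[g_j] per parameter
var_g = G.var(axis=0)            # Var[g_j] per parameter
gsnr = mean_g**2 / (var_g + 1e-12)  # one GSNR value per parameter
```

The small epsilon guards against division by zero for parameters whose per-sample gradients are all identical.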