frgfm / torch-scan

Seamless analysis of your PyTorch models (RAM usage, FLOPs, MACs, receptive field, etc.)
https://frgfm.github.io/torch-scan/latest
Apache License 2.0

Print `Trainable` as a column #54

Open · joonas-yoon opened this issue 2 years ago

joonas-yoon commented 2 years ago

πŸš€ Feature

A new column in the summary, `Trainable`, indicating whether gradients need to be computed for a layer's parameters.

We can easily read this from the model's parameters:

# Print each parameter's requires_grad flag
for p in model.parameters():
    print(p.requires_grad)

In short, the expected output is:

_________________________________________________________________________________________________________
Layer                        Type                  Output Shape              Param #          Trainable
=========================================================================================================
vgg                          VGG                   (-1, 1000)                0                
β”œβ”€features                   Sequential            (-1, 512, 7, 7)           0                
|    └─0                     Conv2d                (-1, 64, 224, 224)        1,792            True
|    └─1                     ReLU                  (-1, 64, 224, 224)        0                -
|    └─2                     Conv2d                (-1, 64, 224, 224)        36,928           True
|    └─3                     ReLU                  (-1, 64, 224, 224)        0                -
|    └─4                     MaxPool2d             (-1, 64, 112, 112)        0                
|    └─5                     Conv2d                (-1, 128, 112, 112)       73,856           True
|    └─6                     ReLU                  (-1, 128, 112, 112)       0                -
...
β”œβ”€classifier                 Sequential            (-1, 1000)                0                
|    └─0                     Linear                (-1, 4096)                102,764,544      False
|    └─1                     ReLU                  (-1, 4096)                0                -
|    └─2                     Dropout               (-1, 4096)                0                -
|    └─3                     Linear                (-1, 4096)                16,781,312       False
|    └─4                     ReLU                  (-1, 4096)                0                -
|    └─5                     Dropout               (-1, 4096)                0                -
|    └─6                     Linear                (-1, 1000)                4,097,000        False

Motivation & pitch

I have been trying transfer learning with DenseNet, and printed the summary:

import torchvision
from torch import nn
from torchscan import summary

model = torchvision.models.densenet201(pretrained=True)
model.classifier = nn.Sequential(
    nn.Linear(1920, 10)
)
# Freeze the newly added classifier head
for p in model.classifier.parameters():
    p.requires_grad = False
summary(model, (3, 224, 224))

but there is no information about which layers are trainable. This is the tail of the result:

|    |    |    └─conv2       Conv2d                (-1, 32, 7, 7)            36,864         
|    └─norm5                 BatchNorm2d           (-1, 1920, 7, 7)          7,681          
β”œβ”€classifier                 Sequential            (-1, 10)                  0              
|    └─0                     Linear                (-1, 10)                  19,210         
==========================================================================================
Trainable params: 18,092,928
Non-trainable params: 19,210
Total params: 18,112,138

Alternatives

No response

Additional context

I will wait for your response; I'd like to hear what you think about this.

frgfm commented 2 years ago

Hi @joonas-yoon πŸ‘‹

This is an interesting feature idea! Here is some feedback:

- a single layer can hold several parameters (typically a weight and a bias), and their `requires_grad` flags may differ, so a single per-layer value is ambiguous;
- there is also the question of how this plays with the RAM consumption estimation.

What do you think?

joonas-yoon commented 2 years ago

Could you give me an example of a layer with multiple parameters? I hadn't thought about that, but it sounds interesting.

joonas-yoon commented 2 years ago

And for the second point, RAM consumption: how about saving all of the model's state and restoring it afterwards?

Obviously, that would take more time and cost some performance. Any ideas?
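Something like this rough sketch of what I mean (the helper names `snapshot_requires_grad` / `restore_requires_grad` are just made up for illustration):

from torch import nn

def snapshot_requires_grad(model: nn.Module) -> dict:
    # Remember every parameter's requires_grad flag
    return {n: p.requires_grad for n, p in model.named_parameters()}

def restore_requires_grad(model: nn.Module, state: dict) -> None:
    # Put every flag back the way it was
    for n, p in model.named_parameters():
        p.requires_grad_(state[n])

model = nn.Linear(4, 8)
state = snapshot_requires_grad(model)
model.weight.requires_grad_(False)  # temporary change during the analysis
restore_requires_grad(model, state)
assert model.weight.requires_grad   # back to the original state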

frgfm commented 2 years ago

Hey there πŸ‘‹

Well, as for multiple parameters: almost all layers have them 😅

from torch import nn

# Create a fully connected layer
layer = nn.Linear(4, 8)
# Don't track grad on the weights
layer.weight.requires_grad_(False)

# But the bias still requires grad
for n, p in layer.named_parameters():
    print(n, p.requires_grad)

which yields:

weight False
bias True

For the second part, I had the same in mind, I agree πŸ‘

joonas-yoon commented 2 years ago

Oh, I see. Then, only for layers whose parameter flags differ, how about this?

_______________________________________________________________________________________________________________
Layer                        Type                  Output Shape              Param #          Trainable
===============================================================================================================
vgg                          VGG                   (-1, 1000)                0                
...
|    └─3                     Linear                (-1, 4096)                16,781,312       False
|    └─4                     ReLU                  (-1, 4096)                0                -
|    └─5                     Dropout               (-1, 4096)                0                -
|    └─6                     Linear                (-1, 1000)                4,097,000        weight: False
|                                                                                             bias: True

It doesn't matter that it spans multiple lines; a single line like `weight: False, bias: True` would also be fine, but that prints quite a long string 🤔
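For illustration, the single-line variant could be built roughly like this (just a sketch, not actual torch-scan code):

from torch import nn

layer = nn.Linear(4096, 1000)
layer.weight.requires_grad_(False)

# One cell per layer, e.g. "weight: False, bias: True"
cell = ", ".join(f"{n}: {p.requires_grad}" for n, p in layer.named_parameters())
print(cell)  # weight: False, bias: True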

frgfm commented 2 years ago

Well, that will become hairy; I honestly don't want to spread the cell over multiple lines. The only suggestion I can see is to keep a single value per layer: print `True` only when every parameter of the layer requires gradients, and `False` otherwise.
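As a rough sketch of that convention (illustration only; `trainable_cell` is not an actual torch-scan function):

from torch import nn

def trainable_cell(layer: nn.Module) -> str:
    flags = [p.requires_grad for p in layer.parameters()]
    if not flags:
        return "-"          # parameter-less layers (ReLU, Dropout, ...)
    return str(all(flags))  # True only if every parameter requires grad

layer = nn.Linear(4, 8)
layer.weight.requires_grad_(False)
print(trainable_cell(layer))  # False: mixed flags collapse to False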

joonas-yoon commented 2 years ago

Good, I totally agree with you.

One thing I want to suggest: this convention should be noted in the documentation, for example, "`False`: the layer contains a mix of trainable and frozen parameters".