Open · jlclemon opened this issue 3 years ago
Did not realize this was markdown. My mistake on the formatting. Thanks for fixing the formatting.
Thanks for reporting this issue. Any PR or help fixing this is much appreciated!
@jlclemon the `1` in the `input_size` is the batch_dim, if I'm not wrong, right? Also it would be helpful if you could provide us with the model arch and a gist of what you are trying to do (at least for a beginner like me). ty
I'm experiencing a similar error. It seems that when calculating the mult-adds of a `torch.nn.Linear`, only the first and last dimensions of the input tensor (the batch size and the feature dimension) are considered.

Environment
- System: Ubuntu 22.0 Docker image with GPU support

Package version:
- pytorch 2.1.1
- torchinfo 1.8.0

Reproduce
```python
from torch.nn import Linear
from torchinfo import summary

bs, cin, cout = 5, 3, 8
model = Linear(cin, cout)

in_size = (bs, 10, cin)
print(summary(model, input_size=in_size, col_names=["input_size", "output_size", "num_params", "mult_adds"]))

in_size = (bs, 100, 100, cin)
print(summary(model, input_size=in_size, col_names=["input_size", "output_size", "num_params", "mult_adds"]))
```
Output:
```
====================================================================================================
Layer (type:depth-idx)                   Input Shape          Output Shape         Param #    Mult-Adds
====================================================================================================
Linear                                   [5, 10, 3]           [5, 10, 8]           32         160
====================================================================================================
Total params: 32
Trainable params: 32
Non-trainable params: 0
Total mult-adds (M): 0.00
====================================================================================================
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
====================================================================================================

====================================================================================================
Layer (type:depth-idx)                   Input Shape          Output Shape         Param #    Mult-Adds
====================================================================================================
Linear                                   [5, 100, 100, 3]     [5, 100, 100, 8]     32         160
====================================================================================================
Total params: 32
Trainable params: 32
Non-trainable params: 0
Total mult-adds (M): 0.00
====================================================================================================
Input size (MB): 0.60
Forward/backward pass size (MB): 3.20
Params size (MB): 0.00
Estimated Total Size (MB): 3.80
====================================================================================================
```
The Mult-Adds for the two input sizes are both 160 $= 5\times(3+1)\times 8$, the multiply-accumulate count for an input of size (5, 1, 3).
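For reference, here is a minimal standalone sketch of the count I would expect (the `linear_macs` helper is hypothetical, following the thread's convention of counting the bias as one extra input, i.e. $(c_{in}+1)\times c_{out}$ per output element):

```python
from math import prod

def linear_macs(input_size, cout):
    # Hypothetical helper: every element of the leading dimensions performs
    # its own (cin + 1) x cout multiply-accumulate, with the bias counted as +1.
    *leading, cin = input_size
    return prod(leading) * (cin + 1) * cout

print(linear_macs((5, 10, 3), 8))        # 1600, not the 160 reported above
print(linear_macs((5, 100, 100, 3), 8))  # 1600000, not 160
```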
Yes, you are correct. The problem still exists: in both cases the count is not multiplied by the intermediate dimensions of the input tensor.
I submitted a fix in the above pull request.
The idea is to multiply the total number of parameters by the product of all but the last input dimension (which, for `Linear`, equals the product of all but the last output dimension), like so:

```python
self.macs += int(cur_params * prod(self.output_size[:-1]))
```
@tyleryep, my only question is when this formula should be applied. I chose to add an `elif` clause that checks if `Linear` is in the class name, but perhaps it should just replace the `else` clause instead?
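As a quick sanity check (a standalone sketch, not the actual `layer_info.py` code), the proposed formula reproduces the expected counts for the examples above:

```python
from math import prod

cin, cout = 3, 8
cur_params = (cin + 1) * cout            # 32, matching the Param # column above

for output_size in [(5, 10, cout), (5, 100, 100, cout)]:
    macs = int(cur_params * prod(output_size[:-1]))
    print(output_size, macs)             # 1600 and 1600000 instead of 160
```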
Describe the bug
When the linear layer has a multidimensional input and output (a shape with 3 or more dimensions), the computed multiply-adds are incorrect.
To Reproduce
Steps to reproduce the behavior:
- Add a linear layer to a model.
- Make sure the input to the linear layer has multiple dimensions, as below.
Expected behavior
Notice the number of multiply-adds is listed as 16640, but it should be 374865920.
It appears line 161 of https://github.com/TylerYep/torchinfo/blob/main/torchinfo/layer_info.py fails to take into account that a linear layer applies its kernel across every leading dimension of the input, not just a single output dimension.
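For context, `nn.Linear` broadcasts over every leading dimension of its input, so the true multiply-add count scales with all of them; a quick check:

```python
import torch
from torch import nn

layer = nn.Linear(3, 8)
x = torch.randn(5, 100, 100, 3)
print(layer(x).shape)  # torch.Size([5, 100, 100, 8]): the kernel is applied
                       # once per element of the leading dimensions
```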
Additional context
Just noticed this was not the correct number of FLOPs in a model using a linear layer like this one.