Open · jlclemon opened this issue 3 years ago
Did not realize this was markdown. My mistake on the formatting. Thanks for fixing the formatting.
Thanks for reporting this issue. Any PR or help fixing this is much appreciated!
@jlclemon the `1` in the `input_size` is the batch_dim, if I'm not wrong, right? Also it would be helpful if you could provide us with the model arch and a gist of what you are trying to do (at least for a beginner like me). ty
I'm experiencing a similar error. It seems that when calculating the mult-adds of a `torch.nn.Linear`, only the first and last dimensions of the input tensor (the batch size and the feature dimension) are considered.

Environment
- System: Ubuntu 22.0 Docker image with GPU support

Package version:
- pytorch 2.1.1
- torchinfo 1.8.0

Reproduce
```python
from torch.nn import Linear
from torchinfo import summary

bs, cin, cout = 5, 3, 8
model = Linear(cin, cout)

in_size = (bs, 10, cin)
print(summary(model, input_size=in_size, col_names=["input_size", "output_size", "num_params", "mult_adds"]))

in_size = (bs, 100, 100, cin)
print(summary(model, input_size=in_size, col_names=["input_size", "output_size", "num_params", "mult_adds"]))
```
Output:
```
====================================================================================================
Layer (type:depth-idx)                   Input Shape          Output Shape         Param #    Mult-Adds
====================================================================================================
Linear                                   [5, 10, 3]           [5, 10, 8]           32         160
====================================================================================================
Total params: 32
Trainable params: 32
Non-trainable params: 0
Total mult-adds (M): 0.00
====================================================================================================
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
====================================================================================================

====================================================================================================
Layer (type:depth-idx)                   Input Shape          Output Shape         Param #    Mult-Adds
====================================================================================================
Linear                                   [5, 100, 100, 3]     [5, 100, 100, 8]     32         160
====================================================================================================
Total params: 32
Trainable params: 32
Non-trainable params: 0
Total mult-adds (M): 0.00
====================================================================================================
Input size (MB): 0.60
Forward/backward pass size (MB): 3.20
Params size (MB): 0.00
Estimated Total Size (MB): 3.80
====================================================================================================
```
The Mult-Adds for the two input sizes are both 160 $= 5\times(3+1)\times 8$, the multiply-accumulate count for an input of size (5, 1, 3).
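For reference, here is a minimal standalone sketch of the count I would expect (the `linear_macs` helper is hypothetical, following the thread's convention of counting the bias as one extra input, i.e. $(c_{in}+1)\times c_{out}$ per output element):

```python
from math import prod

def linear_macs(input_size, cout):
    # Hypothetical helper: every element of the leading dimensions performs
    # its own (cin + 1) x cout multiply-accumulate, with the bias counted as +1.
    *leading, cin = input_size
    return prod(leading) * (cin + 1) * cout

print(linear_macs((5, 10, 3), 8))        # 1600, not the 160 reported above
print(linear_macs((5, 100, 100, 3), 8))  # 1600000, not 160
```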
Yes, you are correct. The problem still exists: in both cases the count is not multiplied by the intermediate dimensions of the input tensor.
I submitted a fix in the above pull request.
The idea is to multiply the total number of parameters by the product of all but the last input dimension (which, for `Linear`, equals the product of all but the last output dimension), like so:

```python
self.macs += int(cur_params * prod(self.output_size[:-1]))
```
@tyleryep, my only question is when this formula should be applied. I chose to add an `elif` clause that checks if `Linear` is in the class name, but perhaps it should just replace the `else` clause instead?
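As a quick sanity check (a standalone sketch, not the actual `layer_info.py` code), the proposed formula reproduces the expected counts for the examples above:

```python
from math import prod

cin, cout = 3, 8
cur_params = (cin + 1) * cout            # 32, matching the Param # column above

for output_size in [(5, 10, cout), (5, 100, 100, cout)]:
    macs = int(cur_params * prod(output_size[:-1]))
    print(output_size, macs)             # 1600 and 1600000 instead of 160
```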
Describe the bug
When the linear layer has a multidimensional input and output (a shape with 3 or more dimensions), the computed multiply-adds are incorrect.
To Reproduce
Steps to reproduce the behavior:
- Add a linear layer to a model.
- Make sure the input to the linear layer has multiple dimensions, as below.
Expected behavior
Notice the number of multiply-adds is listed as 16640, but it should be 374865920.
It appears line 161 of https://github.com/TylerYep/torchinfo/blob/main/torchinfo/layer_info.py fails to take into account that a linear layer applies its kernel across every leading dimension of the input, not just a single output dimension.
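For context, `nn.Linear` broadcasts over every leading dimension of its input, so the true multiply-add count scales with all of them; a quick check:

```python
import torch
from torch import nn

layer = nn.Linear(3, 8)
x = torch.randn(5, 100, 100, 3)
print(layer(x).shape)  # torch.Size([5, 100, 100, 8]): the kernel is applied
                       # once per element of the leading dimensions
```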
Additional context
Just noticed this was not the correct number of FLOPs in a model using a linear layer like this one.