Closed · wullia closed this 3 months ago
I have been evaluating the MLLA-Tiny model and noticed a potential discrepancy in the FLOPs computation reported in the MLLA paper. Using fvcore, the computed FLOPs for MLLA-Tiny are approximately 4.16G, which aligns with the figure reported in the paper. On closer examination, however, it appears that the FLOPs for the linear attention components are not fully accounted for. Specifically, the operations $K^\top V$ and $Q(K^\top V)$ seem to be omitted from the FLOPs calculation. Below are the logs from the FLOPs calculation, focusing on the stem and the first-stage parts, where these discrepancies appear:
```
(attn): LinearAttention(
  dim=64, num_heads=2
  #params: 8.96K, #flops: 40.54M
  (qk): Linear(
    in_features=64, out_features=128, bias=True
    #params: 8.32K, #flops: 25.69M
  )
  (elu): ELU(alpha=1.0)
  (lepe): Conv2d(
    64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=64
    #params: 0.64K, #flops: 1.81M
  )
  (rope): RoPE()
)
```
Could you please help address this concern? If my observations are correct, I suggest updating the FLOPs calculations in your paper to ensure a fair comparison with other models.
Thank you for looking into this matter.
Hi @wullia. The FLOPs for $K^\top V$ and $Q(K^\top V)$ are calculated by fvcore, but they are not listed: they are raw matrix multiplications inside the block's forward pass rather than submodules, so they only show up in the parent block's total. For example, in your log, the FLOPs of the linear attention block (40.54M) are larger than the sum of the FLOPs of all the components listed (25.69M + 1.81M = 27.50M); the difference comes from these two multiplications. I believe the FLOPs reported in our paper are accurate.
Got it, thx for your reply!