Closed dawson-chen closed 4 years ago
First, thanks for your work; it's very useful for inference with BERT-like models. Hope your paper gets published soon. Something I'm confused about is the FLOPs of the dense layer in Section 4.1. As far as I know, the FLOPs of a fully connected layer with bias = 2 I O,
where I = number of input neurons and O = number of output neurons.
For the fully connected layer 128 -> 128, FLOPs = 2 * 128 * 128 = 32,768. But in Table 1 the answer is 4.2M, which is much higher than what I got. Can you share your method for calculating this?
In BERT, if the input sentence contains 128 tokens, the FLOPs should be 2 * 128 * 128 * 128 = 4,194,304.
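A minimal sketch of that calculation (assuming the usual convention of counting one multiply-add as 2 FLOPs, applied once per token in the sequence; the function name `dense_flops` is just for illustration):

```python
def dense_flops(seq_len, in_features, out_features):
    """FLOPs of a dense layer applied to every token in a sequence.

    Each of the seq_len token vectors passes through the same
    in_features -> out_features layer, at 2*I*O FLOPs per token.
    """
    return 2 * seq_len * in_features * out_features

# Per-token cost of a 128 -> 128 layer:
print(dense_flops(1, 128, 128))    # 32768
# A 128-token sentence, matching the ~4.2M figure in Table 1:
print(dense_flops(128, 128, 128))  # 4194304
```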
Now I get it, thank you.