Open Delicious-Bitter-Melon opened 4 months ago
No, 1 MAC = 2 FLOPs (one multiply and one add).
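To make the 1 MAC = 2 FLOPs conversion concrete, here is a minimal sketch for a fully connected layer. The function names `linear_macs` and `macs_to_flops` are made up for illustration and are not part of this project's API; bias and activation costs are ignored on purpose.

```python
def linear_macs(batch: int, in_features: int, out_features: int) -> int:
    """MACs for y = x @ W: each output element needs in_features multiply-accumulates."""
    return batch * in_features * out_features


def macs_to_flops(macs: int) -> int:
    """Convert MACs to FLOPs, assuming 1 MAC = 1 multiply + 1 add = 2 FLOPs."""
    return 2 * macs


macs = linear_macs(batch=1, in_features=1024, out_features=1000)
flops = macs_to_flops(macs)
print(f"MACs: {macs}, FLOPs: {flops}")
```

So a number reported in MACs should be doubled before comparing it with a FLOPs figure from another tool.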
@ThanatosShinji First, thanks so much for your project; I find it very neat and useful. I have been studying your code for profiling. While I understand that your original implementation assumes x86, the accounting is not exactly accurate given your explanation above of the difference between ALU operations and the MAC unit. On every processor I know of, the ALU and the MAC unit are almost always separate. I feel the challenge here is how to describe a network's compute complexity fairly. The current MAC accounting gives a reasonable ballpark for CNNs, but if you use the same method to estimate newer topologies such as transformer-based networks, it will be way off. Training and inference also often differ considerably in compute complexity for operations like activation functions. I am working on separating the current accounting into two buckets: MAC ops and ALU ops. I am trying to add an implementation for inference first, as it is easier and cleaner. If you are interested in taking a look at my work, I'll share my commit with you later, and hopefully I can help improve this area of your project.
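A rough sketch of what the two-bucket accounting could look like, for inference only. This is not the commit mentioned above and not the project's code: `OpCount`, `count_linear`, and `count_relu` are hypothetical names, and the per-op costs (one ALU op per bias add, one ALU op per ReLU element) are simplifying assumptions.

```python
from dataclasses import dataclass


@dataclass
class OpCount:
    """Hypothetical container that keeps MAC ops and ALU ops in separate buckets."""
    macs: int = 0
    alu_ops: int = 0

    def __add__(self, other: "OpCount") -> "OpCount":
        return OpCount(self.macs + other.macs, self.alu_ops + other.alu_ops)


def count_linear(batch: int, in_features: int, out_features: int, bias: bool = True) -> OpCount:
    # The matrix multiply goes to the MAC bucket.
    macs = batch * in_features * out_features
    # Assumption: the bias add costs one ALU op per output element.
    alu_ops = batch * out_features if bias else 0
    return OpCount(macs, alu_ops)


def count_relu(num_elements: int) -> OpCount:
    # Assumption: one compare/select per element, no MACs.
    return OpCount(0, num_elements)


total = count_linear(1, 512, 256) + count_relu(256)
print(total)
```

Keeping the buckets separate means a platform-specific cost model can weight them differently later instead of folding everything into a single MAC number.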
@MarvinChao Thanks for your interest! The original idea behind the MAC accounting was to show people the difference in complexity between sigmoid and hardsigmoid. Of course, the ALU and FPU usually have their own backend ports. As you mentioned, it's difficult to describe all ops on all platforms. Transformer-based networks are memory-intensive when the input sequence is shorter than some threshold. In that case, the parameter size can indicate the inference complexity.
Thanks for your excellent work. In this project, is FLOPs equal to MACs?