In this project, is FLOPs equal to MACs?

Delicious-Bitter-Melon commented 4 months ago

Thanks for your excellent work. In this project, are FLOPs equal to MACs?

ThanatosShinji commented 4 months ago

No, 1 MAC = 2 FLOPs (one multiply and one add).
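As a rough illustration of that conversion, here is a minimal sketch in plain Python. The helper names (`linear_macs`, `conv2d_macs`, `macs_to_flops`) are hypothetical and are not part of onnx-tool; the formulas are the standard dense-layer and convolution MAC counts, ignoring bias and activation costs.

```python
def linear_macs(in_features: int, out_features: int, batch: int = 1) -> int:
    """MACs of a dense layer: one multiply-accumulate per weight per sample."""
    return batch * in_features * out_features


def conv2d_macs(c_in: int, c_out: int, k_h: int, k_w: int,
                h_out: int, w_out: int, groups: int = 1, batch: int = 1) -> int:
    """MACs of a 2D convolution: each output element needs
    (c_in / groups) * k_h * k_w multiply-accumulates."""
    return batch * c_out * h_out * w_out * (c_in // groups) * k_h * k_w


def macs_to_flops(macs: int) -> int:
    """1 MAC = 2 FLOPs (one multiply plus one add)."""
    return 2 * macs


if __name__ == "__main__":
    fc = linear_macs(512, 1000)
    conv = conv2d_macs(c_in=3, c_out=64, k_h=7, k_w=7, h_out=112, w_out=112)
    print("fc   MACs:", fc, "FLOPs:", macs_to_flops(fc))
    print("conv MACs:", conv, "FLOPs:", macs_to_flops(conv))
```

So a tool that reports MACs can be compared with one that reports FLOPs by doubling the MAC count, at least for layers dominated by multiply-accumulates.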

MarvinChao commented 3 months ago

@ThanatosShinji First, thanks so much for your project; I find it very neat and useful. I have been studying your code for profiling. While I understand that your original implementation assumes x86, it is not exactly accurate, given your explanation above, with respect to the difference between ALU operations and the MAC unit. The ALU and the MAC unit are almost always separate units on all the processors that I know of. I feel the challenge here is how to fairly describe a network's compute complexity. While the current MAC accounting gives a reasonable ballpark for CNNs, the same method will be way off when used to estimate newer topologies such as transformer-based networks. Training and inference also often have quite different compute complexity for operations like activation functions. I am working on separating the current accounting into two buckets: MAC ops and ALU ops. I am trying to add an implementation for inference first, as it is easier and cleaner. If you are interested in taking a look at my work, I'll share my commit with you later, and hopefully I can help improve this area of your project.
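To make the two-bucket idea concrete, here is a minimal sketch of what such accounting could look like. It is not the commit mentioned above and does not use any onnx-tool API: `OpCost`, `matmul_cost`, `elementwise_cost`, and `attention_cost` are hypothetical names, and the per-element ALU count for softmax is an assumed placeholder, not a measured value.

```python
from dataclasses import dataclass


@dataclass
class OpCost:
    """Compute cost split into two buckets (hypothetical structure)."""
    macs: int = 0      # multiply-accumulates, executed on MAC units
    alu_ops: int = 0   # everything else: compares, exp, div, adds, ...

    def __add__(self, other: "OpCost") -> "OpCost":
        return OpCost(self.macs + other.macs, self.alu_ops + other.alu_ops)


def matmul_cost(m: int, k: int, n: int) -> OpCost:
    """(m x k) @ (k x n): m * k * n MACs, no ALU ops."""
    return OpCost(macs=m * k * n)


def elementwise_cost(num_elements: int, ops_per_element: int = 1) -> OpCost:
    """Element-wise op: counted purely in the ALU bucket."""
    return OpCost(alu_ops=num_elements * ops_per_element)


def attention_cost(seq_len: int, head_dim: int,
                   softmax_ops_per_element: int = 5) -> OpCost:
    """Single attention head at inference time.
    ASSUMPTION: softmax costs `softmax_ops_per_element` ALU ops per score;
    the default of 5 is a placeholder and is platform dependent."""
    qk = matmul_cost(seq_len, head_dim, seq_len)          # Q @ K^T
    softmax = elementwise_cost(seq_len * seq_len, softmax_ops_per_element)
    av = matmul_cost(seq_len, seq_len, head_dim)          # scores @ V
    return qk + softmax + av


if __name__ == "__main__":
    print(attention_cost(seq_len=128, head_dim=64))
```

Keeping the buckets separate means the ALU share of the softmax stays visible instead of being folded into a single MAC figure.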

ThanatosShinji commented 3 months ago

@MarvinChao Thanks for your interest! The initial idea behind the MAC design was to show people how the complexity changes between sigmoid and hardsigmoid. Of course, the ALU and the FPU usually have their own backend ports. As you mentioned, it's difficult to describe all ops on all platforms. Transformer-based networks are memory-intensive models when the input sequence is shorter than some threshold. In that case, the parameter size is a good indicator of inference complexity.
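The memory-intensive case can be illustrated with a rough arithmetic-intensity estimate for a single linear layer. This is only a sketch under stated assumptions: it counts weight traffic only (activations and caches are ignored), the function names are hypothetical, and `machine_flops_per_byte` is a made-up hardware balance point, not a figure for any real processor.

```python
def linear_arithmetic_intensity(seq_len: int, in_features: int,
                                out_features: int,
                                bytes_per_param: int = 4) -> float:
    """FLOPs per byte of weights read, for seq_len tokens through one layer.
    ASSUMPTION: every weight is read from memory exactly once, and
    activation traffic is ignored."""
    flops = 2 * seq_len * in_features * out_features   # 1 MAC = 2 FLOPs
    weight_bytes = in_features * out_features * bytes_per_param
    return flops / weight_bytes


def is_memory_bound(seq_len: int, in_features: int, out_features: int,
                    bytes_per_param: int = 4,
                    machine_flops_per_byte: float = 10.0) -> bool:
    """True if the layer does fewer FLOPs per byte than the machine can
    sustain. ASSUMPTION: the default balance of 10.0 is hypothetical."""
    intensity = linear_arithmetic_intensity(
        seq_len, in_features, out_features, bytes_per_param)
    return intensity < machine_flops_per_byte


if __name__ == "__main__":
    for seq_len in (1, 16, 512):
        ai = linear_arithmetic_intensity(seq_len, 4096, 4096)
        print(seq_len, ai, is_memory_bound(seq_len, 4096, 4096))
```

Under these assumptions the intensity reduces to `2 * seq_len / bytes_per_param`, so for short sequences the runtime tracks the bytes of parameters that must be read rather than the MAC count.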

ThanatosShinji / onnx-tool

In this project, is FLOPs equal to MACs? #80