apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

[Feature request] Tool to calculate network FLOPs for Gluon. #14955

Open PistonY opened 5 years ago

PistonY commented 5 years ago

Gluon currently has a summary function for calculating the total parameters of a network, but it has no tool for calculating a network's FLOPs (GFLOPs).
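
For reference, the existing summary call looks like this (a minimal sketch using a stock model zoo network; the input shape is arbitrary):

```python
import mxnet as mx
from mxnet.gluon.model_zoo import vision

# A stock network from the model zoo, freshly initialized.
net = vision.resnet18_v1()
net.initialize()

# summary() reports per-layer output shapes and parameter counts,
# but says nothing about the FLOPs each layer performs.
net.summary(mx.nd.zeros((1, 3, 224, 224)))
```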

Why is this needed?

But it's hard to calculate FLOPs for all operators, and it's actually not necessary (some discussions here). So I partition them by priority:

  1. Widely used in many networks; the most common cases. [urgent]
    • Conv2d/3d
    • Maxpool2d/3d
    • Avgpool 2d/3d
    • GlobalAvgPool2d/3d
    • FC
    • Relu, LeakyReLU, PReLU, Tanh, Sigmoid
    • BN
    • Softmax
    • RNN (a basic RNN is just matrix multiplication; I'm not sure about more complicated variants, so this may need additions if anything is missing.)
  2. Used in some networks but not common. [not urgent]
    • Dropout
    • Conv1d, Maxpool1d, Avgpool1d
    • GlobalAvgPool1d
    • ConvTranspose1d/2d/3d (GANs may use these, but for now GANs don't care about FLOPs)
    • UpSampling bilinear/nearest
    • InstanceNorm, LayerNorm
    • L2Normalization
      1. Used only once or a few times in a model; their FLOPs may depend on the implementation and are not hard to calculate manually. (May not need to be implemented.)
        • ROIPooling
        • Atomic-level operations (I haven't seen anyone calculate them.)
  3. No need to implement.
    • Loss functions.

An additional interface should also be added so that users can define the calculations themselves for manually defined Blocks; one possible shape for it is sketched below.
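
A purely hypothetical sketch of such an interface (these names are not an existing Gluon API): a registry mapping Block types to FLOP formulas, with a decorator users can apply to register formulas for their own Blocks. Note the Dense example reads the layer's internal `_units` attribute, which is an implementation detail:

```python
from mxnet.gluon import nn

# Hypothetical registry mapping a Block class to a function that
# computes its FLOPs from the block and the input shape.
_FLOPS_REGISTRY = {}

def register_flops(block_cls):
    """Decorator registering a FLOP formula for a Block class."""
    def wrapper(fn):
        _FLOPS_REGISTRY[block_cls] = fn
        return fn
    return wrapper

def count_flops(block, in_shape):
    """Apply the registered formula for this block type, if any."""
    fn = _FLOPS_REGISTRY.get(type(block))
    if fn is None:
        raise NotImplementedError(
            'no FLOP formula registered for %s' % type(block).__name__)
    return fn(block, in_shape)

@register_flops(nn.Dense)
def dense_flops(block, in_shape):
    # FC layer: one multiply-add per weight; in_shape = (N, in_units).
    # block._units is an internal Gluon attribute, used here for brevity.
    n, in_units = in_shape
    return n * in_units * block._units
```

A user with a manually defined Block would register a formula for it the same way, via `@register_flops(MyBlock)`.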

Feel free to add anything that's missing, and to suggest corrections if anything is wrong.

mxnet-label-bot commented 5 years ago

Hey, this is the MXNet Label Bot. Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it. Here are my recommended labels: Gluon, Feature

pengzhao-intel commented 5 years ago

The GFLOPs depend on what kind of implementation is used and whether other optimization techniques are applied.

For example, the computation of direct, GEMM-based, and Winograd convolution is really different. And if the convolution is fused with BN/ReLU/Sum, the calculation also changes.

If you only care about how many ops (Add, Mul, FMA) the network performs at runtime, I suggest getting that from profiling tools; it only takes a short time :)

Anyway, this is a good proposal for analysis, and I like it too.
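
For example, a minimal sketch of capturing per-operator runtime statistics with MXNet's built-in profiler (the file name and input shape here are arbitrary):

```python
import mxnet as mx
from mxnet import profiler
from mxnet.gluon.model_zoo import vision

# Record all operators and keep aggregate statistics in memory.
profiler.set_config(profile_all=True, aggregate_stats=True,
                    filename='net_profile.json')

net = vision.resnet18_v1()
net.initialize()
x = mx.nd.random.uniform(shape=(1, 3, 224, 224))

profiler.set_state('run')
net(x)
mx.nd.waitall()              # wait for asynchronous execution to finish
profiler.set_state('stop')

# Per-operator call counts and timing aggregates.
print(profiler.dumps())
```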

pengzhao-intel commented 5 years ago

@sandeep-krishnamurthy's proposal may cover your request with further improvements: https://cwiki.apache.org/confluence/display/MXNET/MXNet+Operator+Benchmarks

PistonY commented 5 years ago

@pengzhao-intel I mean GFLOPs that don't take implementation-specific optimizations into account; it's just the standard amount of computation. For example, Conv2d FLOPs should be defined as:

Assume that the input, output, and kernel are all square.

O(conv2d, no bias) = K * K * C_in * M * M * C_out

where:
M = output feature map side length
K = kernel side length
C_in / C_out = input/output channels
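
Plugging in numbers for an assumed layer (3x3 kernel, 64 input channels, 128 output channels, 56x56 output map):

```python
# Worked example of the formula above; all layer values are assumed.
K, C_in, M, C_out = 3, 64, 56, 128

flops = K * K * C_in * M * M * C_out   # multiply-accumulates, no bias
print(flops / 1e9)                     # ~0.23 GFLOPs
```
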
sandeep-krishnamurthy commented 5 years ago

Thanks, this will be very useful.

Currently, in phase 1 of my work, I am working on per-operator profiling to capture forward time, backward time, and max memory allocated. This uses different input shapes and options for each operator.

The intention of this work is to catch performance regressions at the operator level, to identify hot computation paths in operator kernels for certain input shapes, and finally to derive insights such as: in MXNet, the ArgMax operator is slower than the Max operator for the same operation.

Would that give a proxy for what you are looking for? That said, this proposal is more fine-grained and will be useful for planning optimization work.
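
As an illustration, a rough sketch of the kind of per-operator timing that could surface an ArgMax-vs-Max difference (the shape and repeat count are arbitrary):

```python
import time
import mxnet as mx

def time_op(fn, x, repeat=100):
    """Rough wall-clock time per forward call of a single operator."""
    fn(x)
    mx.nd.waitall()                    # warm-up, flush the async queue
    start = time.time()
    for _ in range(repeat):
        fn(x)
    mx.nd.waitall()                    # block until all calls finish
    return (time.time() - start) / repeat * 1000.0  # ms per call

x = mx.nd.random.uniform(shape=(1024, 1024))
print('max    : %.3f ms' % time_op(lambda a: mx.nd.max(a, axis=1), x))
print('argmax : %.3f ms' % time_op(lambda a: mx.nd.argmax(a, axis=1), x))
```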

PistonY commented 5 years ago

@sandeep-krishnamurthy I'm pleased this may help you, but I just want a static analysis, as I replied to pengzhao-intel; that's not as complicated as what you're doing. Static analysis helps evaluate a network's computation in the fairest way.

djaym7 commented 4 years ago

Does anyone have an update on this?