doantienthongbku / AsConvSR-TorchLighting


About model inference acceleration #1

Open OuterSpaceTraveller opened 1 year ago

OuterSpaceTraveller commented 1 year ago

Hi! Very impressive project!

My main goal is to export the model to an intermediate format and test how well it accelerates on many platforms. I am trying to speed up the assembled convolution module for better model inference.
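For context, this is roughly the export path I am using (a sketch only; the `AsConvSR` import path, constructor arguments, checkpoint name, and input shape are my assumptions, not necessarily the repo's actual ones):

```python
import torch
from model import AsConvSR  # assumed import path for this repo's model class

# Assumed checkpoint name and constructor; adjust to the actual training config
model = AsConvSR(scale_factor=2)
state = torch.load("asconvsr.ckpt", map_location="cpu")
model.load_state_dict(state.get("state_dict", state), strict=False)
model.eval()

# Dummy low-resolution input; the shape is only an example
dummy = torch.randn(1, 3, 270, 480)

torch.onnx.export(
    model, dummy, "asconvsr.onnx",
    input_names=["lr"], output_names=["sr"],
    opset_version=17,
    dynamic_axes={"lr": {2: "h", 3: "w"}, "sr": {2: "H", 3: "W"}},
)
```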

The current model architecture performs multiple reshapes and scatters on every inference pass (in the "assembly" block), which makes it unsuitable for many inference platforms outside of graphics cards because of the amount of data manipulation. I have dealt with the other bottlenecks in the design, but I was wondering: are there additional steps beyond what is reported in the paper, e.g. a way to fold the coefficients into the convolutions for the inference/prediction/test phase? I sketch below what I mean by folding.
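A minimal sketch of the folding idea, assuming the assembly block predicts per-sample coefficients over a bank of `E` base kernels (the names and shapes here are my assumptions, not the repo's code): the effective kernel is assembled once per input with an einsum and applied as a single grouped convolution, instead of the reshape/scatter pattern.

```python
import torch
import torch.nn.functional as F

def assembled_conv_folded(x, base_weights, coeffs, padding=1):
    """
    Sketch of an assembled convolution with the coefficients folded
    into the kernel before the conv (names/shapes are assumptions).

    x:            (B, C_in, H, W) input features
    base_weights: (E, C_out, C_in, k, k) bank of E base kernels
    coeffs:       (B, E) per-sample assembly coefficients
    """
    B, C_in, H, W = x.shape
    E, C_out, _, k, _ = base_weights.shape

    # Fold: per-sample effective kernel W_b = sum_e coeffs[b, e] * base_weights[e]
    weight = torch.einsum("be,eoikl->boikl", coeffs, base_weights)

    # Grouped-conv trick: treat the batch as groups, so every sample
    # is convolved with its own assembled kernel in one conv2d call
    x = x.reshape(1, B * C_in, H, W)
    weight = weight.reshape(B * C_out, C_in, k, k)
    out = F.conv2d(x, weight, padding=padding, groups=B)
    return out.reshape(B, C_out, out.shape[-2], out.shape[-1])

# toy shapes, just to show the call
x = torch.randn(4, 16, 64, 64)
bank = torch.randn(8, 16, 16, 3, 3)          # E = 8 base kernels
alpha = torch.softmax(torch.randn(4, 8), dim=1)
y = assembled_conv_folded(x, bank, alpha)     # (4, 16, 64, 64)
```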

Any test utility would also be highly appreciated. I am not quite certain whether I am missing something, because I use an NVIDIA RTX 2060 Super and have not been able to get matching speed-ups, even though I have been able to run the whole training on the GPU with an environment matching yours.
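For reference, this is roughly the kind of timing utility I have in mind (a rough sketch; `model` and the input shape are placeholders), in case you already have an equivalent script:

```python
import time
import torch

@torch.no_grad()
def benchmark(model, shape=(1, 3, 270, 480), device="cuda", warmup=10, iters=100):
    """Rough latency measurement with proper CUDA synchronization."""
    model = model.to(device).eval()
    x = torch.randn(*shape, device=device)

    # warm-up iterations to exclude kernel selection / allocator overhead
    for _ in range(warmup):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()

    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()

    elapsed = (time.perf_counter() - start) / iters
    print(f"avg latency: {elapsed * 1000:.2f} ms  ({1.0 / elapsed:.1f} FPS)")
```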

With kindest regards, Tuomas

zhuzhu18 commented 4 months ago

I am also thinking about this issue. After converting the model to ONNX, I used Netron to view the computational graph and found that it is very large. I guess this is because the model uses a for loop to traverse each output channel during forward, which adds many operators. Could it be turned into a matrix operation to simplify the computational graph and accelerate training and inference?
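For what it's worth, this is roughly the kind of rewrite I have in mind (the shapes are made up for illustration, not the repo's actual ones): if the loop computes one output channel at a time from its own weights, stacking them and doing a single einsum collapses the whole loop into one operator in the exported graph.

```python
import torch

# Loop version: one chain of ONNX operators per output channel
def per_channel_loop(feats, weights):
    # feats: (B, C_in, H, W); weights: (C_out, C_in) -- 1x1 channel mixing as an example
    outs = []
    for o in range(weights.shape[0]):
        outs.append((feats * weights[o].view(1, -1, 1, 1)).sum(dim=1))
    return torch.stack(outs, dim=1)

# Vectorized version: a single einsum, one operator in the exported graph
def per_channel_matmul(feats, weights):
    return torch.einsum("bchw,oc->bohw", feats, weights)

# quick equivalence check
x = torch.randn(2, 8, 16, 16)
w = torch.randn(4, 8)
assert torch.allclose(per_channel_loop(x, w), per_channel_matmul(x, w), atol=1e-5)
```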