About model inference acceleration

Hi! Very impressive project!

My main goal is to export the model to intermediate format and test accelerability on many platforms. I am trying to accelerate the assembled convolution module for better model inference.

The current model architecture contains multiple reshapes and scatters for each inference (in the "assembly" block), making it not suitable for many inference platforms outside of graphics cards due to the amount of data manipulation. I have dealt with other bottlenecks of design, but I was thinking is there more steps than what are reported in the paper? e.g. a way for folding the coefficients to the convolutions for the inference/prediction/test phase?

Any test utility would be highly appreciated also, not quite certain am I missing something, because I use NVIDIA RTX2060 Super and have not been able to get matching speed-ups even though I have been able to delegate the whole training to GPU with matching environment to what you are using.

With kindest regards, Tuomas

doantienthongbku / AsConvSR-TorchLighting

About model inference acceleration #1