I directly used step 4 to run inference on a single image. With an input image of shape [64, 64, 3], the time cost is ~1 s. Increasing the crop shape to [256, 256, 3] raises it to ~11 s, and increasing the image size further makes the time cost unaffordable.
This is clearly not a linearly increasing time cost, so is there some way to speed it up? Although the energy cost is small, the time cost is much larger than that of a normal CNN model running on CUDA.
The claim about linearly growing cost refers to the storage requirement, not the time cost. The time cost is linear in the number of pixels: [256, 256] has 16x the pixels of [64, 64], so going from ~1 s to ~11 s is roughly consistent with linear scaling.
The inference speed reported in the paper was measured on mobile phones with a multi-threaded API (Java IntStream). It can be accelerated with custom CUDA operators; step 4 is only a prototype implementation in NumPy.
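For reference, here is a minimal sketch of the IntStream-style parallelism mentioned above, assuming each output pixel can be computed independently; `processPixel` is a hypothetical placeholder for the model's per-pixel computation, not an API from this repo:

```java
import java.util.stream.IntStream;

public class ParallelInference {
    // Hypothetical placeholder for the model's per-pixel lookup/compute.
    static int processPixel(int[] image, int idx) {
        return image[idx];
    }

    public static void main(String[] args) {
        int height = 256, width = 256;
        int[] image = new int[height * width];
        int[] output = new int[height * width];

        // Each pixel is independent, so total work is linear in pixel
        // count and distributes cleanly across threads via a parallel stream.
        IntStream.range(0, height * width)
                 .parallel()
                 .forEach(i -> output[i] = processPixel(image, i));
    }
}
```

The same per-pixel independence is what makes a custom CUDA kernel (one thread per pixel) a natural way to accelerate it further.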