Powering AWS purpose-built machine learning chips. Blazing fast and cost effective, natively integrated into PyTorch and TensorFlow and integrated with your favorite AWS services
Problem
I compared the latency of convolution at different resolutions on different devices (inf1.xlarge, CPU, and GPU). The inf1 performance is close to the GPU (Quadro RTX 8000) when the resolution is small, but it degrades as the resolution grows, and in some cases is even slightly worse than the CPU.
| op | input_shape | output_shape | Latency Avg (ms), CPU | Latency Avg (ms), GPU | Latency Avg (ms), inf1.xlarge |
|----|-------------|--------------|----------------------|----------------------|-------------------------------|
| torch.nn.Conv2d | (1,256,128,14) | (1,256,128,14) | 0.58 | 0.13 | 0.36 |
| torch.nn.Conv2d | (1,512,512,7) | (1,512,512,7) | 1.32 | 0.24 | 0.28 |
| torch.nn.Conv2d | (1,256,128,128) | (1,256,128,128) | 29.61 | 0.69 | 29.14 |
| torch.nn.Conv2d | (1,3,1024,1024) | (1,3,1024,1024) | 25.85 | 0.36 | 28.91 |
Our model is based on StyleGAN2, and some of its intermediate outputs have large resolutions (128, 256, 512).
Python Code:
test_conv.zip
Environment

- device: inf1.xlarge
- torch: 1.13.1
- torch-neuron: 1.13.1.2.9.17.0