Powering AWS purpose-built machine learning chips. Blazing fast and cost effective, natively integrated into PyTorch and TensorFlow and integrated with your favorite AWS services
Problem
I compared the latency of convolution at different resolutions on different devices (inf1.xlarge, CPU, and GPU). The inf1 performance is close to the GPU (Quadro RTX 8000) when the resolution is small, but it degrades as the resolution grows, and in some cases is even slightly worse than the CPU.
| op | input_shape | output_shape | Latency Avg (ms), CPU | Latency Avg (ms), GPU | Latency Avg (ms), inf1.xlarge |
|----|-------------|--------------|----------------------|----------------------|-------------------------------|
| torch.nn.Conv2d | (1,256,128,14) | (1,256,128,14) | 0.58 | 0.13 | 0.36 |
| torch.nn.Conv2d | (1,512,512,7) | (1,512,512,7) | 1.32 | 0.24 | 0.28 |
| torch.nn.Conv2d | (1,256,128,128) | (1,256,128,128) | 29.61 | 0.69 | 29.14 |
| torch.nn.Conv2d | (1,3,1024,1024) | (1,3,1024,1024) | 25.85 | 0.36 | 28.91 |
Our model is based on StyleGAN2, and some of its intermediate outputs have large resolutions (128, 256, 512).
Python Code:
test_conv.zip
Environment

- device: inf1.xlarge
- torch: 1.13.1
- torch-neuron: 1.13.1.2.9.17.0