aws-neuron / aws-neuron-sdk

Powering AWS purpose-built machine learning chips. Blazing fast and cost effective, natively integrated into PyTorch and TensorFlow and integrated with your favorite AWS services
https://aws.amazon.com/machine-learning/neuron/

[inf1][torch-neuron][performance] convolution performance gets worse when the resolution gets large compared to GPU. #826

Open PigletOS opened 9 months ago

PigletOS commented 9 months ago

Environment

- device: inf1.xlarge
- torch: 1.3.1
- torch-neuron: 1.13.1.2.9.17.0

Problem

I compared convolution latency at different input resolutions on three devices: inf1.xlarge, CPU, and GPU (Quadro RTX 8000). On inf1, latency is close to the GPU's when the resolution is small, but it degrades sharply when the resolution is large and in some cases is slightly worse than the CPU's.

| op | input_shape | output_shape | CPU avg latency (ms) | GPU avg latency (ms) | inf1.xlarge avg latency (ms) |
| --- | --- | --- | --- | --- | --- |
| torch.nn.Conv2d | (1,256,128,14) | (1,256,128,14) | 0.58 | 0.13 | 0.36 |
| torch.nn.Conv2d | (1,512,512,7) | (1,512,512,7) | 1.32 | 0.24 | 0.28 |
| torch.nn.Conv2d | (1,256,128,128) | (1,256,128,128) | 29.61 | 0.69 | 29.14 |
| torch.nn.Conv2d | (1,3,1024,1024) | (1,3,1024,1024) | 25.85 | 0.36 | 28.91 |
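To make the gap easier to read, the latencies reported above can be turned into slowdown ratios. This is only arithmetic on the posted numbers; the variable names are illustrative and nothing here was measured again.

```python
# Average latencies in ms, copied from the table above:
# (input_shape, cpu, gpu, inf1.xlarge)
rows = [
    ("(1,256,128,14)", 0.58, 0.13, 0.36),
    ("(1,512,512,7)", 1.32, 0.24, 0.28),
    ("(1,256,128,128)", 29.61, 0.69, 29.14),
    ("(1,3,1024,1024)", 25.85, 0.36, 28.91),
]

# A ratio above 1 means inf1 is slower than the device it is compared with.
ratios = [(shape, inf1 / gpu, inf1 / cpu) for shape, cpu, gpu, inf1 in rows]

for shape, vs_gpu, vs_cpu in ratios:
    print(f"{shape}: inf1/gpu = {vs_gpu:.2f}x, inf1/cpu = {vs_cpu:.2f}x")

# (1,256,128,14): inf1/gpu = 2.77x, inf1/cpu = 0.62x
# (1,512,512,7): inf1/gpu = 1.17x, inf1/cpu = 0.21x
# (1,256,128,128): inf1/gpu = 42.23x, inf1/cpu = 0.98x
# (1,3,1024,1024): inf1/gpu = 80.31x, inf1/cpu = 1.12x
```

For the two small inputs inf1 stays within about 3x of the GPU, while for the two large inputs it is roughly 42x to 80x slower than the GPU and about on par with the CPU.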

Our model is based on StyleGAN2, and some of its intermediate outputs have large resolutions (128, 256, 512).

Python code: test_conv.zip
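The contents of test_conv.zip are not shown in this thread, so the following is only a minimal sketch of how an "average latency" figure like the ones reported could be collected. The helper name `average_latency_ms` and its `warmup`/`iters` defaults are assumptions, not taken from the attached script.

```python
import time
from statistics import mean


def average_latency_ms(fn, warmup=10, iters=100):
    """Return the mean wall-clock time of fn() in milliseconds.

    Hypothetical helper: fn is any zero-argument callable, for example
    a lambda that runs one forward pass of a convolution layer.
    """
    # Warm-up calls are excluded so one-time setup cost does not skew the mean.
    for _ in range(warmup):
        fn()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return mean(samples)


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without any ML framework installed.
    print(f"{average_latency_ms(lambda: sum(range(10_000))):.3f} ms")
```

When timing a GPU with PyTorch, the callable would also need to wait for the device to finish (for example with `torch.cuda.synchronize()`), because CUDA kernels are launched asynchronously and the timer would otherwise stop too early.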

jluntamazon commented 9 months ago

@PigletOS Thanks for the sample script!

We have reproduced the issue and will check to see if there is anything we can do to improve performance.