aws-neuron / aws-neuron-sdk

Powering AWS purpose-built machine learning chips. Blazing fast and cost effective, natively integrated into PyTorch and TensorFlow and integrated with your favorite AWS services
https://aws.amazon.com/machine-learning/neuron/

Allocation API Build Degrades Matrix Multiplication Performance #1005

Open nandeeka opened 2 hours ago

nandeeka commented 2 hours ago

I installed the latest version of Neuron using the .deb and .whl files @aws-serina-tan sent me. This version degrades the performance of my matrix multiplication (currently being reviewed here).

Using Neuron 2.20 (the "Deep Learning AMI Neuron (Ubuntu 22.04) 20240927" AMI launched today), my latency distribution looks like:

Latency results are:
 NCLatency: 
p0 = 6929us
p1 = 6929us
p10 = 6929us
p25 = 6930us
p50 = 6931us
p90 = 6932us
p99 = 6932us
p100 = 6932us

With these newly installed files, my latency distribution now looks like:

Latency results are:
 NCLatency: 
p0 = 6947us
p1 = 6947us
p10 = 6947us
p25 = 6948us
p50 = 6949us
p90 = 6952us
p99 = 6952us
p100 = 6953us

I tried this both with and without the --disable-dge flag. The flag had no effect on the latency.
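
To put a number on the regression, here is a minimal sketch that compares the two latency distributions reported above percentile by percentile. The values are copied from the two runs; the names `baseline`, `new_build`, and `latency_deltas` are hypothetical and exist only for this illustration.

```python
# Latency percentiles in microseconds, copied from the two runs above.
baseline = {"p0": 6929, "p1": 6929, "p10": 6929, "p25": 6930,
            "p50": 6931, "p90": 6932, "p99": 6932, "p100": 6932}
new_build = {"p0": 6947, "p1": 6947, "p10": 6947, "p25": 6948,
             "p50": 6949, "p90": 6952, "p99": 6952, "p100": 6953}


def latency_deltas(before, after):
    """Return {percentile: (absolute delta in us, relative delta in %)}."""
    return {
        p: (after[p] - before[p], 100.0 * (after[p] - before[p]) / before[p])
        for p in before
    }


deltas = latency_deltas(baseline, new_build)
for p, (abs_us, rel_pct) in deltas.items():
    print(f"{p:>4}: +{abs_us}us ({rel_pct:.2f}%)")
```

With these numbers the slowdown is 18us to 21us at every percentile, roughly 0.3% of the total latency. It is small, but it shows up consistently across the whole distribution.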

JonathanHenson commented 2 hours ago

Thanks for letting us know! I’m looking again at your pull request. I think I see some allocations that need to be hoisted outside the loops. I’ll provide comments there.

JonathanHenson commented 2 hours ago

Never mind, I was incorrect. We will reproduce these results internally and track it down. Thanks again!