aws-neuron / aws-neuron-sdk

Powering AWS purpose-built machine learning chips. Blazing fast and cost effective, natively integrated into PyTorch and TensorFlow and integrated with your favorite AWS services
https://aws.amazon.com/machine-learning/neuron/

Allocation API Build Degrades Matrix Multiplication Performance #1005

Open nandeeka opened 1 month ago

nandeeka commented 1 month ago

I installed the latest version of Neuron using the .deb and .whl files @aws-serina-tan sent me. This version degrades the performance of my matrix multiplication (currently being reviewed here).

Using Neuron 2.20 (the "Deep Learning AMI Neuron (Ubuntu 22.04) 20240927" AMI launched today), my latency distribution looks like:

Latency results are:
 NCLatency: 
p0 = 6929us
p1 = 6929us
p10 = 6929us
p25 = 6930us
p50 = 6931us
p90 = 6932us
p99 = 6932us
p100 = 6932us

With these newly installed files, my latency distribution now looks like:

Latency results are:
 NCLatency: 
p0 = 6947us
p1 = 6947us
p10 = 6947us
p25 = 6948us
p50 = 6949us
p90 = 6952us
p99 = 6952us
p100 = 6953us

I tried this with and without the --disable-dge flag. This flag had no effect.
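
For scale, the gap between the two runs can be computed directly from the percentiles listed above. This is a minimal sketch in Python with the numbers copied from the two listings; the names `baseline`, `new_build`, and `regression` are illustrative only and are not part of any Neuron tooling.

```python
# Percentile latencies in microseconds, copied from the two listings above.
baseline = {"p0": 6929, "p1": 6929, "p10": 6929, "p25": 6930,
            "p50": 6931, "p90": 6932, "p99": 6932, "p100": 6932}
new_build = {"p0": 6947, "p1": 6947, "p10": 6947, "p25": 6948,
             "p50": 6949, "p90": 6952, "p99": 6952, "p100": 6953}


def regression(before, after):
    """Return {percentile: (absolute delta in us, relative delta in %)}."""
    return {
        p: (after[p] - before[p], 100.0 * (after[p] - before[p]) / before[p])
        for p in before
    }


deltas = regression(baseline, new_build)
for p, (abs_us, rel_pct) in deltas.items():
    print(f"{p:>4}: +{abs_us}us ({rel_pct:.2f}%)")
```

With these numbers the slowdown is 18 to 21 us at every percentile, roughly 0.26% to 0.30%. That is small in relative terms, but it is far larger than the 3 us spread between p0 and p100 in the Neuron 2.20 baseline.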

JonathanHenson commented 1 month ago

Thanks for letting us know! I’m looking again at your pull request. I think I see some allocations that need to be hoisted outside the loops. I’ll provide comments there.

JonathanHenson commented 1 month ago

Never mind, I was incorrect. We will reproduce these results internally and track it down. Thanks again!

JonathanHenson commented 1 month ago

After looking into this, we think the regression is more likely caused by a change in the compiler or runtime for that path than by NKI or the allocation APIs.