Open nandeeka opened 1 month ago
Thanks for letting us know! I’m looking again at your pull request. I think I see some allocations that need to be hoisted outside the loops. I’ll provide comments there.
Never mind, I was incorrect. We will reproduce these results internally and track it down. Thanks again!
After looking into this, we think the regression is more likely caused by a change in that path of the compiler or runtime than by anything related to NKI or the allocation APIs.
I installed the latest version of Neuron using the `.deb` and `.whl` files @aws-serina-tan sent me. This version degrades the performance of my matrix multiplication (currently being reviewed here).
Using Neuron 2.20 (the "Deep Learning AMI Neuron (Ubuntu 22.04) 20240927" AMI launched today), my latency distribution looks like:
With these newly installed files, my latency distribution now looks like:
I tried this with and without the `--disable-dge` flag. This flag had no effect.
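For anyone trying to reproduce the comparison, a latency distribution like the ones referenced above can be collected with a small timing harness. The following is only a minimal sketch, not the script used for this report: `measure_latency` and the stand-in workload are hypothetical names, and the stand-in would have to be replaced by a callable that launches the kernel and blocks until it completes.

```python
import statistics
import time


def measure_latency(fn, warmup=10, iters=100):
    """Call fn repeatedly and summarize per-call latency in milliseconds.

    fn is assumed to block until the work is finished; for an accelerator
    kernel that means the callable must include any synchronization.
    """
    # Warm-up calls are excluded so one-time setup cost does not skew results.
    for _ in range(warmup):
        fn()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)

    samples.sort()
    # With n=100 there are 99 cut points; the "inclusive" method interpolates
    # between observed samples instead of extrapolating past them.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {
        "min": samples[0],
        "p50": cuts[49],
        "p90": cuts[89],
        "p99": cuts[98],
        "max": samples[-1],
        "mean": statistics.fmean(samples),
    }


if __name__ == "__main__":
    # Stand-in CPU workload (hypothetical); swap in the real kernel call.
    stats = measure_latency(lambda: sum(range(10_000)))
    for name, value in stats.items():
        print(f"{name}: {value:.4f} ms")
```

Running the same harness against both installations, once with and once without `--disable-dge`, would give directly comparable percentiles.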