ROCm / MIOpen

AMD's Machine Intelligence Library
https://rocm.docs.amd.com/projects/MIOpen/en/latest/

Simplify TensorDescriptor::GetElementSpace() #3380

Closed CAHEK7 closed 1 week ago

CAHEK7 commented 2 weeks ago

This is a small cleanup of needless overcomplication in TensorDescriptor::GetElementSpace(). It gets rid of two extra allocations and initializations and computes everything in a single pass.
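To illustrate the kind of change described, here is a minimal sketch, not the actual MIOpen code. It assumes the element space is the offset of the last element plus one, i.e. the sum of `(len - 1) * stride` over all dimensions, plus 1, and that every length is at least 1. The function names `ElementSpaceTwoPass` and `ElementSpaceSinglePass` are hypothetical, and the "before" variant shows only one temporary buffer as a stand-in for the extra allocations mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical "before": builds a temporary vector of (len - 1) values and
// then takes the inner product with the strides. The temporary costs a heap
// allocation, an initialization pass, and a later free.
std::size_t ElementSpaceTwoPass(const std::vector<std::size_t>& lens,
                                const std::vector<std::size_t>& strides)
{
    assert(lens.size() == strides.size());
    std::vector<std::size_t> max_indices(lens.size()); // extra allocation
    for(std::size_t i = 0; i < lens.size(); ++i)
        max_indices[i] = lens[i] - 1; // assumes lens[i] >= 1
    return std::inner_product(max_indices.begin(),
                              max_indices.end(),
                              strides.begin(),
                              std::size_t{0}) +
           1;
}

// Hypothetical "after": accumulates the same sum in one loop with no
// temporary storage.
std::size_t ElementSpaceSinglePass(const std::vector<std::size_t>& lens,
                                   const std::vector<std::size_t>& strides)
{
    assert(lens.size() == strides.size());
    std::size_t space = 1;
    for(std::size_t i = 0; i < lens.size(); ++i)
        space += (lens[i] - 1) * strides[i]; // assumes lens[i] >= 1
    return space;
}
```

Under these assumptions, a packed 2x3x4 tensor with strides {12, 4, 1} needs 24 elements, while the same lengths with padded strides {20, 5, 1} need 34. Both variants agree on these values.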

I guess it's an insignificant performance improvement for the overall library, but the function itself became ~1.6 times faster:

Number of tests: 134217728 (1d-5d cases)
New function average time (ns): 20.4021
Old function average time (ns): 32.94695
Gain (times): 1.61488

In terms of dynamically executed instructions, the gap is even larger: ~35.2 per call for the new function vs ~431.2 per call for the old one (including the subsequent malloc/free).

I'll delete the test once CI passes; there is not much point in keeping a test that checks the function against the previous implementation.