This is a tiny cleanup of a senseless overcomplication in TensorDescriptor::GetElementSpace(): got rid of two extra allocations and initializations and computed everything in a single pass.
The performance improvement is probably insignificant for the library as a whole, but the function itself became ~1.6 times faster:
Number of tests: 134217728 (1d-5d cases)
New function average time (ns): 20.4021
Old function average time (ns): 32.94695
Gain (times): 1.61488
In terms of dynamically executed instructions, the gap is even larger: ~35.2 per call vs ~431.2 per call (including the subsequent malloc/free).
I'll delete the benchmark test once CI passes; there is not much sense in keeping a check of that function against the previous implementation.