Some minor changes based on performance profiling using the `tests/problems/test_square.py` problem, running only the `ipxyz` variant.
Measurements on 8d982e23666a42530ef44f1f5560cf9a7eee89e7:

```
# n = 20, for 20 iterations
Benchmark #1: python tests/problems/test_square.py
  Time (mean ± σ):     362.2 ms ±   3.6 ms    [User: 457.6 ms, System: 1051.8 ms]
  Range (min … max):   359.3 ms … 370.7 ms    10 runs

# n = 2000, for 20 iterations
Benchmark #1: python tests/problems/test_square.py
  Time (mean ± σ):      1.319 s ±  0.010 s    [User: 1.462 s, System: 1.187 s]
  Range (min … max):    1.302 s …  1.333 s    10 runs

# n = 20000, for 20 iterations
Benchmark #1: python tests/problems/test_square.py
  Time (mean ± σ):      8.163 s ±  0.124 s    [User: 12.904 s, System: 31.740 s]
  Range (min … max):    7.954 s …  8.397 s    10 runs
```
Measurements on 50b8a5cfe168fe07360f8deb93642cd275ec6d33:

```
# n = 20, for 20 iterations
Benchmark #1: python tests/problems/test_square.py
  Time (mean ± σ):     304.4 ms ±   3.0 ms    [User: 365.4 ms, System: 840.7 ms]
  Range (min … max):   300.5 ms … 310.0 ms    10 runs

# n = 2000, for 20 iterations
Benchmark #1: python tests/problems/test_square.py
  Time (mean ± σ):     967.5 ms ±   9.1 ms    [User: 1.116 s, System: 1.204 s]
  Range (min … max):   951.5 ms … 980.3 ms    10 runs

# n = 20000, for 20 iterations
Benchmark #1: python tests/problems/test_square.py
  Time (mean ± σ):      5.840 s ±  0.034 s    [User: 9.251 s, System: 22.692 s]
  Range (min … max):    5.796 s …  5.908 s    10 runs
```
The improvement ranges from roughly 10-30%, depending on the problem size.
First, replacing `np.broadcast_to` with `np.reciprocal` within MMA avoids the broadcast and some fancy indexing by combining the `where=` and `out=` optional arguments. The first call now allocates the arrays, leaving the entries selected by `~self.positive` undefined; the second call then fills those in place. We could save a few more allocations by storing these arrays within the classes, but I ran into some bugs when trying that, so I have left it for another time.
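The pattern looks roughly like the sketch below (the variable names and the `upp`/`low` expressions are illustrative stand-ins, not the actual MMA code):

```python
import numpy as np

# Hypothetical MMA-like quantities: asymptotes and design variables.
x = np.array([0.5, 1.5, 2.5, 3.5])
low, upp = 0.0, 4.0
positive = np.array([True, False, True, False])  # stand-in for self.positive

# First call allocates the output array; entries where `where` is False
# are left undefined (uninitialized memory), so no broadcast or fancy
# indexing is needed to split the two cases.
recip = np.reciprocal(upp - x, where=positive)

# Second call fills the remaining entries in place via out=, reusing
# the allocation from the first call.
np.reciprocal(x - low, out=recip, where=~positive)
```

Because `np.reciprocal` is a ufunc, both calls operate elementwise on the masked entries only, so the two branches together touch each element exactly once and allocate exactly one output array.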
Secondly, I added `g_dg` and `g_dg_ddg` to bundle the `g`, `dg`, and `ddg` calls. Each of these repeats the evaluation of the approximated problem, and it seems typical that all of them are needed at the same time, so combining the calls saves some repeated evaluations. The code is not the cleanest, so we might want to think about that structure a bit more.
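A minimal sketch of the idea, with a hypothetical shared intermediate (the real approximation is more involved, but the saving comes from the same place):

```python
import numpy as np

class Approximation:
    """Sketch: separate g/dg/ddg each recompute a shared intermediate;
    the bundled call evaluates it once. Names are illustrative."""

    def __init__(self, x):
        self.x = np.asarray(x, dtype=float)

    def _terms(self):
        # Shared intermediate needed by g, dg, and ddg alike.
        return 1.0 / self.x

    def g(self):
        return self._terms().sum()          # re-evaluates _terms

    def dg(self):
        return -self._terms() / self.x      # re-evaluates _terms again

    def g_dg_ddg(self):
        # Evaluate the shared intermediate once and return all three.
        y = self._terms()
        g = y.sum()
        dg = -y / self.x
        ddg = 2.0 * y / self.x**2
        return g, dg, ddg
```

Calling `g()`, `dg()`, and `ddg` separately evaluates `_terms` once per call; `g_dg_ddg()` does it once in total.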
Finally, I also store the arrays returned by the problem, in this case `Square`. I will play around a bit more with this structure, as it would be great if we can lay out `Problem` in such a way that this is done automatically. For large problems it is quite costly to reallocate the design variable and sensitivity vectors every iteration (assuming Python does not do any smart tricks to reuse the allocations and simply drops the memory and reallocates; if it does do something smarter, this might have little benefit).
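The caching amounts to something like the following sketch (the response and sensitivity expressions here are placeholders, not the actual `Square` problem):

```python
import numpy as np

class Square:
    """Sketch: preallocate response and sensitivity arrays once and
    write into them each iteration, instead of returning fresh arrays.
    The formulas below are illustrative placeholders."""

    def __init__(self, n):
        self.g = np.empty(2)         # responses: objective + one constraint
        self.dg = np.empty((2, n))   # sensitivities of each response

    def response(self, x):
        self.g[0] = np.sum(x**2)     # placeholder objective
        self.g[1] = 1.0 - np.sum(x)  # placeholder constraint
        return self.g                # same array every call

    def sensitivity(self, x):
        self.dg[0] = 2.0 * x
        self.dg[1] = -1.0
        return self.dg               # same array every call
```

The caller receives the same arrays on every iteration, so per-iteration allocations drop to zero at the cost of the returned arrays being mutated by the next call.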
Maybe @artofscience and @aatmdelissen can take a brief look. I will try to clean it up more before actually merging into `dev`, though.