Closed artofscience closed 2 years ago
Sorry for the late reply, but I didn't get much time to look into this. I did some basic profiling, but nothing major shows up. It seems the majority of the time is spent within `get_newton_direction` and `get_residual`. For the first part, there is the `Alam` computation, which is costly, but I didn't see a clear way to simplify those operations, as it just seems to be a dot product that has to be performed.
For `get_residual` (and also at some other places) the evaluations of `g`, `dg`, and `ddg` show up as well. I have the impression this is due to their allocations on every call. For instance, for the problem used, `Problems.square.py`:
```python
def g(self, x):
    return np.array([np.dot(x, x), 1 - np.sum(x)])

def dg(self, x):
    return np.array([2 * x, -np.ones_like(x)])

def ddg(self, x):
    return np.array([2 * np.ones_like(x), np.zeros_like(x)])
```
Every call returns a new array, although for `g` this is less of a problem as it only returns a small array of scalars. A possible improvement here would be to provide an `out=array` argument, similar to various NumPy operations, to allow these functions to write into already allocated arrays:
```python
def dg(self, x, out=None):
    if out is None:
        return np.array(...)  # allocate as before
    # Write into the caller's (2, n) buffer instead of allocating.
    out[0], out[1] = 2 * x, -1
    return out
```
Then change the corresponding lines within the solvers to use the already allocated arrays on subsequent calls to `g`, `dg`, `ddg`. This probably only influences the timing for larger problems, though.
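As a minimal sketch of that pattern (the `Square` class and the shapes here are illustrative assumptions, not the actual solver code): allocate the buffer on the first call and pass it back in on every subsequent iteration.

```python
import numpy as np

# Hypothetical problem class sketching the proposed ``out=`` convention:
# if a preallocated array is passed, write into it instead of allocating.
class Square:
    def dg(self, x, out=None):
        if out is None:
            # First call: allocate, shape (2, n).
            return np.array([2 * x, -np.ones_like(x)])
        # Subsequent calls: fill the existing buffer in place.
        out[0] = 2 * x
        out[1] = -1.0
        return out

prob = Square()
x = np.arange(4, dtype=float)
buf = prob.dg(x)                 # first call allocates
res = prob.dg(x + 1, out=buf)    # later calls reuse the same buffer
assert res is buf                # no new allocation happened
```

The solver's inner loop can then hold on to `buf` across iterations, so the per-iteration allocation cost disappears.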
**Issue 1.** Line 111 of `interior_point.py` takes 13% of the time:

```python
rnew = np.linalg.norm([np.linalg.norm(i) for i in self.r])
```

For the default 2-norm, the norm of the block norms is actually equal to the norm of the concatenated array, which is what happens in Svanberg's IP: `hstack(...)` and then `sqrt(dot(..))`. Stacking once and taking a single norm might also be faster, as it avoids the Python-level loop.
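A quick check (with made-up residual blocks) that for the 2-norm the two formulations agree, so switching to a single `hstack` plus one norm is safe:

```python
import numpy as np

# The residual r is a list of blocks of different sizes. For the 2-norm,
# sqrt(sum of squared block norms) equals the norm of the concatenated
# vector, so one hstack + norm gives the same value with less overhead.
r = [np.array([3.0, 4.0]), np.array([1.0, 2.0, 2.0])]

nested = np.linalg.norm([np.linalg.norm(i) for i in r])
stacked = np.linalg.norm(np.hstack(r))
assert np.isclose(nested, stacked)
```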
**Issue 2.** Line 151 of `interior_point_x.py`:

```python
A = diags(diag_lambda) + dg[1:].dot(diags(1/diag_x) * dg[1:].transpose())
```

Two sparse (diagonal) matrices are created here, and this line took 0.97 s. The final result is a dense matrix, so building it through sparse intermediates doesn't make sense. I changed it to

```python
A = np.diag(diag_lambda) + np.einsum("ki,i,ji->kj", dg[1:], 1/diag_x, dg[1:])
```

Now it takes only 0.053 s.
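A small sanity check (random data and shapes assumed, not taken from the real solver) that the `einsum` contraction reproduces the original sparse-diagonal expression:

```python
import numpy as np
from scipy.sparse import diags

m, n = 3, 5
rng = np.random.default_rng(0)
dg = rng.standard_normal((m + 1, n))   # row 0: objective, rows 1:: constraints
diag_lambda = rng.random(m)
diag_x = rng.random(n)

# Original formulation: sparse diagonals, but the result is dense anyway.
A_sparse = diags(diag_lambda) + dg[1:].dot(diags(1 / diag_x) * dg[1:].transpose())
# Replacement: a single dense einsum contraction, A[k, j] = lam_k * delta_kj
# + sum_i dg[k, i] * (1 / x_i) * dg[j, i].
A_einsum = np.diag(diag_lambda) + np.einsum("ki,i,ji->kj", dg[1:], 1 / diag_x, dg[1:])
assert np.allclose(np.asarray(A_sparse), A_einsum)
```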
**Issue 3.** Calculating the residual (`interior_point_x.py`, line 117) needs the derivatives `dg`. Currently they are computed twice in the line

```python
self.r[0] = self.dg(self.w[0])[0] + self.w[3].dot(self.dg(self.w[0])[1:]) - self.w[1] + self.w[2]
```

I've changed it to

```python
dg_cur = self.dg(self.w[0])
self.r[0] = dg_cur[0] + self.w[3].dot(dg_cur[1:]) - self.w[1] + self.w[2]
```

Time before: 2.44 s, time after: 1.31 s.
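The effect can be demonstrated with a toy `dg` that counts its own evaluations (the function, multipliers, and shapes are illustrative, not the solver's actual data):

```python
import numpy as np

# Count evaluations of dg to show the effect of hoisting the duplicated call.
calls = {"dg": 0}

def dg(x):
    calls["dg"] += 1
    return np.array([2 * x, -np.ones_like(x)])

x = np.ones(4)
lam = np.ones(1)   # stand-in for the Lagrange multipliers self.w[3]

# Before: dg is evaluated twice inside one residual expression.
r_before = dg(x)[0] + lam.dot(dg(x)[1:])
# After: evaluate once, reuse the result.
dg_cur = dg(x)
r_after = dg_cur[0] + lam.dot(dg_cur[1:])

assert calls["dg"] == 3               # two calls before, one call after
assert np.allclose(r_before, r_after)  # identical residual either way
```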
**Conclusion.** I've pushed the fixes for Issues 2 and 3 to the dev branch. With these updates we already save a couple of seconds: before ~6-7 seconds, after ~4-5 seconds. I think with Max's suggestion we can save even more! (I only applied the changes to interior_x.)
@MaxvdKolk nice idea. Is it possible for you to try this out and check, for a large problem, whether this gives a substantial saving?
Sure, I probably have some time before Friday to test whether or not this has any influence on the overall performance.
@MaxvdKolk what is the status of this issue with respect to adding `out=None`? I assume this is still relevant, but is this still the speed bottleneck?
I've managed to get the solvers working. Right now we have the reference solver `SvanbergIP` and 3 interior-point solvers of our own; `test_square.py` compares the different solvers on a simple problem. @MaxvdKolk