artofscience / SAOR

Sequential Approximate Optimization Repository
GNU General Public License v3.0

Shape/size of problem return arguments #2

Closed MaxvdKolk closed 3 years ago

MaxvdKolk commented 3 years ago

Currently, we formulate problems as

class Problem:
    def __init__(self, objective, constraints=None):
        self.objective = objective
        self.constraints = constraints or []  # avoid a mutable default argument

    def response(self, x):
        return np.asarray([self.objective.f(x)] + [c.f(x) for c in self.constraints])

which combines the objective and constraint responses in a single function call. This is slightly different from scipy.minimize. There, the objective function returns a single scalar value, rather than a list of scalar values (i.e. the scalar objective + a number of scalar constraints). Similarly, there are some choices to be made in the arrangement of the sensitivity vector, e.g. (n,), (n, 1), (1, n), etc.

Although these can be easily fixed by introducing some wrappers, I feel it would be good to make one choice and stick with that convention at least within all the problem definitions of the sao package.
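One such wrapper could look as follows. This is only a sketch under the assumed convention that the combined response returns the objective first and the constraints after it (as `g_i(x) <= 0`); the helper name `split_for_scipy` is hypothetical, not part of the sao package.

```python
import numpy as np
from scipy.optimize import minimize, NonlinearConstraint

def split_for_scipy(response, x0):
    """Adapt a combined response(x) -> [f(x), g_1(x), ..., g_m(x)]
    (with g_i(x) <= 0) to the split form scipy.minimize expects."""
    fun = lambda x: response(x)[0]  # scalar objective only
    # remaining entries become one vector-valued constraint g(x) <= 0
    cons = NonlinearConstraint(lambda x: response(x)[1:], -np.inf, 0.0)
    return minimize(fun, x0, constraints=[cons], method='trust-constr')
```

For example, minimizing `x @ x` subject to `x[0] >= 1` (i.e. `1 - x[0] <= 0`) can then be passed in as a single combined response.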

artofscience commented 3 years ago

Imo the list/array of scalar values would be good. No clue what scipy.minimize does with this separate objective function value, but you can always extract it from the list (always first entry).

For the arrangement of vectors, let's discuss; maybe: g(m, 1), dg(n, m), ddg(n, n, m) or ddg(m, n, n)?

Would indeed be good to make a decision, also because these arrangements will be used e.g. in the solvers!

MaxvdKolk commented 3 years ago

For the arrangement of vectors, let's discuss; maybe: g(m, 1), dg(n, m), ddg(n, n, m) or ddg(m, n, n)?

This seems all right with me. Not sure how the sizes end up if you write it out on paper for the second derivatives ddg. For the implementation we might be able to use some scipy.sparse matrix variant that stores only the diagonals or their approximations and then arrange it such that it matches either ddg(n,n,m) or ddg(m,n,n). This could be determined by typical operations in the solvers, such that you can simply do a np.dot(ddg(), ...) without much transposing / reshaping.
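As a rough illustration of the diagonal-storage idea (shapes assumed for this sketch: the m Hessian diagonals arranged as an (m, n) array, which is not yet the convention decided in this thread): the product with a vector then reduces to broadcasting, with no full (m, n, n) tensor and no transposing.

```python
import numpy as np

m, n = 3, 5
rng = np.random.default_rng(0)
diags = rng.random((m, n))  # diagonal entries of the m (separable) Hessians
v = rng.random(n)

# Equivalent to applying each full diagonal Hessian to v, i.e.
# np.stack([np.diag(d) @ v for d in diags]), without materializing (m, n, n):
result = diags * v  # broadcasting, shape (m, n)

# check against the dense formulation
full = np.stack([np.diag(d) for d in diags])  # (m, n, n)
assert np.allclose(result, np.einsum('mij,j->mi', full, v))
```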

I think scipy.minimize splits it to provide explicit solvers for unconstrained problems, so then putting in only the scalar valued objective function makes sense. Also, they support specific constraint classes, e.g. LinearConstraint and NonlinearConstraint.
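For reference, the scipy constraint classes mentioned above look like this; a LinearConstraint encodes `lb <= A @ x <= ub` once, so the solver knows the constraint is linear (the toy problem here is just for illustration).

```python
import numpy as np
from scipy.optimize import minimize, LinearConstraint

# x0 + x1 >= 1, expressed as 1 <= [1, 1] @ x <= inf
con = LinearConstraint(np.array([[1.0, 1.0]]), lb=1.0, ub=np.inf)
res = minimize(lambda x: x @ x, x0=[2.0, 2.0],
               constraints=[con], method='trust-constr')
```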

In our case we might ultimately make something like LinearResponse(Response) and NonLinearResponse(Response) classes that extend the original Response class? Possibly we can convey properties of the responses to the optimisers in that way. But we can think about this once there are some more example responses and basic solvers implemented.

Giannis1993 commented 3 years ago

I agree it is useful to fix the shapes now that it is still early. I would propose to use the following format that is typically used in the literature:

x(n,) or x(n, 1)
g(m+1,) or g(m+1, 1)
dg(m+1, n) (like a typical Jacobian matrix; this is what Svanberg uses as well)
ddg(m+1, n) (only diagonal 2nd-order info preserves separability, so no need for the full Hessian)

As for the options for x and g: on the one hand, removing the 2nd dimension reduces storage somewhat and simplifies indexing; on the other hand, it can make some linear algebra operations (e.g. concatenate, stack, etc.) unnecessarily complicated.
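A toy problem (hypothetical, n = 2 variables and m = 1 constraint, objective first) written out in the proposed shapes might look like:

```python
import numpy as np

# proposed shapes: x (n,), g (m+1,), dg (m+1, n), ddg (m+1, n) diagonals
def g(x):
    return np.asarray([x @ x, 1.0 - x[0]])          # objective, then constraint

def dg(x):
    return np.asarray([[2 * x[0], 2 * x[1]],        # Jacobian, one row per response
                       [-1.0, 0.0]])

def ddg(x):
    return np.asarray([[2.0, 2.0],                  # only the Hessian diagonals
                       [0.0, 0.0]])

x = np.array([1.0, 2.0])
assert g(x).shape == (2,) and dg(x).shape == (2, 2) and ddg(x).shape == (2, 2)
```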

MaxvdKolk commented 3 years ago

One nice thing about the one-dimensional arrays (n,) is that you also get this shape when calling np.asarray on a list or tuple. That makes it easy to allow responses (or other functions) to return a simple list/tuple and call np.asarray (which will not do anything if it's already an np.array, I think). For two-dimensional arrays this is a bit more involved.

Not sure about the linear algebra or the stacking operations. The docs make it seem they can work fine with one-dimensional arrays, but they indeed have some specific behaviours for one-dimensional versus multi-dimensional arrays.
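A quick check of the np.asarray behaviour described above: a list becomes a (n,) array, and an existing ndarray with a matching dtype is passed through without a copy.

```python
import numpy as np

a = np.asarray([1.0, 2.0, 3.0])
assert a.shape == (3,)      # list -> one-dimensional (n,) array

b = np.asarray(a)
assert b is a               # already an ndarray: returned as-is, no copy
```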

Another thing: the one-dimensional arrays also nicely represent their mathematical counterpart, no? Ultimately they are vectors, rather than one-dimensional matrices.

Anyway, I would opt to choose the variant that provides the least friction in the code, e.g. avoids many np.newaxis-like operations.

MaxvdKolk commented 3 years ago

Based on the meeting on 12/02, the conclusion seems to be to use (n,) for the one-dimensional arrays.