antonior92 / ip-nonlinear-solver

A trust-region interior-point method for general nonlinear programming problems (GSoC 2017).
BSD 3-Clause "New" or "Revised" License
35 stars 12 forks

Code style #3

Closed nmayorov closed 6 years ago

nmayorov commented 7 years ago

I was reading your code from time to time and some things drew my attention. I decided to create a separate issue for that.

  1. Describing arrays as "unidimensional". I suggest an alternative approach: mention array shapes in the type annotation (as is usually done), introducing the notation (n, m, etc.) in the docstring body first (often it's fine to introduce names right in the shape description). As for the description, think of something more specific and precise. Example:

    b : array_like, shape (m,)
    Right-hand side of the constraint equation.

    I understand that it looks somewhat like overkill, but I think it is a more intelligent way to handle things in this situation.

  2. The same goes for returned arrays: writing their shape will make it easier to understand their relation to the problem formulation.

  3. This is rather pedantic: the PEP convention prescribes one-line summary docstrings, followed by more details if necessary. In my defense, charris asked me to make such fixes back then; it was OK for me and I even somehow managed to achieve that. Strictly speaking, the summary should even start on the same line as the triple quotes.

  4. From the same category: the story about "comment with a period" vs. "comment without a period". If you have any specific reasons not to use periods, don't bother.

  5. Sometimes you didn't use += (and similar) operators where possible; it looks a bit odd to my eye. Not sure about efficiency, by the way; it could make a very small difference because of temporary memory allocation.

  6. Brackets around not operands in if statements are usually unnecessary.

  7. Another preference: I don't like applying len to numpy arrays; I would personally use shape with the required index. For me it more clearly separates numpy structures from pure Python structures, and it is more explicit.
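Putting the shape annotations from item 1 together, a docstring following the suggested convention might look like this (the function and its parameter names are made up for illustration):

```python
import numpy as np

def constraint_residual(A, b, x):
    """Evaluate the linearized equality constraint.

    Parameters
    ----------
    A : array_like, shape (m, n)
        Jacobian matrix of the equality constraints.
    b : array_like, shape (m,)
        Right-hand side of the constraint equation.
    x : array_like, shape (n,)
        Point at which the constraint is evaluated.

    Returns
    -------
    r : ndarray, shape (m,)
        Residual ``A x + b``.
    """
    return np.dot(A, x) + np.asarray(b)
```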

antonior92 commented 7 years ago

Thanks for the suggestions Nikolay. I will modify my code accordingly.

antonior92 commented 7 years ago

Just made a few changes according to your comments. A few notes about the modifications:

  1. I included the shape in the docstrings for all matrices, arrays and linear operators;
  2. Done.
  3. Ok.
  4. I added punctuation to longer comments. PEP allows short comments to omit punctuation, so I kept those unchanged.
  5. I accepted your suggestion except in a few places (e.g. line 551) where I think the notation x = x + alpha*p is actually better than x += alpha*p because of its similarity to x_next = x + alpha*p. Also, on line 365 I kept z = z - A.T.dot(v) instead of z -= A.T.dot(v) because I think it reads better.
  6. Ok.
  7. Ok.
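For what it's worth, the two spellings in item 5 also differ semantically for numpy arrays, which can matter when another name references the same buffer; a small sketch:

```python
import numpy as np

# x = x + alpha*p rebinds the name to a freshly allocated array.
x = np.zeros(3)
alias = x
alpha, p = 0.5, np.ones(3)
x = x + alpha * p      # new array; 'alias' still points at the old zeros
assert alias[0] == 0.0

# x += alpha*p mutates the existing buffer in place.
y = np.zeros(3)
alias_y = y
y += alpha * p         # same array; 'alias_y' sees the update
assert alias_y[0] == 0.5
```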
nmayorov commented 7 years ago

I accepted your suggestion on this except for a few places (e.g. line 551) where I think the notation

That is an interesting view. I don't really agree with it: there is no mathematical equivalent to x = x + alpha*p; it is an algorithmic thing, which is precisely x += alpha*p, and to me it looks cleaner. Additionally, you usually annotate such code lines with a comment. And the final argument: I guess no one is used to seeing code like this, and if someone sees it, they may get a bad impression of the code as a whole (which would be totally wrong).

But if you like it that way, I don't insist on changing it. I understand your arguments and I can see that it is more readable that way.

nmayorov commented 7 years ago

I noticed that you put your latest code in the same file. Maybe it's already the right moment for splitting?

By the way, I answered in the wiki.

antonior92 commented 7 years ago

I agree with you, Nikolay. It is starting to get too big for just one file. I will think about a good way to split the code.

nmayorov commented 7 years ago

This is quite subjective, but I find the code a bit too sparse vertically, which hurts readability in my opinion. What I mean is that it's hard to grasp logical indented blocks because there are too many blank lines; it sort of falls apart.

A similar concept is discussed here. I think in your case such sparseness has already started to hurt readability.

antonior92 commented 7 years ago

Makes sense to me. I will find some time later to remove some of the extra blank lines.

nmayorov commented 7 years ago

Just read byrd_omojokun_sqp.py, some notes:

  1. I'm not sure I like so many mentions of "Byrd-Omojokun" in the code. I understand your motivation, but it looks kind of bizarre in the context of programming. Maybe use something more general, like equality_constrained_sqp (too verbose?), and reference the algorithm's authors in the docstring. I guess it is not necessary to change, but it looks a bit strange nevertheless.
  2. I suggest considering an alternative naming strategy: use longer descriptive names for callables, like fun, fun_grad, constr, constr_jac, and then get rid of the k suffixes everywhere. My main motivation is that the k suffixes are quite useless (and even somewhat confusing), except for differentiating data variable names from callable names, so maybe it's better to get rid of them. Additionally, callable names are usually longer, so it makes sense to differentiate them from simple variables by longer names here as well.
  3. Then I guess it's better to use iteration instead of k.
  4. I think ared and pred are bad names (despite being used in the book); actual_reduction and predicted_reduction are much better.
  5. I guess 1/10 is better written as 0.1, etc.
  6. I don't like the practice of avoiding "in-place" operations; e.g. trust_radiusk = 1/10*trust_radiusk -> trust_radiusk *= 0.1 would be good to see. My argument is that while it doesn't look similar to trust_radius_new = 0.1 * trust_radius, that is essentially good, because it clearly differentiates this case from that one. Reading trust_radiusk = 1/10*trust_radiusk, one may wonder: is this a bug or what?
  7. Be more consistent with named constants: SUFFICIENT_REDUCTION is defined, but other threshold values are not. In smaller functions like default_stop_criteria, named constants are indeed not necessary, I guess.
  8. What does the tr_ prefix mean in tr_lb and tr_ub?
  9. I think the spacing can be cleaned up a bit, e.g. reduction_ratio = ared/pred -> reduction_ratio = ared / pred, etc.
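To illustrate items 4-7 together, a hypothetical fragment (the function name and the constant value are made up, not taken from the actual code) might read:

```python
SUFFICIENT_REDUCTION = 0.1  # named threshold, as in item 7

def update_trust_radius(actual_reduction, predicted_reduction, trust_radius):
    # Descriptive names (item 4), decimal literals (item 5)
    # and an in-place update of the radius (item 6).
    reduction_ratio = actual_reduction / predicted_reduction
    if reduction_ratio < SUFFICIENT_REDUCTION:
        trust_radius *= 0.1
    return trust_radius
```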
antonior92 commented 7 years ago

About item 7, tr_ stands for trust-region. I would like to indicate tr_lb and tr_ub as the upper and lower bound for the trust-region (and not for the problem). In other words, the problem have the general problem being solved is:

minimize f(x)
subject to: c(x) = 0

with no bound constraints (at least for now).

But the subproblem being solved is:

minimize    f + g.T x + 1/2 x.T H x
subject to: A x + b = 0
            || inv(S) x || < trust_radius
            tr_lb <= inv(S) x <= tr_ub

I must clarify the role of S in the docstring as well (I will do it in the next commit).

antonior92 commented 7 years ago

Just implemented your suggestions, Nikolay. I think it was a great improvement in the overall code quality. Thanks :)

mdhaber commented 7 years ago

It looks like you're using the function scipy.sparse.linalg.splu in projections.py. I was just digging into scipy.sparse.linalg today and it looks like scipy.sparse.linalg.factorized would be better. It uses umfpack if available and falls back to scipy.sparse.linalg.splu otherwise. Since you have SuiteSparse, would you get the umfpack scikit if you don't already have it and try factorized? The consensus online seems to be that it is typically faster.
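A minimal sketch of the suggested change (the matrix here is just an example):

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import factorized

# factorized() prefers UMFPACK when scikit-umfpack is installed
# and falls back to SuperLU (splu) otherwise. It expects CSC format.
A = csc_matrix(np.array([[4.0, 1.0], [1.0, 3.0]]))
solve = factorized(A)            # factor once...
x = solve(np.array([1.0, 2.0]))  # ...then solve cheaply per right-hand side
```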

antonior92 commented 7 years ago

It makes sense. I will change it...

Thanks for noticing this!

nmayorov commented 7 years ago

Read through ipsolver:

  1. I suggest consistently following the notation you introduced in opt_problems, which is really nice in my opinion:
    • fun, grad, hess, lagr_hess.
    • constr_eq, jac_eq, hess_eq.
    • constr_ineq, jac_ineq, hess_ineq.
  2. I guess it might be better to introduce a short notation, mu, for the barrier parameter. After all, there are plenty of such math-like variables (like s for the slack variables).
  3. Really don't mind, but maybe x.shape looks nicer than np.shape(x)?
  4. I think maybe you should remove one pair of [] in your "vector notation" in the docstrings.
  5. Saw it in several places: if statements returning True and False. Better to rewrite them as a single return?
  6. barrier_parameter and tolerance probably should be updated with *=.
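Items 5 and 6 might look like this (reached_tolerance and BARRIER_DECAY are hypothetical names, just for illustration):

```python
def reached_tolerance(optimality, tolerance):
    # Single-expression return instead of an if/else
    # that returns True and False explicitly (item 5).
    return optimality < tolerance

BARRIER_DECAY = 0.2          # hypothetical factor, for illustration only
barrier_parameter = 0.1
barrier_parameter *= BARRIER_DECAY  # in-place update, as in item 6
```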

Overall I like how easy and concise it looks.

One thing: it seems to me that you always lean towards sparse matrices / LinearOperators. Doesn't that contradict our intention to allow solving dense problems as well? If you feel this is unnatural (or too much trouble) for this solver, then perhaps we should "officially" drop that option?

antonior92 commented 7 years ago
  1. You are absolutely right, my notation for nonlinear problems is not consistent at all; I am using something like four different notations in different scopes of the program. I will try to make it uniform with what I have done in opt_problems.
  2. My tendency is to leave the barrier parameter name as something more informative, like barrier_parameter, mostly because the notation for it is not uniform among the references. I am trying to use mathematical notation when the symbol is well established (for instance, the slack variables are called s in all the references).
  3. Ok
  4. Ok
  5. Ok
  6. Ok.

About sparse/dense matrices.

The Hessian can be a sparse matrix, a dense matrix or a LinearOperator, and my implementation deals with each of them in the best possible way.

The major decision we have to face is how to deal with the Jacobian matrix. So far the implementation always converts it to a sparse matrix.

I don't know yet how to proceed about this. In the presence of nonlinear inequality constraints (or upper and lower bounds on the variables) the Jacobian matrix will have a lot of zeros, and the sparse representation is actually appropriate, even if a dense structure is passed by the user.

I don't think it is a good alternative to use a dense structure every time dense matrices are provided (since sparse structures can arise nevertheless). The only case where it is obviously better to use a dense structure is when only equality constraints are present. For other cases this is not obvious at all. Maybe the best alternative is to provide a dense/sparse Jacobian option.
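One possible shape for such an option, sketched with made-up names (coerce_jacobian and the sparse_jacobian flag are hypothetical, not part of the actual solver):

```python
import numpy as np
import scipy.sparse as sps

def coerce_jacobian(J, sparse_jacobian=None):
    """Hypothetical helper sketching a dense/sparse Jacobian option.

    sparse_jacobian=None keeps whatever structure the user provided,
    True forces CSR sparse format, False forces a dense 2-D array.
    """
    if sparse_jacobian is None:
        sparse_jacobian = sps.issparse(J)
    if sparse_jacobian:
        return sps.csr_matrix(J)
    J = J.toarray() if sps.issparse(J) else np.asarray(J)
    return np.atleast_2d(J)
```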
