algorithmsbooks / optimization

Errata for Algorithms for Optimization book
67 stars 16 forks source link

Figure 7.10 Nelder-Mead visualized in two dimensions #56

Closed dylan-asmar closed 2 years ago

dylan-asmar commented 2 years ago

I feel like the arrows depicted in the first three operations are a bit misleading. It appears the new points x_r, x_e, x_c are originating from x_bar and x_bar is moving in the operations. However, the x_h point is the one that is adjusted, just in reference to x_bar. I think showing the arrow originating from x_h and going to the new points would help show this process (like in the shrinkage operation image).

tawheeler commented 2 years ago

Hello Dylan,

Thank you for reaching out.

If we look at the reflection operation, we have: x_r = x_m + a (x_m - x_h)

This operation can be interpreted in multiple ways. The interpretation shown in figure 7.10 is that we take x_m and modify it by a (x_m - x_h), thus drawn with an arrow from x_m to x_r. That would involve reading the equation as x_r = x_m + (a (x_m - x_h)). The other two operations are similarly small deltas applied to x_m.

If I understand correctly, the confusion is that an iteration of Nelder-mead is looking for a better value, which involves shifting to a new, better point. In that case, showing how the arrows move to a new point would indeed be a better visualization. However, in Fig 7.10 we are showing how the candidate points are constructed (i.e. the Nelder-Mead simplex operations), and not a full iteration of the algorithm.

dylan-asmar commented 2 years ago

I understand what you mean by the interpretation and I think that description makes sense. The diagrams just seemed a bit odd when it looks like we add points to a simplex.

If that is the interpretation of the equation the diagram is aiming for it conflicts a bit with the description of the operations a few pages prior. There, the wording is: "x_r = x_m + α(x_m - x_h), reflects the highest-valued point over the centroid." This wording guides me to the interpretation of the new point evaluation being in relation to the highest-valued point being reflected over the centroid versus a modification of x_m.

tawheeler commented 2 years ago

The diagrams just seemed a bit odd when it looks like we add points to a simplex.

Sorry - we're not adding the points to the simplex (and don't draw a new simplex outline) - the figure just shows how we generate the candidate points.

You are right that "reflects the highest-valued point over the centroid" would be better represented in another way. However, that interpretation would benefit from using a different mathematical representation: x_r = x_h + (1+a)(x_m - x_h) (if I did that right). I find this one a bit messier, and didn't use it in the actual code.

Unfortunately, changing the text "reflects the highest-valued point over the centroid" to "moves the centroid away from higher values, but less than expansion" would also be confusing, because the operation is called reflection.

Both approaches are equally valid. The highest-value reflection interpretation is useful for understanding why this expansion might be useful - we are moving away from higher values toward lower values. The modification-to-centroid interpretation is useful for seeing it as a simpler update (and is also a move toward lower values).