On a related note, I'm skeptical that we don't need to use the coil current constraints for finite-beta equilibria. Using the final W7-X solution in the `coil_stage_two_optimization.ipynb` tutorial, for example:
```python
import numpy as np
from scipy.constants import mu_0
from desc.grid import LinearGrid  # eq and optimized_coilset2 come from the tutorial

grid = LinearGrid(rho=np.array([1.0]), M=eq.M_grid, N=eq.N_grid, NFP=eq.NFP, sym=eq.sym)
linking_current = 2 * np.pi * eq.compute("G", grid=grid)["G"][0] / mu_0  # Ampere's law: I = 2*pi*G/mu_0
print(linking_current)  # -8.2e7 A
print(sum(optimized_coilset2.coils[1].current))  # -3.0e5 A
```
This coil set has relatively low field errors, and the net coil current is still two orders of magnitude off from the required linking current. Maybe if the $B\cdot\hat{n}$ errors were perfectly zero it would work, but it appears that in practice the linking current constraint is not implicitly satisfied.
not really sure what you mean by

> `factorize_linear_constraints` normalizes the optimization variables `x_reduced` to order unity,
`factorize_linear_constraints` uses the "scaled" versions of the objectives to compute the linear system of equations:

```python
A = constraint.jac_scaled(x0)
b = -constraint.compute_scaled_error(x0)
```
For example, say we have two coils, each with a current of 1e6 Amps, and we want to constrain the sum of their currents:
```python
from desc.coils import CoilSet, FourierPlanarCoil
from desc.objectives import CoilLength, FixSumCoilCurrent, ObjectiveFunction
from desc.objectives.utils import factorize_linear_constraints

coils = CoilSet.linspaced_angular(FourierPlanarCoil(1e6), n=2)
objective = ObjectiveFunction(CoilLength(coils))
constraint = ObjectiveFunction(FixSumCoilCurrent(coils))
objective.build()
constraint.build()
xp, A, b, Z, unfixed_idx, project, recover = factorize_linear_constraints(objective, constraint)
```
This yields the following:

```
xp = [1e6, 1e6]
A = [1e-6, 1e-6]
b = [2]
Z = [-0.707; 0.707]
```
Then the initial optimization variables in this case will be `x_reduced = Z.T @ (x - xp) = [0, 0]`, and they are "normalized" in the sense that they are small numbers instead of order 1e6. This is problematic since the optimizer will then take step sizes relative to the scale of `x_reduced`, but that yields changes in the currents on the order of Amps when we want changes on the order of MA.

The "real" parameters are recovered as `x = xp + Z @ x_reduced`. The null space matrix `Z` is always order unity by construction, so the only way for the optimizer to take large step sizes relative to the true scale of the variables is if the optimization variables `x_reduced` have the same order of magnitude as the real parameters `x`. In this example, that means we either need `x_reduced` to be order 1e6, or we can normalize the units of the real parameters `x`.
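To see the scale problem concretely, here is a minimal standalone sketch using the numbers above (plain numpy, not the DESC API):

```python
import numpy as np

xp = np.array([1e6, 1e6])          # particular solution, order 1e6 Amps
Z = np.array([[-0.707], [0.707]])  # null space basis, order unity
x_reduced = Z.T @ (xp - xp)        # initial reduced variables = [0.]

# an optimizer sizes its steps relative to x_reduced, so a "large" step is O(1)
step = np.array([1.0])
x_new = xp + Z @ (x_reduced + step)
print(x_new - xp)  # [-0.707  0.707] -> currents change by less than 1 Amp
```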
I think this might be as simple as redefining things as

```
x = xp + Z @ D @ x_reduced
x_reduced = D^-1 @ Z.T @ (x - xp)
```

or

```
x = xp + D @ Z @ x_reduced
x_reduced = Z.T @ D^-1 @ (x - xp)
```

where `D` is some diagonal scaling matrix. In the first case `D` corresponds to the reduced variables; in the second case, to the full variables. You can sort of convert between them like we do in the perturbations: https://github.com/PlasmaControl/DESC/blob/5b245a477df1caaacf6eb06212f4d7f710ea56e2/desc/perturbations.py#L285-L287
I may have found an additional wrinkle to this problem, or else I'm not understanding something.
As the simplest possible example, say we have a `CoilSet` with 2 coils of type `FourierRZCoil` with resolution n=0. So each coil only has 3 optimizable parameters: `R_n`, `Z_n`, and `current` (all scalars in this example). The full state vector for the coil set is then 6 parameters in the following order:

```
x = [R1_n, Z1_n, current1, R2_n, Z2_n, current2]
```
The linear constraint matrix corresponding to the `FixSumCoilCurrent` objective is simply a single row such that `A @ x` represents `current1 + current2`:

```python
A = np.array([[0, 0, 1., 0, 0, 1.]])
```
Then the null space matrix that determines the optimization variables is the following matrix of shape (6, 5):

```python
Ainv, Z = svd_inv_null(A)
print(Z)
```
```
[[ 0.         -0.70710678  0.          0.         -0.70710678]
 [ 1.          0.          0.          0.          0.        ]
 [ 0.          0.5         0.          0.         -0.5       ]
 [ 0.          0.          1.          0.          0.        ]
 [ 0.          0.          0.          1.          0.        ]
 [ 0.         -0.5         0.          0.          0.5       ]]
```
It is apparent that the coil currents `current1` and `current2` are functions of the optimization variables `x_reduced[1]` and `x_reduced[4]`. This is odd, because with 2 degrees of freedom and 1 linear constraint, they should be functions of a single variable. Even more strange, the parameter `R1_n` is also a function of these same two optimization variables, when it should have no relation to the coil currents.
Interestingly, if we were to redefine the order of the full state vector such that the coil current is listed first, we get the following result:

```python
# x = [current1, R1_n, Z1_n, current2, R2_n, Z2_n]
A = np.array([[1., 0, 0, 1., 0, 0]])
Ainv, Z = svd_inv_null(A)
print(Z)
```
```
[[ 0.          0.         -0.70710678  0.          0.        ]
 [ 1.          0.          0.          0.          0.        ]
 [ 0.          1.          0.          0.          0.        ]
 [ 0.          0.          0.70710678  0.          0.        ]
 [ 0.          0.          0.          1.          0.        ]
 [ 0.          0.          0.          0.          1.        ]]
```
This makes much more sense: the two coil currents are functions of a single optimization variable, and all of the curve parameters are independent variables.
Why does the order of our state vector matter, and how can we consistently get the behavior we want? Is this an artifact of the SVD algorithm? Is there a solution besides changing the hard-coded order of the optimizable parameters? Please help.
In general the SVD is unique up to a permutation of the columns of $U$ and $V$. If we enforce that the singular values are ordered in decreasing magnitude, this gives a unique decomposition as long as there are no repeated singular values. However, in this case we expect there to be duplicates (i.e., there are 5 singular values at 0, because the null space is 5D). In that case it's not unique: we're effectively free to rotate the subspace spanned by the repeated singular vectors any way we like, so if we permute the columns of $A$, the resulting $Z$ will generally not just be a simple permutation of the original $Z$ (though there is some orthogonal matrix that connects the two).
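This is easy to verify numerically; the `null_space` helper below is a standalone sketch of the SVD construction (not DESC's `svd_inv_null`):

```python
import numpy as np

def null_space(A, tol=1e-12):
    # columns of V corresponding to (effectively) zero singular values
    _, s, Vt = np.linalg.svd(A, full_matrices=True)
    return Vt[np.sum(s > tol):].T

A1 = np.array([[0, 0, 1., 0, 0, 1.]])  # x = [R1_n, Z1_n, current1, R2_n, Z2_n, current2]
A2 = np.array([[1., 0, 0, 1., 0, 0]])  # x = [current1, R1_n, Z1_n, current2, R2_n, Z2_n]
Z1, Z2 = null_space(A1), null_space(A2)

P = np.eye(6)[[2, 0, 1, 5, 3, 4]]  # permutation relating the orderings: x2 = P @ x1
R = Z1.T @ P.T @ Z2                # orthogonal matrix connecting the two bases
print(np.allclose(P.T @ Z2, Z1 @ R))    # True: same null space, rotated basis
print(np.allclose(R.T @ R, np.eye(5)))  # True: R is orthogonal
```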
That said, can't we just apply a scaling to the full state vector so that the order of the reduced vector doesn't matter (assuming we scale everything to O(1))?
OK, that probably explains it. But I still don't think this can be completely solved by scaling the variables, because in my example above the original ordering of the state vector results in an unphysical optimization vector `x_reduced` even though the null space matrix is mathematically correct. I mean "unphysical" in the sense that the coil currents were not independent of the geometric curve parameters.
Also, I can't figure out how to weight the full state vector instead of the reduced variables, because `A @ D @ Z = 0` is not generally true, while `A @ Z @ D = 0` will always be satisfied. So in your previous post, I needed to go with the first option rather than the second, with `D` corresponding to `x_reduced`. But if we can rescale the full state vector to O(1), then yes, I think that would help solve this problem.
Redefine `x` as `D @ x`, and then compute `Z` as the null space of `A @ D` instead of `A`.
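A minimal sketch of that suggestion, with hypothetical O(1e6) current scales on the diagonal of `D`:

```python
import numpy as np

# x = [R1_n, Z1_n, current1, R2_n, Z2_n, current2]; currents are order 1e6
D = np.diag([1., 1., 1e6, 1., 1., 1e6])
A = np.array([[0, 0, 1., 0, 0, 1.]])

# null space of the scaled constraint matrix A @ D
_, s, Vt = np.linalg.svd(A @ D, full_matrices=True)
Z = Vt[np.sum(s > 1e-12):].T

# optimize the scaled variables, then recover x = D @ (xp_scaled + Z @ x_reduced);
# the constraint stays satisfied because (A @ D) @ Z = 0 by construction
print(np.allclose((A @ D) @ Z, 0))  # True
```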
> Why does the order of our state vector matter, and how can we consistently get the behavior we want?
Maybe this is different from your original scaling problem, but can we add logic to do what you want, like this:
```python
import numpy as np
from desc.utils import svd_inv_null

A = np.array([[0, 0, 1., 0, 0, 1.]])

# This is for the general loop; for this particular case we could just use A for v0.
# Find a row of A which has two non-zero elements
# (we need to change this to find two ones in a row).
v0 = A[np.where(np.count_nonzero(A, axis=1) == 2)[0][0]].copy()  # copy to avoid mutating A
idx_second1 = np.where(v0 == 1)[0][1]
# set the second 1 to -1 to satisfy A @ v0 = 0
v0[idx_second1] = -1
# normalize v0
v0 = v0 / np.linalg.norm(v0)
# delete the column of A which had the second 1
A = np.delete(A, idx_second1, axis=1)
Ainv, Z = svd_inv_null(A)
# add a zero row to Z to make it compatible with the original A
# (appending at the end works here because the deleted column was the last one)
Z = np.vstack((Z, np.zeros((1, Z.shape[1]))))
Z = np.hstack((Z, v0[:, np.newaxis]))
print(Z)
```
So, in `factorize_linear_constraints` we take the fixed parameters out manually, and this code is trying to find one basis vector of the null space manually. I think we can even generalize it to rows with multiple ones (see the sketch below). I was also looking at the `factorize_linear_constraints` function to make the Poincare BC branch more stable. If you guys want, I can work on something like this. Though isn't this basically what `svd_inv_null` is doing in the background?
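For what it's worth, here is one way that generalization might look; `sparse_row_null_basis` is a hypothetical helper that assumes the row entries are all ones, as in `FixSumCoilCurrent`:

```python
import numpy as np

def sparse_row_null_basis(row):
    # orthonormal null-space basis for a single row of ones, embedded so the
    # coupled parameters only mix among themselves (no rotation into others)
    idx = np.flatnonzero(row)
    k = len(idx)
    Q, _ = np.linalg.qr(np.ones((k, 1)), mode="complete")
    Z = np.zeros((row.size, k - 1))
    Z[idx] = Q[:, 1:]  # k x (k-1) block orthogonal to the all-ones direction
    return Z

row = np.array([0, 0, 1., 0, 0, 1.])
Z = sparse_row_null_basis(row)
print(np.allclose(row @ Z, 0))  # True
```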
@dpanici has mentioned this issue offline, but I want to document it:
The linear constraint `FixSumCoilCurrent` does what it is intended to do, but when used in an optimization it does not allow the coil currents to change much relative to their nominal values on the order of 1e6 Amps. I have noticed that using the nonlinear constraint `ToroidalFlux` allows the coil currents to change much more and gives better results in practice.

When the linear constraint is included, `factorize_linear_constraints` normalizes the optimization variables `x_reduced` to order unity, while the particular solution `xp` is order 1e6 so that the full state vector `x_full = xp + Z @ x_reduced` has the proper magnitude in Amperes. The null space matrix `Z` is also order unity, so even large changes in the optimization space result in very small changes to the true coil currents.

A hacky solution would be to redefine the units of the coil currents to be in MA instead of Amps. Alternatively, it would be nice if the rows of `Z` were scaled proportionally to `xp`, such that relative changes in the optimization variables `x_reduced` yield similar relative changes in the full state vector `x_full`.
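A minimal sketch of that row-scaling idea, assuming weights proportional to `|xp|` (and noting the feasibility caveat raised in the comments above):

```python
import numpy as np

xp = np.array([1e6, 1e6])
Z = np.array([[-np.sqrt(0.5)], [np.sqrt(0.5)]])

W = np.diag(np.maximum(np.abs(xp), 1.0))  # row weights ~ |xp|
Zw = W @ Z                                # rows of Z scaled proportionally to xp

# caveat: A @ Zw = 0 is not guaranteed for general A (the A @ D @ Z issue above)
x_full = xp + Zw @ np.array([1.0])
print((x_full - xp) / xp)  # relative change of order unity per unit step
```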