RobotLocomotion / drake

Model-based design and verification for robotics.
https://drake.mit.edu

Support richer set of basis in polynomials #13602

Open hongkai-dai opened 4 years ago

hongkai-dai commented 4 years ago

This comes from a discussion with @TobiaMarcucci

Currently our Polynomial class is written as a linear combination of monomials ∏xᵢᵈⁱ. As noted in the literature, monomials are "bad" bases for polynomials. Instead, we can consider other orthogonal bases such as Chebyshev, Lagrange, or Gegenbauer polynomials (section 3.1.5 of https://sites.math.washington.edu/~thomas/frg/frgbook/SIAMBookFinalvNov12-2012.pdf).

To do this, we should provide classes like ChebyshevPolynomial, together with the supported operations on these polynomials (addition, subtraction, multiplication, etc).

hongkai-dai commented 4 years ago

I am trying to understand how to use Chebyshev polynomials as a basis in SOS verification, by reading two papers:

  1. The master's thesis of Pablo's student https://dspace.mit.edu/bitstream/handle/1721.1/39210/85843740-MIT.pdf?sequence=2&isAllowed=y
  2. Lofberg's 2004 paper http://www.mit.edu/~parrilo/pubs/files/LofbergParrilo-FromCoefficientsToSamplesANewApproachToSOSOptimization.pdf

The conclusion of these papers is that using Chebyshev polynomials in the primal and Lagrange polynomials in the dual is numerically well behaved.

I have several questions after reading the papers:

  1. Most of our systems are not polynomial in the first place (for example, our dynamics usually contain sin and cos functions). How can we approximate a non-polynomial function using Chebyshev polynomials? Theoretically one could use a Chebyshev expansion, but the coefficients cannot in general be computed in closed form (equation 3.67 of https://archive.siam.org/books/ot99/OT99SampleChapter.pdf): the coefficient is cₖ = 2/π ∫₀^π f(cos(θ))cos(kθ)dθ.
  2. Chebyshev polynomials work well in the range [-1, 1], but can explode very quickly outside of this region. Should we adjust the range of the indeterminates, so that the region we verify is approximately [-1, 1]? For example, if we want to verify for indeterminates x ∈ [a, b], should we transform the indeterminates to y = (2x - (a+b)) / (b-a), so that y is in the range [-1, 1]? But what if we do not know the exact region we care about (for example, when we want to search for a region of attraction)? Moreover, this linear transformation only works for univariate polynomials; for multivariate polynomials, I am not sure it is a good idea to scale each indeterminate separately.
  3. I don't think I understand how to use Lagrange polynomials in the dual case. The constraint we want is p(xᵢ) = v(xᵢ)ᵀQv(xᵢ) ∀i=1,...,2d+1, with Q positive semidefinite. Here p(x) is a polynomial of degree 2d, v(x) are some given polynomials with degree no larger than d, and xᵢ is a sample point. If I understand correctly, p(x) is written as a linear combination of Chebyshev polynomials, and v(x) = [v₁(x); v₂(x); ...; v_d(x)], where vⱼ(x) is the jth-degree Chebyshev polynomial. Is this understanding correct? In Li's thesis it is suggested to use the Chebyshev nodes as the xᵢ; should we also use Chebyshev nodes for multivariate polynomials?
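For concreteness on question 2, the affine rescaling can be sketched as follows (a numpy sketch for illustration only, not Drake API; the function names are made up):

```python
import numpy as np

def to_unit_interval(x, a, b):
    """Map x in [a, b] to y in [-1, 1] via y = (2x - (a + b)) / (b - a)."""
    return (2.0 * x - (a + b)) / (b - a)

def from_unit_interval(y, a, b):
    """Inverse map: y in [-1, 1] back to x in [a, b]."""
    return 0.5 * ((b - a) * y + (a + b))

a, b = 100.0, 200.0
assert np.isclose(to_unit_interval(a, a, b), -1.0)   # left endpoint -> -1
assert np.isclose(to_unit_interval(b, a, b), 1.0)    # right endpoint -> 1
assert np.isclose(from_unit_interval(0.0, a, b), 150.0)  # midpoint round trip
```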

@TobiaMarcucci @RussTedrake could you help me understand this? Thanks a lot!

hongkai-dai commented 4 years ago

I think I understand the approach better. To show that a polynomial p(x) of degree 2d is SOS, we want the following condition

p(xᵢ) = v(xᵢ)ᵀQv(xᵢ), i = 1, ..., 2d+1

when we use Lagrange polynomials in the dual. When we use Chebyshev polynomials in the primal, v(x) is a vector containing all Chebyshev polynomials with degree no larger than d, and v(xᵢ) is the evaluation of all these Chebyshev polynomials at a point xᵢ. The literature recommends using the Chebyshev nodes as the evaluation points xᵢ.

I still have some remaining questions:

  1. Does this approach require p(x) to use Chebyshev polynomials as its basis? I would hope to use the monomial basis for p(x), as it is hard to do a Chebyshev expansion of a non-polynomial function to obtain p(x) in the Chebyshev basis. Mathematically we can evaluate p(xᵢ) as a linear combination of its coefficients, no matter whether p(x) uses the monomial basis or the Chebyshev basis.
  2. Chebyshev polynomials have "nice" behavior in the range [-1, 1], but they can explode very quickly outside it (the leading coefficient of the nth-degree Chebyshev polynomial is 2ⁿ⁻¹). So if we have an estimate of the region we are most interested in, should we shift the Chebyshev polynomials to that region? For example, if we want to prove that a polynomial p(x) is SOS in the range [100, 200], should we use shifted Chebyshev polynomials v(x) on [100, 200], and the shifted Chebyshev nodes xᵢ in this range as well?

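A quick numerical illustration of this growth, using numpy's Chebyshev utilities (a sketch for illustration, not Drake code):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Coefficient vector selecting T_10 in the Chebyshev basis.
t10 = np.zeros(11)
t10[10] = 1.0

# On [-1, 1] every Chebyshev polynomial satisfies |T_n(x)| <= 1 ...
xs = np.linspace(-1.0, 1.0, 1001)
assert np.max(np.abs(C.chebval(xs, t10))) <= 1.0 + 1e-12

# ... but just outside the interval it blows up, roughly like the
# leading coefficient 2**(n-1) times x**n.
assert C.chebval(2.0, t10) > 1e5
```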
TobiaMarcucci commented 4 years ago

Sorry for the late response. My understanding is the following, but I'm still not super confident with the problem.

For the first 3 questions:

  1. I don't know what the best way is. But I would say that approximating using a Taylor expansion with monomials and then transforming the result to the Chebyshev basis is not so bad? The transformation between monomials and Chebyshev is ill conditioned for high degrees (say ~10?): is it frequent that one does a high-degree approximation of the dynamics of a nonlinear system? And I think the fact that one ends up with big/small coefficients in the approximation is inherent, and would show up even if we could do the Taylor expansion directly in the Chebyshev basis.

  2. Yes, one should always scale the problem so that the region of interest is contained in the interval [-1, 1]. Pablo mentioned that whenever one can put a bounding box around the region of interest, one should use Chebyshev polynomials within that box, even if the box ends up being a little loose. For the multivariate case, it should be OK to scale coordinates independently. Why do you think it can cause trouble? Connected to this point, another family of useful orthogonal polynomials is the Hermite one. They should be a better basis than the monomials in case you want to solve a problem over the whole space and cannot bound the region of interest.

  3. Yes, I think your understanding is correct or, at least, is in line with mine. For the multivariate case, I'd say that one uses Chebyshev nodes for each coordinate independently?

For the other 2 questions:

  1. Yes, one could use the monomial basis the same way. But in that case one would work with a family of polynomials (the monomials) which is not very good at approximating functions. The math would be the same, but I think that with the monomial basis the problem would have worse conditioning and the result would not be as good in terms of the "quality" of the function we retrieve.

  2. Yes, as the point 2 above, I think one should always shift the problem, and I think also the evaluation nodes.

Thank you very much!

hongkai-dai commented 4 years ago

Thanks Tobia!

For the multivariate case, it should be OK to scale coordinates independently. Why do you think it can cause troubles?

I am considering the case when the different coordinates are coupled: for example, our region of interest is a very skewed ellipsoid in 2D, with principal axes along y = x and y = -x. The axis along y = x is much longer than the axis along y = -x. In this case, should we still scale the two coordinates separately? Or should we consider a change of coordinates that first rotates the ellipsoid by 45 degrees, so that in the rotated coordinates the range of interest along the x axis is much larger than the range of interest along the y axis?

Yes, one could use the monomial basis the same way.

Just to make it clear, I mean we choose the monomial basis for p(x), but v(x) are still Chebyshev polynomials, and we evaluate p(x) and v(x) at (shifted) Chebyshev nodes. The alternative is to first do a Taylor expansion to get p(x) in the monomial basis, and then convert p(x) from the monomial basis to the Chebyshev basis. I think these two approaches should have similar performance? We already lose the nice approximation property when we do the Taylor expansion (compared to using a Chebyshev expansion). So even if we convert the Taylor-expanded polynomial p(x) to the Chebyshev basis, it is no different from using the monomial basis for p(x) directly?

TobiaMarcucci commented 4 years ago

Regarding the scaling of the coordinates, I don't think these situations represent a big problem. If you have a problem like the one you describe, you'd encounter issues with any basis, no? The scaling of the coordinates is to my eyes something like picking the units of measurement so that numbers have reasonable magnitude (in this case in [-1, 1]). As you say, rotating the ellipsoid would lead to a better conditioned problem, but isn't that true for any basis?

I understand better what you mean now, sorry. I admit I don't know much about the approximation properties, I should study more. But if you are for example looking for a Lyapunov function, then p(x) would be -dV/dx f(x), right? If V(x) is optimized, then you would like it to be in the Chebyshev basis, to avoid ill conditioning. On the other hand, if you did a Taylor expansion to approximate the dynamics, you have f(x) in the monomial basis. My point was that it's more likely that V(x) has high degree, and that might cause issues if one works only with monomials. On the other hand, I would imagine that one rarely approximates f(x) with more than, say, a polynomial of degree 5? These are mostly conjectures, however...

hongkai-dai commented 4 years ago

Thanks Tobia for the explanation. I think I have a better understanding of what I should implement. Here is a list of items to work on:

  1. Add a ShiftedChebyshevPolynomial class. This class stores the shifted Chebyshev polynomial of a given order. The class should support operations including multiplication (the product of two Chebyshev polynomials is a sum of two Chebyshev polynomials), conversion to Polynomial (with monomial basis), and conversion from Polynomial (with monomial basis) to ShiftedChebyshevPolynomial.
  2. A function to return all the Chebyshev nodes within a range.
  3. Add a class PolynomialWithChebyshevBasis, which represents a polynomial as a linear combination of Chebyshev polynomials.
  4. Add a new function AddSosConstraintUsingSamples(), which implements the constraint p(xᵢ) = v(xᵢ)ᵀQv(xᵢ), i = 1, ..., 2d+1, with Q psd. This function should accept p(x) being either a Polynomial (with monomial basis) or a PolynomialWithChebyshevBasis, and v(x) using either the monomial basis or the Chebyshev basis.
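For item 2, a minimal sketch of such a function (numpy-based; the name chebyshev_nodes is made up for illustration, not the eventual Drake API):

```python
import numpy as np

def chebyshev_nodes(n, a=-1.0, b=1.0):
    """Return the n Chebyshev nodes cos((2i-1)pi/(2n)), i = 1..n,
    affinely mapped from [-1, 1] to [a, b]."""
    i = np.arange(1, n + 1)
    y = np.cos((2 * i - 1) * np.pi / (2 * n))
    return 0.5 * ((b - a) * y + (a + b))

nodes = chebyshev_nodes(3)
# The n nodes are the roots of T_n; for n = 3: {-sqrt(3)/2, 0, sqrt(3)/2}.
assert np.allclose(np.sort(nodes), [-np.sqrt(3) / 2, 0.0, np.sqrt(3) / 2])

# Shifted nodes stay inside the target interval.
shifted = chebyshev_nodes(5, 100.0, 200.0)
assert np.all((shifted >= 100.0) & (shifted <= 200.0))
```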

What do you think?

TobiaMarcucci commented 4 years ago

All of these make perfect sense to me. I totally agree on the "AddConstraint" part.

Regarding the various Polynomial classes: in the Python code I'm writing to play with these things myself, I organized things similarly, but with small differences. I have a (names are still pretty random) Vector class, from which Monomial and Chebyshev inherit. (I plan to add other classes like these in the future: Lagrange, Laguerre, Hermite, etc.) Each of these represents a single element of the polynomial basis, e.g., a single monomial x1^3 x3^2 or, for Chebyshev, T3(x1) T2(x3). Then I represent a Basis as a list of Vectors, and a Polynomial as a Basis plus a list of coefficients of the same size.

I think a setup like this is a little more flexible than having a PolynomialWithXXXBasis class for each basis. But on the other hand it exposes the user to the full complexity of these things from step one, in the sense that it's a little too abstract? I'm sure you've thought about this better than me: I'm curious to know your opinion about this setup. (Happy to chat in person if it makes it easier.)

Regarding the steps of multiplying, differentiating, and integrating a Chebyshev polynomial, I found them a little tricky to get right. I'd be happy to see how you plan to set these up / give a hand if it's of any help!

hongkai-dai commented 4 years ago

Thanks Tobia,

Exactly as you mentioned, I also considered two approaches for implementing the Polynomial class: the first uses class inheritance, the second doesn't.

  1. I planned to create a PolynomialBasis class (similar to your Vector class), with Monomial, Chebyshev, Lagrange, etc. all inheriting from this PolynomialBasis class. Then the Polynomial class stores a mapping from PolynomialBasis to its coefficient. The problem I encountered with this approach is what you mentioned: I cannot think of a common interface to support some operations on PolynomialBasis. For example, the derivative of a Chebyshev polynomial is not obviously a linear combination of other Chebyshev polynomials (dTₙ(x)/dx = n·Uₙ₋₁(x), with Uₙ₋₁ a Chebyshev polynomial of the second kind); on the other hand, the derivative of a monomial is still a (scaled) monomial. Hence conceptually I find it hard to design a common interface for a PolynomialBasis::Differentiate function. I am curious how you solved this problem.
  2. Implement each type of Polynomial separately. I don't need to design a common interface any more, but it makes it hard to swap from one type of polynomial to another type with a different basis.
TobiaMarcucci commented 4 years ago

If you express Un-1 in terms of T0, ..., Tn-1 using the formula "with the odd and even cases" from here, don't you get that dTn(x) / dx is a linear combination of other Chebyshev polynomials?
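This can be checked numerically with numpy's Chebyshev utilities, which differentiate a series and report the result in the same basis (a sketch for illustration, not Drake code):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# T_4 in the Chebyshev basis.
t4 = np.array([0.0, 0.0, 0.0, 0.0, 1.0])

# chebder differentiates and returns coefficients in the *same* basis:
# d/dx T_4 = 4 U_3 = 8 T_3 + 8 T_1, a linear combination of lower-degree
# Chebyshev polynomials of the first kind.
dt4 = C.chebder(t4)
assert np.allclose(dt4, [0.0, 8.0, 0.0, 8.0])
```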

In my implementation, the derivative of a Vector is a Polynomial, expressed in the same basis as the Vector I'm differentiating. So if I differentiate a Monomial, call it m, I get a Polynomial with coefficients = [1] and basis = [dm/dx]. If I differentiate a Chebyshev, I get a more complex Polynomial, with a bunch of coefficients but with a basis that still contains only Chebyshev vectors.

I did something similar for multiplication and integration of monomials. However, following this path, I ended up with things like the product of two Monomials returning a Polynomial, which is kind of odd? (Since, again, the product of two Chebyshevs is a linear combination of Chebyshevs.)

hongkai-dai commented 4 years ago

If you express Un-1 in terms of T0, ..., Tn-1 using the formula "with the odd and even cases" from here, don't you get that dTn(x) / dx is a linear combination of other Chebyshev polynomials?

Thanks! I didn't know the equation to convert Un-1 in terms of T0, ..., Tn-1. This solves the differentiation problem I had before.

So if I differentiate a Monomial, call it m , I get a Polynomial with coefficients = [1] and basis = [dm/dx]

I think my implementation would probably be different. Say m(x) = x², after differentiation I get the polynomial dm(x)/dx = 2x, where the coefficient is [2] and the basis is [x]. Apart from that, I agree that if we differentiate a basis element we get a more complicated polynomial in the same basis.

I think the common API for multiplication would be

std::unordered_map<PolynomialBasis, double> PolynomialBasis::multiply(const PolynomialBasis& other) const;

Namely, if we multiply two basis elements of the same type (a monomial times a monomial, or a Chebyshev polynomial times a Chebyshev polynomial), we return a linear combination of elements of the same basis.
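A small check of this closure property for Chebyshev polynomials, using numpy's Chebyshev arithmetic (illustration only, not the proposed Drake API):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# T_m * T_n = (T_{m+n} + T_{|m-n|}) / 2, so the product of two basis
# elements is again a linear combination of the same basis.
t2 = np.array([0.0, 0.0, 1.0])        # T_2
t3 = np.array([0.0, 0.0, 0.0, 1.0])   # T_3
product = C.chebmul(t2, t3)
assert np.allclose(product, [0.0, 0.5, 0.0, 0.0, 0.0, 0.5])  # (T_1 + T_5)/2
```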

Just curious, why do you need the integration of monomials?

TobiaMarcucci commented 4 years ago

I think my implementation would probably be different. Say m(x) = x², after differentiation I get the polynomial dm(x)/dx = 2x, where the coefficient is [2] and the basis is [x]. Apart from that, I agree that if we differentiate a basis element we get a more complicated polynomial in the same basis.

Sorry, my mistake. Differentiating a Monomial, I get a Polynomial like the one you say (coefficient = [power of the variable with respect to which I'm differentiating]).

I agree on the multiplication.

For the integration, I need it for the objective of the SOS programs I'm working on. More specifically, the objective in the approximate DP problems is to "push up as much as possible" the lower bound to the value function. And this is done by maximizing the integral of the approximate value function (which is upper bounded by the true value function) within a region of state space of interest.

hongkai-dai commented 4 years ago

@jwnimmer-tri, may I ask you a question about code design? As Tobia and I discussed above, I would like to design a base class called symbolic::PolynomialBasis, with some derived classes:

  1. symbolic::Monomial
  2. symbolic::Chebyshev
  3. symbolic::Hermite

Then I would like to redesign our symbolic::Polynomial class (which currently only takes symbolic::Monomial as its basis). After the redesign, I hope that symbolic::Polynomial can take other types (like Chebyshev and Hermite) as the basis. Mathematically, we will represent the polynomial p(x) as a linear combination of basis elements b(x):

p(x) = c1 * b1(x) + c2 * b2(x) + ... + cn * bn(x)

where c1, c2, ..., cn are coefficients (of type symbolic::Expression). Also I would like to require that the basis elements b1(x), ..., bn(x) all have the same type (all bi(x) have to be monomials, or all have to be Chebyshev, but not a mixture of Monomial and Chebyshev).

I think the symbolic::Polynomial class will store a map from the basis element bi(x) to its coefficient ci. But I don't think I should use std::unordered_map<PolynomialBasis, symbolic::Expression> for this map, as it would allow the keys to be a mixture of different derived classes (like a mixture of Monomial and Chebyshev).

The alternative I can think of is to make symbolic::Polynomial a templated class, with the template parameter being one of the derived basis classes:

template <typename Basis>
class Polynomial {
  std::unordered_map<Basis, symbolic::Expression> basis_to_coeff_map_;
};

and we will use the polynomial class as Polynomial<Monomial> and Polynomial<Chebyshev>.

The problem with this alternative approach is that it would break the current API (the current symbolic::Polynomial class is not templated). Alternatively, I could create a new templated class GenericPolynomial, so that I don't break the current symbolic::Polynomial API.

Do you think there is a better way?

TobiaMarcucci commented 4 years ago

I think I understand better what's going on between primal and dual bases. I hope the following makes sense and, if it does, I hope they aren't trivial considerations.

I find the presentation in Pablo's book to be the clearest (Section 3.1.4). As in Theorem 3.41, we can pick the basis v1, ..., vs for n-variate polynomials up to degree d, and the basis w1, ..., wt for the dual space (the dual of the n-variate polynomials up to degree 2d). Once we have made this decision, the constraint to be added to the SOS program is the formula given in Theorem 3.41.

Now, what does a generic wk look like? By definition, it is a linear functional wk(p) (or, using the notation of the book, <p, wk>) that, given the polynomial p, returns a real-valued scalar. (One random example could be wk(p) = ∫₀¹ p(x) dx: linear and real valued.)

Given a primal basis v1, ..., vs, one can define (citing page 67) a "corresponding dual basis" as the set of functionals for which wk(vi) = δik, with δik the Kronecker delta. Note that, if p is expressed in some primal basis, then wk from the corresponding dual basis is just wk(p) = the kth coefficient of p. However, in general, we might not want to work with a pair of corresponding bases for the primal and the dual: "there may be advantages (numerical or otherwise) in not doing so."

Say for example we want to use the monomial basis for the primal and the Chebyshev basis for the dual. Then v1, ..., vs are the classical monomials, whereas wk(p) takes a polynomial p and returns the coefficient of the kth Chebyshev polynomial (i.e., the kth coefficient if p were expressed in the Chebyshev basis rather than in the monomial basis). A way to do this, as done in Jia Li Sun's thesis in Section 4.3, is to use the (ill-conditioned) linear transformation (3.24).

If this interpretation is correct, it seems that doing, e.g., primal-monomial-dual-Chebyshev doesn't make much sense: you end up writing exactly the same constraint as in the primal-Chebyshev-dual-Chebyshev case but, along the way, you perform an ill-conditioned matrix inversion.

To summarize: I'd say that the constraints you send to the SDP solver depend, in theory, only on the dual basis you employ. This choice of dual basis is very important, since it can make equivalent constraints much nicer numerically. However, since we work with finite precision, the choice of the primal basis also turns out to be relevant in practice: an appropriate primal basis can prevent ill-conditioned transformations during the evaluations wk(p).
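The ill conditioning of the monomial-to-Chebyshev basis change is easy to see numerically. The sketch below (numpy, illustration only) builds the change-of-basis matrix whose columns are the monomial coefficients of T₀, ..., T₁₀ and checks that its condition number is already large at degree 10:

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Column k of M holds the monomial coefficients of T_k, so M maps
# Chebyshev coefficients to monomial coefficients.
n = 10
M = np.zeros((n + 1, n + 1))
for k in range(n + 1):
    e = np.zeros(k + 1)
    e[k] = 1.0
    col = C.cheb2poly(e)  # monomial coefficients of T_k
    M[: len(col), k] = col

assert M[n, n] == 2.0 ** (n - 1)   # leading coefficient of T_n is 2^(n-1)
assert np.linalg.cond(M) > 100.0   # already badly conditioned at degree 10
```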

@hongkai-dai : Is this in line with your understanding of the problem?

hongkai-dai commented 4 years ago

@TobiaMarcucci, thanks a lot for looking into this. Pablo's book is a lot clearer. Yes, I have the same understanding as you: primal-monomial-dual-Chebyshev doesn't make sense. It seems that we just need to make two decisions:

  1. For the primal part, what is the basis for both p(x) and v(x)?
  2. For the dual part, should we use the monomial basis (compare the polynomial coefficients) or the Lagrange basis (compare the evaluation of the polynomial at some sample points)?

I do have a question on using primal-Chebyshev-dual-Lagrange. If the region of interest is not [-1, 1], it is not clear to me whether we need to use shifted Chebyshev polynomials to map the region of interest to [-1, 1]. Here is an example:

Say we want to prove that x >= 10 implies x³ >= 1. We will need to show that

p(x) = x³ - 1 - r(x) * (x-10) is SOS
r(x) is SOS

Say we want to use Chebyshev polynomials (not the shifted ones) as the basis. We have

x³ - 1 = 1/4*T₃(x) + 3/4 * T₁(x) - T₀(x)
x - 10 = T₁(x) - 10*T₀(x)

and we write r(x) in the Chebyshev basis as r(x) = c₀T₀(x) + c₁T₁(x) + c₂T₂(x).

We then have

p(x) = 1/4*T₃(x) + 3/4 * T₁(x) - T₀(x) - (c₀T₀(x) + c₁T₁(x) + c₂T₂(x)) * (T₁(x) - 10*T₀(x)) is SOS
r(x) = c₀T₀(x) + c₁T₁(x) + c₂T₂(x) is SOS

For the first SOS condition, that p(x) is SOS, we write

p(xᵢ) = [T₀(xᵢ), T₁(xᵢ)]' * Q * [T₀(xᵢ), T₁(xᵢ)], i = 1, 2, 3
Q is psd

where xᵢ are some sampled points. For the second SOS condition, that r(x) is SOS, we write

r(xᵢ) = [T₀(xᵢ), T₁(xᵢ)]' * S * [T₀(xᵢ), T₁(xᵢ)], i = 1, 2, 3
S is psd

I think I can choose the xᵢ as the Chebyshev nodes in the range [-1, 1], xᵢ = cos((2i-1)π/(2n)), since the Chebyshev polynomials are "nicely behaved" in the range [-1, 1].

It seems to me that by using Chebyshev polynomials and sample points within the range [-1, 1], the conditions above don't have severe numerical problems? So although our range of interest is [10, ∞), we don't have to evaluate the polynomial in the range of interest (namely, xᵢ doesn't have to be in [10, ∞))? @TobiaMarcucci did I do something wrong here?
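For what it's worth, the Chebyshev expansions in this example can be sanity-checked numerically (a numpy sketch, not Drake code):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# x^3 - 1 in the Chebyshev basis: -T_0 + (3/4) T_1 + (1/4) T_3.
c1 = C.poly2cheb([-1.0, 0.0, 0.0, 1.0])
assert np.allclose(c1, [-1.0, 0.75, 0.0, 0.25])

# x - 10 in the Chebyshev basis: -10 T_0 + T_1.
c2 = C.poly2cheb([-10.0, 1.0])
assert np.allclose(c2, [-10.0, 1.0])

# Evaluating at the three Chebyshev nodes in [-1, 1] recovers the sample
# values used in the constraints above.
nodes = np.cos((2 * np.arange(1, 4) - 1) * np.pi / 6.0)
assert np.allclose(C.chebval(nodes, c1), nodes ** 3 - 1.0)
```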

TobiaMarcucci commented 4 years ago

For the dual part, should we use monomial basis (compare the polynomial coefficients) or Lagrange basis (compare the evaluation of polynomial at some sample points).

One comment regarding this: my understanding is a little different, I think. By saying that we use the monomial basis for the dual, don't we mean that we compare the coefficients of the monomials? My understanding is that if we use, e.g., Chebyshev both for the primal and the dual, we express p(x) and v(x) in the Chebyshev basis (so far so good), but on the dual side we do coefficient matching, since the primal and the dual bases coincide?

I do have a question on using primal-Chebyshev-dual-Lagrange.

No, I don't see anything wrong in your reasoning. I think that for the class of problems you consider, the conditioning issues can show up in the constraint that r(x) is SOS. Maybe, to show that p(x) is SOS over some region, you might need a multiplier r(x) that has "small" coefficients if expressed in the Chebyshev basis, but "huge" coefficients if expressed in the monomial basis? I should have a better think though...

hongkai-dai commented 4 years ago

One comment regarding this: my understanding is a little different, I think. By saying that we use the monomial basis for the dual, don't we mean that we compare the coefficients of the monomials? My understanding is that if we use, e.g., Chebyshev both for the primal and the dual, we express p(x) and v(x) in the Chebyshev basis (so far so good), but on the dual side we do coefficient matching, since the primal and the dual bases coincide?

@TobiaMarcucci Yes, you are right; sorry I wasn't clear when I said it in the first place. If we use the monomial basis, then we compare the coefficients of monomials. If we use the Chebyshev basis, then we compare the coefficients of Chebyshev polynomials.

As far as I read, the advantage of using Lagrange polynomials in the dual is the low-rank property of the constraint matrix: namely, in the constraint trace(Q Vᵢ) = p(xᵢ), the matrix Vᵢ is rank 1, as Vᵢ = vᵢvᵢᵀ. In Lofberg's paper he mentions that DSDP can exploit low-rank constraint matrices. If the solver cannot exploit this low-rank structure, does that mean using the Lagrange dual gives no numerical advantage? I am not sure CSDP or Mosek can exploit the low-rank property.
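For reference, the rank-1 structure here is just the identity trace(Q vvᵀ) = vᵀQv; a quick numpy check (illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.standard_normal(4)          # v(x_i): the basis evaluated at a sample
Q = rng.standard_normal((4, 4))
Q = Q @ Q.T                         # a psd Gram matrix

# The sampled constraint v' Q v = p(x_i) is trace(Q V_i) with the
# rank-1 matrix V_i = v v'.
V = np.outer(v, v)
assert np.linalg.matrix_rank(V) == 1
assert np.isclose(np.trace(Q @ V), v @ Q @ v)
```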

hongkai-dai commented 4 years ago

BTW, I just sent an email to Mosek asking whether they exploit the low-rank property in the newest version. The answer is no.

hongkai-dai commented 4 years ago

Mosek's staff replied that it would be relatively easy for them to exploit low-rank constraint matrices. We might expect this feature in Mosek soon.

TobiaMarcucci commented 4 years ago

Just for reference, I cleaned up my implementation of the Polynomial class mentioned above. The code is here. By tomorrow or the day after, I plan to integrate it with MathematicalProgram and start some preliminary testing.

AlexandreAmice commented 8 months ago

I want to revive this issue, but suggest a different design pattern.

  1. Introduce OrderedMonomialBasis, which takes a set of variables and a monomial ordering. This class will have a method GetNext which, given a monomial, returns the next monomial in the basis according to the monomial ordering. It will also have a method GetFirstN which returns the first n elements of the basis.
  2. Introduce PolynomialBasis as a change of basis of an OrderedMonomialBasis. This class will also have GetNext and GetFirstN methods, which must return the change-of-basis matrix and the ordered monomials used to generate the elements of that basis.
  3. Define GenericPolynomial as an ordered coefficient vector together with a PolynomialBasis.

This would facilitate some more algebraic computations.
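A minimal sketch of what GetFirstN could look like for a graded lexicographic ordering (Python illustration with made-up names, representing monomials as exponent tuples; not the proposed Drake API):

```python
from itertools import combinations_with_replacement

def graded_lex_monomials(variables, n):
    """Return the first n monomials (as exponent tuples) over the given
    variables in graded lexicographic order: sorted by total degree,
    lexicographically within each degree."""
    out = []
    degree = 0
    num_vars = len(variables)
    while len(out) < n:
        # All exponent tuples of this total degree, in lex order.
        for combo in combinations_with_replacement(range(num_vars), degree):
            exponents = [0] * num_vars
            for idx in combo:
                exponents[idx] += 1
            out.append(tuple(exponents))
            if len(out) == n:
                return out
        degree += 1
    return out

# First 6 monomials in x, y: 1, x, y, x^2, xy, y^2.
assert graded_lex_monomials(["x", "y"], 6) == [
    (0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)
]
```

A GetNext method would then just be the successor function under the same ordering.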