hongkai-dai opened this issue 4 years ago
I am trying to understand how to use Chebyshev polynomials as a basis in SOS verification, by reading two papers.
The conclusion of these papers is that using the Chebyshev basis in the primal, and the Lagrange basis in the dual, is numerically good.
I have several questions after reading the papers:
1. How do we obtain the Chebyshev expansion of a function in the first place? The expansion coefficients are cₖ = 2/π ∫₀^π f(cos(θ))cos(kθ)dθ, which seems hard to evaluate directly (for example, when we approximate the dynamics of a nonlinear system).
2. Chebyshev polynomials are nicely behaved in the interval [-1, 1], but can explode very quickly outside of this region. So should we adjust the range of the indeterminates, such that the region we verify is approximately near [-1, 1]? For example, if we want to verify for indeterminates x ∈ [a, b], should we transform the indeterminates to y = (2x - (a+b)) / (b-a), such that y is in the range [-1, 1]? But what if we do not know the exact region we care about (for example, when we want to search for the region of attraction)? Moreover, this linear transformation only works for a univariate polynomial; for a multivariate polynomial, I am not sure if it is a good idea to scale each indeterminate separately.
3. In the sample-based formulation, p(x) is a polynomial of degree 2d, v(x) are some given polynomials with degree no larger than d, and xᵢ is a sample point. If I understand correctly, p(x) is written as a linear combination of Chebyshev polynomials, and v(x) = [v₁(x); v₂(x); ...; v_d(x)], where vⱼ(x) is the j'th degree Chebyshev polynomial. Is this understanding correct? In Li's thesis it is suggested to use Chebyshev nodes as the xᵢ; should we also use Chebyshev nodes for a multivariate polynomial?

@TobiaMarcucci @RussTedrake could you help me understand this? Thanks a lot!
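For concreteness, the affine change of indeterminates in question 2 can be sketched as follows (a minimal numpy sketch; the helper names are my own, not from any library):

```python
import numpy as np

def to_unit_interval(x, a, b):
    """Map x in [a, b] to y = (2x - (a+b)) / (b-a) in [-1, 1]."""
    return (2.0 * x - (a + b)) / (b - a)

def from_unit_interval(y, a, b):
    """Inverse map: y in [-1, 1] back to x in [a, b]."""
    return ((b - a) * y + (a + b)) / 2.0

# The endpoints map to -1 and 1, and the midpoint maps to 0.
a, b = 100.0, 200.0
print(to_unit_interval(np.array([a, (a + b) / 2, b]), a, b))  # → [-1.  0.  1.]
```

For a multivariate polynomial this would be applied to each indeterminate separately, with its own [aⱼ, bⱼ].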
I think I understand the approach better. To show that a polynomial p(x) of degree 2d is SOS, we want the following condition

p(xᵢ) = v(xᵢ)ᵀQv(xᵢ), i = 1, ..., 2d+1

when we use Lagrange polynomials as the dual basis. When we want to use Chebyshev polynomials in the primal, then v(x) is a vector containing all Chebyshev polynomials with degree no larger than d, and v(xᵢ) is a vector containing the evaluation of all these Chebyshev polynomials at a point xᵢ. The previous literature recommends using the Chebyshev nodes as the evaluation points xᵢ.
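The sampled condition above can be assembled numerically. Here is a minimal numpy sketch (no SDP solver involved; the variable names are mine):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

d = 2                 # half degree: v(x) = [T_0(x), ..., T_d(x)]
n = 2 * d + 1         # 2d+1 samples pin down a univariate polynomial of degree 2d
# Chebyshev nodes of the first kind: x_i = cos((2i-1)/(2n) * pi), all in (-1, 1).
nodes = np.cos((2 * np.arange(1, n + 1) - 1) / (2 * n) * np.pi)

# Row i of V is v(x_i) = [T_0(x_i), ..., T_d(x_i)].
V = C.chebvander(nodes, d)

# Each sample constraint p(x_i) = v(x_i)' Q v(x_i) is linear in Q:
# trace(Q V_i) = p(x_i), with the rank-1 matrix V_i = v(x_i) v(x_i)'.
constraint_matrices = [np.outer(v, v) for v in V]

# Sanity check with a known Gram matrix Q = I: then p(x) = sum_k T_k(x)^2.
Q = np.eye(d + 1)
lhs = np.array([np.trace(Q @ Vi) for Vi in constraint_matrices])
assert np.allclose(lhs, np.sum(V**2, axis=1))
```

The Q-dependent left-hand sides would become the linear constraints handed to an SDP solver, together with Q ⪰ 0.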
I still have some remaining questions:

1. Do we need p(x) to use Chebyshev polynomials as its basis? I would hope to use the monomial basis for p(x), as it is hard to do a Chebyshev expansion of a non-polynomial function to get p(x) in the Chebyshev polynomial basis. Mathematically we can evaluate p(xᵢ) as a linear combination of its coefficients, no matter whether p(x) uses the monomial basis or the Chebyshev polynomial basis.
2. Chebyshev polynomials blow up quickly outside [-1, 1] (the leading coefficient of the n'th degree Chebyshev polynomial is 2ⁿ⁻¹). So if we have an estimate of the region that we are mostly interested in, should we shift the Chebyshev polynomials to that region? For example, if we want to prove that a polynomial p(x) is SOS in the range [100, 200], should we use shifted Chebyshev polynomials v(x) in the range [100, 200], and the shifted Chebyshev nodes xᵢ in this range also?

Sorry for the late response. My understanding is the following, but I'm still not super confident with the problem.
For the first 3 questions:
I don't know what's the best way. But I would say that approximating using a Taylor expansion with monomials and then transforming the result to the Chebyshev basis is not so bad? The transformation between the monomial and Chebyshev bases is ill-conditioned for high degrees (say ~10?): is it frequent that one does a high-degree approximation of the dynamics of a nonlinear system? And I think the fact that one ends up with big/small coefficients in the approximation is something inherent, that would show up even if we could do the Taylor expansion directly in the Chebyshev basis.
Yes, one should always scale the problem so that the region of interest is contained in the interval [-1, 1]. Pablo mentioned that whenever one can put a bounding box around the region of interest, one should use Chebyshev polynomials within that box, even if the box ends up being a little loose. For the multivariate case, it should be OK to scale the coordinates independently. Why do you think it can cause troubles? Connected to this point, another family of useful orthogonal polynomials should be the Hermite one. They should be a better basis than the monomials in case you want to solve a problem over the whole space, and you cannot bound the region of interest.
Yes, I think your understanding is correct or, at least, is in line with mine. For the multivariate case, I'd say that one uses Chebyshev nodes for each coordinate independently?
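Regarding the first point, the ill-conditioning of the monomial-to-Chebyshev conversion is easy to see numerically. Below is a small sketch (`monomial_to_cheb` is my own helper, not an existing API) that builds the change-of-basis matrix with numpy's `poly2cheb` and inspects its condition number:

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def monomial_to_cheb(deg):
    """Change-of-basis matrix whose k-th column holds the Chebyshev
    coefficients of the monomial x^k."""
    M = np.zeros((deg + 1, deg + 1))
    for k in range(deg + 1):
        e = np.zeros(deg + 1)
        e[k] = 1.0                # power-basis coefficients of x^k
        c = C.poly2cheb(e)
        M[: len(c), k] = c
    return M

# The condition number grows quickly with the degree.
for deg in (5, 10, 20):
    print(deg, np.linalg.cond(monomial_to_cheb(deg)))
```

This is consistent with the remark above that the conversion is problematic around degree ~10 and beyond.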
For the other 2 questions:
Yes, one could use the monomial basis the same way. But in that case one would work with a family of polynomials (the monomials) which is not very good at approximating functions. The math would be the same, but I think that with the monomial basis the problem would have a worse conditioning and the result would not be as good in term of "quality" of the function we retrieve.
Yes, as the point 2 above, I think one should always shift the problem, and I think also the evaluation nodes.
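One plausible reading of "Chebyshev nodes for each coordinate independently" is a tensor-product grid of per-coordinate nodes, sketched below (my own construction, just to illustrate):

```python
import numpy as np

def chebyshev_nodes(n):
    """n Chebyshev nodes of the first kind, x_i = cos((2i-1)/(2n) * pi)."""
    return np.cos((2 * np.arange(1, n + 1) - 1) / (2 * n) * np.pi)

# Per-coordinate nodes, combined into a tensor-product grid for two variables.
nx, ny = 3, 4
X, Y = np.meshgrid(chebyshev_nodes(nx), chebyshev_nodes(ny))
grid = np.column_stack([X.ravel(), Y.ravel()])  # 12 sample points in (-1, 1)^2
print(grid.shape)  # → (12, 2)
```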
Thank you very much!
Thanks Tobia!
> For the multivariate case, it should be OK to scale coordinates independently. Why do you think it can cause troubles?
I am considering the case when the different coordinates are coupled. For example, our region of interest is a very skewed ellipse in 2D, with the principal axes along y = x and y = -x, where the axis along y = x is much longer than the axis along y = -x. In this case, should we still scale the two coordinates separately? Or should we consider a change of coordinates, rotating the ellipse by 45 degrees first, such that in the rotated coordinates the range of interest along the x axis is much larger than the range of interest along the y axis?
> Yes, one could use the monomial basis the same way.
Just to make it clear, I mean we choose the monomial basis for p(x), but v(x) are still Chebyshev polynomials, and we evaluate p(x) and v(x) at (shifted) Chebyshev nodes. The alternative is to first do a Taylor expansion to get p(x) in the monomial basis, and then convert p(x) from the monomial basis to the Chebyshev basis. I think these two approaches should have similar performance? We already lose the nice approximation property when we do the Taylor expansion (compared to using a Chebyshev expansion). So even if we convert the Taylor-expanded polynomial p(x) to the Chebyshev basis, it is no different than using the monomial basis for p(x) directly?
Regarding the scaling of the coordinates, I don't think these situations represent a big problem. If you have a problem like the one you describe, you'd encounter issues with any basis function, no? The scaling of the coordinates is to my eyes something like picking the units of measurement so that numbers have reasonable magnitude (in this case in [-1, 1]). As you say, rotating the ellipse would lead to a better conditioned problem, but isn't that true with any basis?
I understand better what you mean now, sorry. I admit I don't know much about the approximation properties, I should study more. But if you are for example looking for a Lyapunov function, then p(x) would be -∂V/∂x f(x), right? If V(x) is optimized, then you would like it to be in the Chebyshev basis, to avoid ill-conditioning. On the other hand, if you did a Taylor expansion to approximate the dynamics, you have f(x) in the monomial basis. My point was that it's more likely that V(x) has high degree, and that might cause issues if one is working only with monomials. On the other hand, I would imagine that one rarely approximates f(x) with more than, say, a polynomial of degree 5? These are mostly conjectures however...
Thanks Tobia for the explanation. I think I have a better understanding of what I should implement. Here is a list of items to work on:

1. A `ShiftedChebyshevPolynomial` class. This class stores the shifted Chebyshev polynomial of a given order. The class should support operations including multiplication (the result of a multiplication is the sum of two other Chebyshev polynomials), converting to `Polynomial` (with monomial basis), and converting a `Polynomial` (with monomial basis) to `ShiftedChebyshevPolynomial`.
2. A `PolynomialWithChebyshevBasis` class, which represents a polynomial as a linear combination of Chebyshev polynomials.
3. A function `AddSosConstraintUsingSamples()`, which implements the constraint p(xᵢ) = v(xᵢ)ᵀQv(xᵢ), i = 1, ..., 2d+1, with Q psd. This function should accept p(x) being either `Polynomial` (with monomial basis) or `PolynomialWithChebyshevBasis`, and v(x) being either the monomial basis or the Chebyshev polynomial basis.

What do you think?
I think all of these make perfect sense to me. I totally agree on the "AddConstraint" part.

Regarding the various `Polynomial` classes, in the python code I'm writing to play with these things myself I organized things similarly, but with small differences. I have a `Vector` class (names are still pretty random), from which `Monomial` and `Chebyshev` inherit. (I plan to add other classes like these in the future: `Lagrange`, `Laguerre`, `Hermite`, etc.) Each of these represents a single element of the polynomial basis, e.g., a single monomial `x1^3 x3^2` or, for the Chebyshev case, `T3(x1) T2(x3)`. Then I represent a `Basis` as a list of `Vector`s, and a `Polynomial` as a `Basis` and a list of coefficients of the same size.

I think a setup like this is a little more flexible than having a `PolynomialWithXXXBasis` class for each basis. But on the other hand it exposes the user to the full complexity of these things from step one, in the sense that it's a little too abstract? I'm sure you've thought about this better than me: I'm curious to know your opinion about this setup. (Happy to chat in person if that makes it easier.)
As for multiplying, differentiating, and integrating a Chebyshev polynomial, I found these operations a little tricky to get right. I'd be happy to see how you plan to set these up / give a hand if it's of any help!
Thanks Tobia,

Exactly as you mentioned, I also considered two approaches for implementing the `Polynomial` class. The first approach uses class inheritance, the second doesn't.

The inheritance approach is to design a `PolynomialBasis` class (similar to your `Vector` class), with `Monomial`, `Chebyshev`, `Lagrange`, etc. all inheriting from this `PolynomialBasis` class. Then the `Polynomial` class stores a mapping from `PolynomialBasis` to its coefficient. The problem I encounter with this approach is what you mentioned, that I cannot think of a common interface to support some operations on `PolynomialBasis`. For example, the derivative of a Chebyshev polynomial is not a linear combination of other Chebyshev polynomials (dTₙ(x)/dx = n·Uₙ₋₁(x)); on the other hand, the derivative of a monomial is still a monomial. Hence conceptually I find it hard to design a common interface for a `PolynomialBasis::Differentiate` function. I am curious how you solve this problem.

If you express Uₙ₋₁ in terms of T₀, ..., Tₙ₋₁ using the formula "with the odd and even cases" from here, don't you get that dTₙ(x)/dx is a linear combination of other Chebyshev polynomials?
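This is easy to sanity-check with numpy, whose `chebder` differentiates directly within the Chebyshev basis (my example, not Drake code):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Chebyshev coefficients of T_3 (coefficient 1 on index 3).
t3 = np.array([0.0, 0.0, 0.0, 1.0])

# chebder differentiates *within* the Chebyshev basis, so the result is
# already a linear combination of T_0, T_1, T_2.
dt3 = C.chebder(t3)
print(dt3)  # → [3. 0. 6.], i.e. dT_3/dx = 3*T_0 + 6*T_2

# Cross-check against dT_n/dx = n*U_{n-1}: here 3*U_2(x) = 3*(4x^2 - 1).
x = np.linspace(-1, 1, 7)
assert np.allclose(C.chebval(x, dt3), 3 * (4 * x**2 - 1))
```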
In my implementation, the derivative of a `Vector` is a `Polynomial`, expressed in the same basis as the `Vector` I'm differentiating. So if I differentiate a `Monomial`, call it `m`, I get a `Polynomial` with `coefficients = [1]` and `basis = [dm/dx]`. If I differentiate a `Chebyshev` I get a more complex `Polynomial`, with a bunch of `coefficients` but with a `basis` that still contains `Chebyshev` vectors only.

I did something similar for multiplication and integration of monomials. However, following this path, I ended up having things like the product of two `Monomial`s return a `Polynomial`, which is kind of odd? (Since, again, the product of two `Chebyshev`s is a linear combination of `Chebyshev`s.)
> If you express Un-1 in terms of T0, ..., Tn-1 using the formula "with the odd and even cases" from here, don't you get that dTn(x) / dx is a linear combination of other Chebyshev polynomials?

Thanks! I didn't know the equation to convert Uₙ₋₁ into T₀, ..., Tₙ₋₁. This solves the differentiation problem I had before.
> So if I differentiate a Monomial, call it m, I get a Polynomial with coefficients = [1] and basis = [dm/dx]

I think my implementation would probably be different. Say m(x) = x²; after differentiation I will get a polynomial dm(x)/dx = 2x, whose coefficients are `[2]` and whose basis is `[x]`. Apart from that, I agree that if we differentiate a basis element we get a more complicated polynomial with the same basis.
I think the common API for multiplication would be

```cpp
std::unordered_map<PolynomialBasis, double> PolynomialBasis::multiply(const PolynomialBasis& other);
```

Namely, if we multiply two polynomial basis elements (a monomial times a monomial, or a Chebyshev polynomial times a Chebyshev polynomial), we return a linear combination of the same basis.
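For the Chebyshev case this rests on the product identity Tₘ(x)·Tₙ(x) = (Tₘ₊ₙ(x) + T₍|m−n|₎(x))/2, which numpy's `chebmul` implements for whole coefficient vectors (a quick check, separate from the proposed C++ API):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# T_m * T_n = (T_{m+n} + T_{|m-n|}) / 2, so the product of two Chebyshev
# basis elements is again a linear combination of Chebyshev basis elements.
t2 = [0, 0, 1]        # T_2
t3 = [0, 0, 0, 1]     # T_3
print(C.chebmul(t2, t3))  # → [0.  0.5 0.  0.  0.  0.5], i.e. (T_1 + T_5) / 2
```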
Just curious, why you need the integration of monomials?
> I think my implementation would probably be different. Say m(x) = x², after differentiation I will get a polynomial, with dm(x)/dx = 2x, that the coefficient is [2] and the basis is [x]. Apart from that I agree if we differentiate a basis we get a complicated polynomial with the same basis.

Sorry, my mistake. Differentiating a `Monomial` I get a `Polynomial` like the one you say (with `coefficients = [power of the variable wrt which I'm differentiating]`).
I agree on the multiplication.
For the integration, I need it for the objective of the SOS programs I'm working on. More specifically, the objective in the approximate DP problems is to "push up as much as possible" the lower bound to the value function. And this is done by maximizing the integral of the approximate value function (which is upper bounded by the true value function) within a region of state space of interest.
@jwnimmer-tri, may I ask you a question about code design? As Tobia and I discussed above, I would like to design a base class called `symbolic::PolynomialBasis`, with some derived classes:

- `symbolic::Monomial`
- `symbolic::Chebyshev`
- `symbolic::Hermite`

Then I would like to re-design our `symbolic::Polynomial` class (which currently only takes `symbolic::Monomial` as its basis). After the redesign, I hope that `symbolic::Polynomial` can take other types (like `Chebyshev` and `Hermite`) as its basis. Mathematically, we will represent the polynomial p(x) as a linear combination of its basis b(x):

p(x) = c1 * b1(x) + c2 * b2(x) + ... + cn * bn(x)

where c1, c2, ..., cn are coefficients (of type `symbolic::Expression`). Also I would like to require that the basis elements b1(x), ..., bn(x) all have the same type (all bi(x) have to be monomials, or all have to be Chebyshev polynomials, but not a mixture of Monomial and Chebyshev).
I think the `symbolic::Polynomial` class will store a map from each basis element bi(x) to its coefficient ci. But I don't think I should use `std::unordered_map<PolynomialBasis, symbolic::Expression>` as this map, since it allows the keys to be a mixture of different derived classes (like a mixture of Monomial and Chebyshev).

The alternative I can think of is to make `symbolic::Polynomial` a templated class, with the template parameter being one of the derived basis classes:

```cpp
template <typename Basis>
class Polynomial {
  std::unordered_map<Basis, symbolic::Expression> basis_to_coeff_map_;
};
```

and we will use the polynomial class as `Polynomial<Monomial>` and `Polynomial<Chebyshev>`.

The problem with this alternative approach is that it will break the current API (the current `symbolic::Polynomial` class is not templated). Or I can create a new templated class `GenericPolynomial`, so that I won't break the current `symbolic::Polynomial` API.

Do you think there is a better way?
I think I understand better what's going on between the primal and dual bases. I hope the following makes sense and, if it does, I hope these aren't trivial considerations.

I find the presentation from Pablo's book to be the clearest (Section 3.1.4). As in Theorem 3.41, we can pick the basis v1, ..., vs for the n-variate polynomials up to degree d, and the basis w1, ..., wt for the dual space (the dual of the n-variate polynomials up to degree 2d). Once we have made this decision, the constraint to be added to the SOS program is the formula given in Theorem 3.41.
Now, what does the generic wk look like? By definition, it is a linear functional wk(p) (or, using the notation of the book, <p, wk>) that, given the polynomial p, returns a real-valued scalar. (One random example could be wk(p) = ∫₀¹ p(x) dx: linear and real-valued.)

Given a primal basis v1, ..., vs, one can define (citing page 67) a "corresponding dual basis" as the set of functionals for which it holds that wk(vi) = δik, with δik the Kronecker delta. Note that, if p is expressed in some primal basis, then wk from the corresponding dual basis is just defined as wk(p) = the k-th coefficient of p. However, in general, we might not want to work with a pair of corresponding bases for the primal and the dual: "there may be advantages (numerical or otherwise) in not doing so."
Say for example we want to use the monomial basis for the primal and the Chebyshev basis for the dual. Then v1, ..., vs are the classical monomials, whereas wk(p) takes a polynomial p and returns the coefficient of the k-th Chebyshev polynomial (i.e. the k-th coefficient if p were expressed in the Chebyshev basis rather than in the monomial basis). A way to do this, as done in Jia Li Sun's thesis in Section 4.3, is to use the (ill-conditioned) linear transformation (3.24).

If this interpretation is correct, it seems that doing, e.g., primal-monomial-dual-Chebyshev doesn't make much sense: you end up writing exactly the same constraint as in the primal-Chebyshev-dual-Chebyshev case but, along the way, you have an ill-conditioned matrix inversion.

To summarize: I'd say that the constraints you send to the SDP solver depend, in theory, only on the dual basis you employ. This choice of the dual basis is very important, since it can make equivalent constraints much nicer numerically. However, since we work with finite precision, the choice of the primal basis also turns out to be relevant in practice: an appropriate primal basis can prevent ill-conditioned transformations during the evaluations wk(p).
@hongkai-dai : Is this in line with your understanding of the problem?
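As a concrete illustration of the evaluation wk(p) (my own numpy sketch): if p is stored with monomial coefficients, applying the Chebyshev dual functionals amounts to exactly the monomial-to-Chebyshev change of basis.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# p(x) = x^3 - 1 in the monomial basis (coefficients low-to-high).
p_mono = np.array([-1.0, 0.0, 0.0, 1.0])

# Evaluating the Chebyshev dual functionals w_k on p is the monomial-to-
# Chebyshev change of basis (the transformation (3.24) mentioned above).
p_cheb = C.poly2cheb(p_mono)
print(p_cheb)  # → [-1.    0.75  0.    0.25], i.e. p = -T_0 + 3/4 T_1 + 1/4 T_3
```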
@TobiaMarcucci, thanks a lot for looking into this. Pablo's book is a lot clearer. Yes, I have the same understanding as yours, that primal-monomial-dual-Chebyshev doesn't make sense. It seems that we need to just make two decisions:

1. The basis for the primal part, namely the basis for p(x) and v(x).
2. For the dual part, should we use monomial basis (compare the polynomial coefficients) or Lagrange basis (compare the evaluation of polynomial at some sample points).

I do have a question on using primal-Chebyshev-dual-Lagrange. If the region of interest is not [-1, 1], it is not clear to me whether we need to use shifted Chebyshev polynomials to scale the region of interest to [-1, 1]. Here is an example:
Say we want to prove that x ≥ 10 implies x³ ≥ 1. We will need to show that

p(x) = x³ - 1 - r(x) * (x - 10) is SOS
r(x) is SOS

Say we want to use the Chebyshev polynomials (not the shifted ones) as the basis. We have

x³ - 1 = 1/4*T₃(x) + 3/4*T₁(x) - T₀(x)
x - 10 = T₁(x) - 10*T₀(x)

and we write r(x) in the Chebyshev basis as r(x) = c₀T₀(x) + c₁T₁(x) + c₂T₂(x). We then have

p(x) = 1/4*T₃(x) + 3/4*T₁(x) - T₀(x) - (c₀T₀(x) + c₁T₁(x) + c₂T₂(x)) * (T₁(x) - 10*T₀(x)) is SOS
r(x) = c₀T₀(x) + c₁T₁(x) + c₂T₂(x) is SOS

For the first SOS condition, p(x) is SOS, we write it as

p(xᵢ) = [T₀(xᵢ), T₁(xᵢ)]' * Q * [T₀(xᵢ), T₁(xᵢ)], i = 1, 2, 3
Q is psd

where the xᵢ are some sampled points. For the second SOS condition, r(x) is SOS, we write

r(xᵢ) = [T₀(xᵢ), T₁(xᵢ)]' * S * [T₀(xᵢ), T₁(xᵢ)], i = 1, 2, 3
S is psd

I think I can choose the xᵢ as the Chebyshev nodes in the range [-1, 1], xᵢ = cos((2i-1)/(2n)·π), since the Chebyshev polynomials are "nicely behaved" in the range [-1, 1].

It seems to me that by using Chebyshev polynomials and sample points within the range [-1, 1], the conditions above don't have severe numerical problems? So although our range of interest is [10, ∞), we don't have to evaluate the polynomial in the range of interest (namely xᵢ doesn't have to be in [10, ∞))? @TobiaMarcucci did I do something wrong here?
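The expansions and nodes in this example can be checked with a few lines of numpy (just a numerical sanity check, not a full SOS program):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Check the two Chebyshev expansions used above (coefficients low-to-high).
assert np.allclose(C.poly2cheb([-1, 0, 0, 1]), [-1, 0.75, 0, 0.25])  # x^3 - 1
assert np.allclose(C.poly2cheb([-10, 1]), [-10, 1])                  # x - 10

# The n = 3 sample points x_i = cos((2i-1)/(2n) * pi) all lie in (-1, 1),
# even though the region of interest is [10, inf).
n = 3
nodes = np.cos((2 * np.arange(1, n + 1) - 1) / (2 * n) * np.pi)
print(nodes)  # approximately [0.866, 0, -0.866]
```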
> For the dual part, should we use monomial basis (compare the polynomial coefficients) or Lagrange basis (compare the evaluation of polynomial at some sample points).

One comment regarding this: my understanding is a little different, I think. By saying that we use the monomial basis for the dual, don't we mean that we compare the coefficients of the monomials? My understanding is that if we use, e.g., Chebyshev both for the primal and the dual, we express p(x) and v(x) in the Chebyshev basis (so far so good), but on the dual side we do coefficient matching, since the primal and the dual bases coincide?
> I do have a question on using primal-Chebyshev-dual-Lagrange.

No, I don't see anything wrong in your reasoning. I think that for the class of problems you consider, the conditioning issues can show up in the constraint r(x) is SOS. Maybe to show that p(x) is SOS over some region, you might need a multiplier r(x) that has "small" coefficients if expressed in the Chebyshev basis, but "huge" coefficients if expressed in the monomial basis? I should have a better think though...
> One comment regarding this: my understanding is a little different I think. By saying that we use the monomial basis for the dual, don't we mean that we compare the coefficients of the monomials? My understanding is that if we use, e.g., Chebyshev both for the primal and the dual, we express p(x) and v(x) in Chebyshev basis (so far so good), but on the dual side we do coefficient matching, since the primal and the dual bases coincide?
@TobiaMarcucci Yes you are right, sorry I wasn't clear when I said it in the first place. If we use monomial basis, then we compare the coefficients of monomials. If we use Chebyshev basis, then we compare the coefficients of Chebyshev polynomials.
As far as I read it, the advantage of using the Lagrange polynomials in the dual is the low-rank property of the constraint matrices: namely, in the constraint trace(Q Vᵢ) = p(xᵢ), the matrix Vᵢ is rank 1, as Vᵢ = vᵢvᵢᵀ. In Löfberg's paper he mentions that DSDP can exploit low-rank constraint matrices. If the solver cannot exploit this low-rank structure, does that mean using the Lagrange dual gives no numerical advantage? I am not sure whether CSDP or Mosek can exploit the low-rank property.
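The low-rank structure here is just the identity trace(Q vᵢvᵢᵀ) = vᵢᵀQvᵢ, which is why a solver that exploits rank-1 constraint matrices never needs to form Vᵢ explicitly. A quick numerical illustration (my own sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.standard_normal(4)
Q = rng.standard_normal((4, 4))
Q = Q @ Q.T                        # a psd Gram-like matrix Q

# V_i = v_i v_i' has rank 1, and trace(Q V_i) = v_i' Q v_i.
Vi = np.outer(v, v)
assert np.linalg.matrix_rank(Vi) == 1
assert np.isclose(np.trace(Q @ Vi), v @ Q @ v)
```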
BTW, I just sent an email to Mosek asking them if they exploit the low rank property in the newest version. The answer is no.
Mosek staff replied that it is relatively easy for them to exploit low rank constraint matrices. We might expect this feature in Mosek soon.
Just for reference, I cleaned my implementation of the Polynomial class mentioned above. The code is here. By tomorrow or the day after, I plan to integrate it with MathematicalProgram and start to do some preliminary testing.
I want to revive this issue, but suggest a different design pattern.

1. An `OrderedMonomialBasis` class which takes a set of variables and a monomial ordering. This class will have a method `GetNext` which, given a monomial, gets the next monomial in the basis according to the monomial ordering. It will also have a method `GetFirstN` which returns the first n elements in the basis.
2. A `PolynomialBasis` class defined as a change of basis of an `OrderedMonomialBasis`. This class will also have `GetNext` and `GetFirstN` methods, which must return the change-of-basis matrix and the ordered monomials used to generate the elements of that basis.

This would facilitate some more algebraic computations.
This comes from a discussion with @TobiaMarcucci
Currently our `Polynomial` class is written as a linear combination of monomials ∏xᵢᵈⁱ. As mentioned in the previous literature, monomials are "bad" bases for polynomials. Instead, we can consider other orthogonal bases such as the Chebyshev, Lagrange, and Gegenbauer polynomials (section 3.1.5 of https://sites.math.washington.edu/~thomas/frg/frgbook/SIAMBookFinalvNov12-2012.pdf). To do this, we should provide classes like `ChebyshevPolynomial`, together with the supported operations on these polynomials (addition, subtraction, multiplication, etc).