roccaa opened this issue 6 years ago:
We saw that a "loop-based" approach that (1) uses Julia's column-major ordering (see this performance tip), i.e. makes the inner loops run through the left-most indices, and (2) avoids duplicate multiplications using auxiliary variables, looks like the following in the particular case of dimension 8:
function fauxc(a, b, c, d, e, f, g, h)
    A = Array{Float64}(undef, length(a), length(b), length(c), length(d),
                       length(e), length(f), length(g), length(h))
    @inbounds begin
        for i8 in eachindex(h)
            for i7 in eachindex(g)
                aux_i7_i8 = g[i7] * h[i8]
                for i6 in eachindex(f)
                    aux_i6_i7_i8 = f[i6] * aux_i7_i8
                    for i5 in eachindex(e)
                        aux_i5_i6_i7_i8 = e[i5] * aux_i6_i7_i8
                        for i4 in eachindex(d)
                            aux_i4_i5_i6_i7_i8 = d[i4] * aux_i5_i6_i7_i8
                            for i3 in eachindex(c)
                                aux_i3_i4_i5_i6_i7_i8 = c[i3] * aux_i4_i5_i6_i7_i8
                                for i2 in eachindex(b)
                                    aux_i2_i3_i4_i5_i6_i7_i8 = b[i2] * aux_i3_i4_i5_i6_i7_i8
                                    # innermost loop runs over the left-most index (column-major)
                                    for i1 in eachindex(a)
                                        A[i1, i2, i3, i4, i5, i6, i7, i8] = a[i1] * aux_i2_i3_i4_i5_i6_i7_i8
                                    end
                                end
                            end
                        end
                    end
                end
            end
        end
    end
    return A
end
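For reference, a minimal usage sketch (my own example, not from the package):

vs = ntuple(_ -> rand(2), 8)   # eight small input vectors
A = fauxc(vs...)
size(A)                        # (2, 2, 2, 2, 2, 2, 2, 2)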
Notes:
(1) if we go for this approach, it should be possible to generalize it with the Base.Cartesian technology (I never used it), see the docs;
(2) the output is a multi-dimensional array, but the code with kron returns a (very large) vector, and converting between the two takes time, especially in high dimensions.
Here is a general version that should do the same but works for an arbitrary number of vectors. However, it is a lot slower due to heavy allocations (which I do not fully understand). EDIT: If you allocate A outside and pass it to this function, the majority of the allocations is gone, but there are still a lot of them. The sorting is not implemented, but I expect it to further boost performance for vectors of different lengths.
function kron_nested(vectors::Vector{Vector{Float64}},
                     A=Array{Float64}(undef, ntuple(i -> length(vectors[i]), length(vectors))))
    n = length(vectors)
    @assert n > 1 "need at least two vectors"
    # sort by length, in descending order (i.e., smallest last)
    # vectors = sort(TODO)
    # precompute first multiplications: cache[i] holds the product of the
    # current elements of vectors[i+1], ..., vectors[n]
    cache = Vector{Float64}(undef, n - 1)
    cache[n-1] = vectors[n][1]
    i = n - 1
    while i > 1
        cache[i-1] = vectors[i][1] * cache[i]
        i -= 1
    end
    # allocate result (commented and moved to input argument)
    # A = Array{Float64}(undef, ntuple(i -> length(vectors[i]), n))
    # fill result
    indices = ones(Int, n)
    v = vectors[1]        # leftmost vector
    length_v = length(v)  # length of the leftmost vector
    lengths = [length(v) for v in vectors]
    while true
        # innermost loop where the result is written (only one multiplication missing)
        indices[1] = 1
        while indices[1] <= length_v
            A[indices...] = v[indices[1]] * cache[1]
            indices[1] += 1
        end
        # reset indices that reached their bounds
        i = 2
        while i <= n && indices[i] == lengths[i]
            indices[i] = 1
            i += 1
        end
        # termination criterion: the last index reached its bound
        if i == n + 1
            break
        end
        # increment the next index
        indices[i] += 1
        # update the cache for all indices that have changed
        if i == n
            # special case: the next index is the last index
            cache[i-1] = vectors[i][indices[i]]
        else
            # normal case
            cache[i-1] = vectors[i][indices[i]] * cache[i]
        end
        i -= 1
        while i > 1
            cache[i-1] = vectors[i][1] * cache[i]
            i -= 1
        end
    end
    return A
end
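As a quick correctness check (my own snippet; kron runs through the factors right-to-left, hence the reversed order):

using LinearAlgebra

vs = [rand(3), rand(4), rand(2)]
A = kron_nested(vs)
@assert vec(A) ≈ reduce(kron, reverse(vs))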
The generalization of fauxc from above can be done with Base.Cartesian. I failed to get it right, but @chriselrod gave the correct answer and a full explanation on Discourse. I copy the approach here; it uses the dimensions of the preallocated output array A to know how many loops should be generated:
using Base.Cartesian: @nloops, @nref

# v is a VectorOfArray (e.g. from RecursiveArrayTools.jl): v[d] is the d-th
# vector and v[i, d] its i-th entry
@generated function fauxc(A::AbstractArray{T,n}, v::VectorOfArray) where {T,n}
    quote
        cache = Vector{Float64}(undef, $n - 2)
        @inbounds for $(Symbol(:i_, n)) in eachindex(v[$n])
            @nloops($(n - 1), i, i -> eachindex(v[i]),
                    d -> (d == 1 ? nothing :
                          d == $n - 1 ? (cache[d-1] = v[i_{d}, d] * v[i_{d+1}, d+1]) :
                                        (cache[d-1] = cache[d] * v[i_{d}, d])),
                    begin
                        (@nref $n A i) = v[i_1, 1] * cache[1]
                    end)
        end
    end
end
The arguments to @nloops are N itersym rangeexpr preexpr bodyexpr; the anonymous function d -> d == 1 ... is the pre-expression that defines the cache, and the last part, begin (@nref $n A i) = v[i_1, 1] * cache[1] end, is just the content of the inner-most loop.
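To make that mapping concrete, here is a toy two-dimensional example I wrote (independent of the generated function above), with the range given as an anonymous function of the dimension and no pre-expression:

using Base.Cartesian: @nloops, @nref

v = [rand(2), rand(3)]
A = zeros(2, 3)
# expands to: for i_2 in eachindex(v[2]); for i_1 in eachindex(v[1]); A[i_1, i_2] = ...
@nloops 2 i d -> eachindex(v[d]) begin
    (@nref 2 A i) = v[1][i_1] * v[2][i_2]
end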
Great! (You did my job again Marcelo :P) To update this issue: the kron() computation is dropped until further notice. The function is kept as a reference performance baseline, and to verify the correctness of future methods (a method must pass the correctness checks before being merged into the master branch).
I want to compare with a single-loop implementation (using a conversion from a single index to a multi-index); I think this is the most naive implementation.
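A minimal sketch of such a single-loop variant, using CartesianIndices for the index conversion (names are mine):

function kron_singleloop(vectors::Vector{Vector{Float64}})
    A = Array{Float64}(undef, ntuple(i -> length(vectors[i]), length(vectors)))
    @inbounds for I in CartesianIndices(A)  # one loop over all multi-indices
        p = 1.0
        for d in 1:length(vectors)
            p *= vectors[d][I[d]]
        end
        A[I] = p
    end
    return A
end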
Next steps:
(1) Look at how to define workers for parallel computing in Julia. How can we parallelize the current _loop_tensor!() function while keeping the allocations as small as possible? (A minimal threaded sketch is given at the end of this comment.)
(2) The current implementation only handles multivariate monomials; how does it fare when dealing with polynomials (sums of multivariate monomials times a scalar coefficient)?
(3) I am curious about implementing the matrix methods from Sapo and the recent paper by J. Garloff and J. Titi (using the Pascal triangle; I will send him a mail to ask whether he has an implementation, and also about the paper on the method using FFTs).
I will open an issue for each of these sub-problems. Let's keep this issue for the general discussions.
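Regarding point (1), a minimal threaded sketch of the idea in dimension 3, assuming a fauxc-style layout (the actual _loop_tensor!() signature may differ):

function kron_threaded!(A::AbstractArray{Float64,3}, a, b, c)
    # parallelize the outermost (right-most index) loop; each thread
    # writes to disjoint slices of A, so no locking is needed
    Threads.@threads for i3 in eachindex(c)
        @inbounds for i2 in eachindex(b)
            aux = b[i2] * c[i3]
            for i1 in eachindex(a)
                A[i1, i2, i3] = a[i1] * aux
            end
        end
    end
    return A
end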
Thanks for the clear write-up.
While reading Smith's thesis, I noticed he pays special attention to the computation of rigorous error bounds using interval arithmetic. While we don't support this yet, it shouldn't be hard to set up here, extending our interface and using Interval and IntervalBox from the IntervalArithmetic package.
I also liked his approach for an API dealing with either the Bernstein tensor form or the implicit Bernstein form (see Appendix B towards the end of the thesis): they are fields of some bigger data type that contains the polynomial and the (box) domain. I think we should follow this idea, using a type parametric in a concrete subtype of an AbstractBernsteinForm.
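One possible shape for that design (hypothetical names, just to make the idea concrete):

abstract type AbstractBernsteinForm end
struct TensorForm <: AbstractBernsteinForm end
struct ImplicitForm <: AbstractBernsteinForm end

# a bigger type holding the polynomial and the (box) domain,
# parametric in the concrete form
struct BernsteinExpansion{F<:AbstractBernsteinForm,P,D}
    form::F
    polynomial::P
    domain::D
end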
I plan to send a PR in these directions that sets the ground for your point (2) in "Next steps" above. Points (1) and (3) are quite independent and can be done separately.
@mforets Yes. The implicit Bernstein form is important when one wants to compute a given subset of the Bernstein coefficients (for example when bounding a polynomial). The computation of the full set of Bernstein coefficients is necessary when one wants to provide a precise convex enclosure of the polynomial function.
To summarize:
Implicit form: efficient for computing a subset of the Bernstein coefficients, in particular in the context of bounding a polynomial (finding its minimum and maximum over a set). This is not implemented yet, and it requires a careful implementation (see A. P. Smith's thesis and the Fast Bound paper).
Full Bernstein expansion: exponential cost in the dimension, but necessary when one wants to compute a convex enclosure of a polynomial (as a polytope in vertex form), for example in the approximation of non-linear differential systems by a convex differential inclusion, to perform a non-linear lift, for 3D modelling, etc. This is what we are currently implementing. Note that the latter can also be used for bounding a polynomial, even though it is less efficient.
In my previous work, I only used the Bernstein expansion for the min/max application. However, from a library point of view, I think it is important to offer both possibilities, as one may need the full Bernstein expansion.
About the interval arithmetic: Bernstein coefficients of monomials are rational, so rational arithmetic is enough currently. Interval arithmetic is important to allow us to handle cases where the coefficients of the polynomial are not rational (e.g. sqrt(2), pi, etc.).
Example: a*x^2*y^2 + b*x^3*y. If a and b are rational, then rational arithmetic is enough. If a or b is not rational (for example irrational, or a parameter ranging in a set), then we can still compute the Bernstein expansions of x^2*y^2 and x^3*y independently using rational arithmetic, and then output that of a*x^2*y^2 + b*x^3*y symbolically (as done in Sapo using GiNaC) or approximate its solution using intervals as you propose. (I do not see the point of complex numbers currently, so I consider Bernstein expansions of polynomials over the reals.)
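A small sketch of the two arithmetics (using the linear-precision fact that the coefficients of x in the degree-n Bernstein basis over [0,1] are i/n; the rest is my own toy code, assuming IntervalArithmetic.jl):

# exact, rational coefficients of x in the degree-2 basis over [0,1]
bx = [0//1, 1//2, 1//1]

using IntervalArithmetic
a = sqrt(Interval(2.0))                          # rigorous enclosure of the irrational coefficient
coeffs = [a * Interval(Float64(b)) for b in bx]  # 0, 1/2 and 1 are exact in Float64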
Idea
Let M be a multivariate monomial. Given four variables x_1, x_2, x_3, x_4, one example would be M = (x_1^2)*(x_2^2)*(x_4). In this example, M is constituted by the product of 4 univariate monomials M_i: M_1 = x_1^2, M_2 = x_2^2, M_3 = x_3^0 = 1, M_4 = x_4.
Assuming B_i is the Bernstein expansion of M_i, one can obtain the Bernstein expansion B of M by computing the Kronecker products (noted here (x)) of its constituting univariate monomial Bernstein expansions B_i: B = ((B_1 (x) B_2) (x) B_3) (x) B_4.
Please note that the ordering of the operations is important, so commutativity does not hold.
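In code, the idea reads directly as a chain of kron calls. As a toy example (my own), expanding every variable to degree 2 over [0,1]: x^2 has Bernstein coefficients [0, 0, 1], x has [0, 1/2, 1] (linear precision), and the constant 1 has [1, 1, 1]:

using LinearAlgebra

B1 = [0.0, 0.0, 1.0]   # x_1^2
B2 = [0.0, 0.0, 1.0]   # x_2^2
B3 = [1.0, 1.0, 1.0]   # x_3^0 = 1
B4 = [0.0, 0.5, 1.0]   # x_4
B = kron(kron(kron(B1, B2), B3), B4)   # ((B_1 (x) B_2) (x) B_3) (x) B_4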
Current implementation
Points to improve
- The berncoeffs array is always re-allocated, copied, etc.
- kron using a GPU implementation? (e.g. the @cuda macro) Only when dealing with Float32?
- Parallel scheme: Worker 1: B12 = (B_1 (x) B_2); Worker 2: B34 = (B_3 (x) B_4); synchronize; Worker 1: B = (B12 (x) B34).
- Dictionaries can be used to keep the Bernstein expansions of often-used multivariate polynomials. Key = [M_degrees, expansion_degree] (in the example: M_degrees = 2,2,0,1), Data = B. (A small memoization sketch follows below.)
- Bernstein expansions of multivariate monomials are sparse when expansion_degree = M_degrees and on the [0,1]^n box; currently, we do not take advantage of this.