JuliaReach / LazySets.jl

Scalable symbolic-numeric set computations in Julia

https://juliareach.github.io/LazySets.jl/

Other

226 stars 32 forks source link

linear_map in H-representation (question/feature) #966

Closed tomerarnon closed 5 years ago

tomerarnon commented 5 years ago

Is it possible to do a linear mapping of a polytope without converting to V-representation?

In my application (verification of neural networks) I am dealing with very high dimensional polytopes (750+ dimensions with 1500+ constraints) representing potential inputs to a neural network. vertices_list (necessary for linear_map) routinely fails on these polytopes, likely due to their high dimensionality. I have been using Polyhedra with CDDLib, and often see cases where julia hangs indefinitely on a call to, for example, tovrep, on polytopes starting at around 25 dimensions. I am not sure what is causing this, particularly since 20-25 dimensional polytopes sometimes complete in under a second and other times hang indefinitely (I've tried 20+ minutes before interruption). But that is an issue for another library!

Regardless of why exactly this behavior is occurring, I am wondering whether it is possible to circumvent the conversion entirely and somehow map a polytope's h-representation directly.

mforets commented 5 years ago

Is it possible to do a linear mapping of a polytope without converting to V-representation?

If the linear map is invertible then Ax <= b, y = Cx => AC^(-1)y <= b.
Use quantifier elimination. See the 2nd answer in this question.
Use support functions.

The third approach is the one available "by default" in LazySets, e.g. when you do L = M * X where M is a matrix and X a LazySet, creating a lazy linear map L. One can reason about L by evaluating the support vector (to obtain vertices) or support function (to obtain distances).
This works efficiently in high dims because ρ(d, MX) = ρ(M^T * d, X), so the complexity is that of a single LP.

What makes support functions useful is that one can combine further operations on L using the properties here -- one can also do intersection lazily, usually with an overapproximation. I would suggest to check in your application if you really need all vertices or can do the job more lazily using the support functions machinery.

tomerarnon commented 5 years ago

Thanks for the response @mforets! I will read through each of the options.

In the case of NNs, each layer consists of an affine mapping (including also a translational component), followed by an activation function, in this case represented by the addition of a new constraint. Would it be possible to perform this operation with the support functions? It seems as though:

the translation might require a center/vertices to act on (maybe there is a workaround though)
the resulting lazy linear map would need to be converted into an hrep to add the new constraint.

I am far from an expert in this, so if you have additional thoughts they are much appreciated! In the meantime, I will reconsider the viability of each method.

mforets commented 5 years ago

A translation can be done lazily with an M-sum:

julia> using LazySets, Plots

# translate the set X along direction v (use A \oplus[TAB] B as a shortcut for MinkowskiSum(A, B))
julia> translate(v, X) = Singleton(v) ⊕ X

V = rand(VPolygon)
T = translate([1.0, 1.0], V)

plot(V, alpha=0.5)
plot!(T, 1e-3, alpha=0.5)

In high dims:

julia> import LazySets.Approximations.UnitVector

julia> n = 1000
julia> v = rand(n)
julia> P = convert(HPolyhedron, rand(Hyperrectangle, dim=n))

julia> T = translate(v, P);

# calculate maximum x₁ coordinate = supp func along direction [1, 0, ..., 0] 
julia> for i in 1:3 @time ρ(UnitVector(1, n, 0.5), T) end

  2.207872 seconds (2.61 M allocations: 142.813 MiB, 7.29% gc time)
  0.050949 seconds (11.02 k allocations: 15.651 MiB)
  0.059456 seconds (11.02 k allocations: 15.651 MiB, 24.54% gc time)

tomerarnon commented 5 years ago

Ah, I see, I did not know about ⊕, and it looks like it can be combined with a linear map just as easily. So my first concern is resolved.

Returning to the second point, in cases of exact reachability analysis, where the set can not be progressively approximated as a Hyperrectangle (in which case constraining would be simple), is there also a way to lazily add a constraint to the resulting polytope after transformation? Or to return it to a polytope representation to add the constraints and transform again?

As for your earlier question of:

I would suggest to check in your application if you really need all vertices or can do the job more lazily using the support functions machinery.

Strictly speaking, I do not need the vertices themselves, only to be able to transform a polytope and constrain it (repeatedly). It is widely known that this is an expensive operation, so I was not expecting lightning fast speeds, but I was at least expecting the operation to terminate 😆 At the end, there must be a subset check to determine whether properties of the NN are satisfied or not. For this operation the vertices list will be necessary (due to the definition of issubset on polytopes). I don't imagine there is a lazy workaround to that in the exact case.

When you said

one can also do intersection lazily, usually with an overapproximation

what exactly did you mean? I am imagining that perhaps intersecting with an interval could substitute a constraint?

mforets commented 5 years ago

I have been using Polyhedra with CDDLib, and often see cases where julia hangs indefinitely on a call to, for example, tovrep, on polytopes starting at around 25 dimensions.

Hmm, if it hangs indefinitely, it sounds like worth reporting the example. Have you tried with another library? See this list.

mforets commented 5 years ago

For your second point, let me extend a bit on doing the intersection lazily. IIUC, given an initial polytope P₀, the application of a layer transforms it into the set P₁ = H ∩ M(v ⊕ P₀), the next layer is applied to P₁, and so on for a given number of k layers, obtaining Pk. Finally, it is checked if Pk is included in another set R (what kind of set is R?)

By linearity of ⊕, and supposing WLOG that the layers are equal,

P₁ = H ∩ M(v ⊕ MP₀) = H ∩ (Mv ⊕ MP₀),
P₂ = H ∩ (Mv ⊕ MP₁) = H ∩ (Mv ⊕ M(H ∩ (Mv ⊕ MP₀))
...

AFAICT the terms on the RHS cannot be simplified further without overapproximation.

To summarize some alternatives to proceed further:

Use concrete polyhedra to calculate Pk. The drawback as you've outlined above is that it doesn't scale with the dimension.
Make intermediate overapproximations of the Pi. This is the typical approach used in flowpipe approximation, because one can scale with a reasonable increase in the overapproximation error. The idea is to calculate the approximative sequence:

Q₁ = P₁ = H ∩ (Mv ⊕ MP₀)
Q₂ = H ∩ (Mv ⊕ M * overapproximate(Q₁, options...))
Q₃ = H ∩ (Mv ⊕ M * overapproximate(Q₂, options...))
...

Let me note that overapproximate(Qi, options) can be taken in different ways (not necessarily necessarily an hyperrectangular set). For example one can approximate Qi with a polytype using a set of template directions (see LazySets.Aprpoximations.overapproximate), or use a cartesian decomposition and overapproximate in some dimensions only but accurately (see LazySets.Approximations.decompose). The decomposed approach -developed in this paper -, scales very well with dimensions and is particularly useful when the sets/operations are sparse.

However, one can always work exactly with the nested lazy set, i.e. build the Pi lazily (this is inexpensive), and ask questions about Pk using support functions. The drawback is that the cost of computing support functions for very heavily nested lazy sets can increase fast! (this is the motivation for overapproximating!) Actually, this approach will be effective or not depending on technical factors, such as the dimension n, how many layers k you have, and the implementation of the operations (notably the intersection) which ultimately depend on the set representation used. Below i build the Pi lazily to experiment a bit.

For convenience introduce a Translation lazy set and some basic rules:

using Revise
using LazySets
import LazySets.Approximations.UnitVector

# wrapper type for a translation of the set X along direction v, i.e. v⊕X
struct Translation{N <: Real, VN <: AbstractVector{N}, SN <: LazySet{N}} <: LazySet{N}
    v::VN
    X::SN
    function Translation(v::VN, X::SN) where {N, VN <: AbstractVector{N}, SN <: LazySet{N}}
        @assert length(v) == dim(X)
        return new{N, VN, SN}(v, X)
    end
end

# ambient dimension
function LazySets.dim(T::Translation) 
    length(T.v)
end

# support vector
function LazySets.σ(d::AbstractVector, T::Translation) 
    T.v + σ(d, T.X)
end

# support function
function LazySets.ρ(d::AbstractVector{N}, T::Translation{N}) where {N<:Real}
    ρ(d, Singleton(T.v)) + ρ(d, T.X)
end

# define the rule M * (v⊕X) = (M*v) ⊕ (M*X), returning a new translation
function LazySets.LinearMap(M::AbstractMatrix{N}, T::Translation{N}) where {N <: Real}
    return Translation(M * T.v, M * T.X)
end

Let Layer be the abstraction of one layer of the neural network. One possibility to implement the activation function is to use a lazy intersection with the constraint. This is like adding a constraint lazily to the linear map of the translated polytope.

using Optim # usually needed for intersections

struct Layer{N <: Real, VN <: AbstractVector{N}, MN <: AbstractMatrix{N}}
    M::MN # affine mapping
    v::VN # translational component
    H::HalfSpace{N} # constraint representing the activation function
end

function activate(L::Layer, X::LazySet)
    return Intersection(L.H, L.M * Translation(L.v, X))
end

In another comment i'll put some timings of calculating the support function of a given initial set.

mforets commented 5 years ago

Test code

# return a randomly generated layer in the given dimension (for testing purposes)
function random_layer(n::Int)
    M = rand(n, n)
    v = rand(n)
    H = rand(HalfSpace, dim=n)
    return Layer(M, v, H)
end

function test_k_layers(n, k, P₀)
    # setup: calculate the lazy set Pk
    @time begin
        Pcurrent = P₀
        for i in 1:k
            Li = random_layer(n)
            global Pnext = activate(Li, Pcurrent)
            Pcurrent = Pnext
        end
        Pk = Pnext
    end

    # do something with Pk e.g. calculate the max value
    # along direction [1, 0, ...]
    d = UnitVector(1, n, 1.0)
    @time ρ(d, Pk)
    return nothing
end

Results

The test code executes the following loop, for different number of layers k and dimension n. Since the sets are randomly generated, sometimes it may happen that the introduced linear constraint points outwards to the given set, i.e. the intersection is empty and we can ignore those. So we take some number of repetitions of each experiment for a fixed value of k and n.

for n in [2, 50, 100, 500]
    println("\n n = $n")
    for r = 1:3
        println("\n    repetition $r")
        P₀ = convert(HPolytope, rand(Hyperrectangle, dim=n))
        try
            test_k_layers(n, k, P₀)
        catch
            println("       the intersection is empty\n")
        end
    end
end

k = 1

n = 2

    repetition 1
  0.000010 seconds (10 allocations: 576 bytes)
  0.003326 seconds (7.00 k allocations: 456.969 KiB)

    repetition 2
  0.000006 seconds (10 allocations: 576 bytes)
  0.004316 seconds (7.00 k allocations: 463.547 KiB)

    repetition 3
  0.000009 seconds (10 allocations: 576 bytes)
       the intersection is empty

 n = 50

    repetition 1
  0.000017 seconds (11 allocations: 21.406 KiB)
  0.040100 seconds (41.64 k allocations: 6.184 MiB)

    repetition 2
  0.000013 seconds (11 allocations: 21.406 KiB)
  0.025314 seconds (27.50 k allocations: 4.083 MiB)

    repetition 3
  0.000014 seconds (11 allocations: 21.406 KiB)
  0.030989 seconds (25.48 k allocations: 3.783 MiB, 12.33% gc time)

 n = 100

    repetition 1
  0.000075 seconds (11 allocations: 81.000 KiB)
  0.128813 seconds (78.44 k allocations: 20.013 MiB)

    repetition 2
  0.000050 seconds (11 allocations: 81.000 KiB)
  0.130476 seconds (78.44 k allocations: 20.013 MiB, 5.26% gc time)

    repetition 3
  0.000069 seconds (11 allocations: 81.000 KiB)
  0.134488 seconds (78.44 k allocations: 20.013 MiB, 3.34% gc time)

 n = 500

    repetition 1
  0.001395 seconds (11 allocations: 1.919 MiB)
  1.295913 seconds (281.27 k allocations: 248.616 MiB, 2.31% gc time)

    repetition 2
  0.000741 seconds (11 allocations: 1.919 MiB)
  1.298930 seconds (285.81 k allocations: 252.626 MiB, 2.35% gc time)

    repetition 3
  0.000719 seconds (11 allocations: 1.919 MiB)
  1.445615 seconds (326.63 k allocations: 288.716 MiB, 2.18% gc time)

k = 2

n = 2

    repetition 1
  0.000012 seconds (20 allocations: 1.125 KiB)
       the intersection is empty

    repetition 2
  0.000005 seconds (20 allocations: 1.125 KiB)
       the intersection is empty

    repetition 3
  0.000005 seconds (20 allocations: 1.125 KiB)
  0.196425 seconds (414.76 k allocations: 26.787 MiB, 2.45% gc time)

 n = 50

    repetition 1
  0.000022 seconds (22 allocations: 42.813 KiB)
  2.276098 seconds (1.96 M allocations: 291.845 MiB, 2.37% gc time)

    repetition 2
  0.000029 seconds (22 allocations: 42.813 KiB)
  4.821582 seconds (4.27 M allocations: 636.409 MiB, 1.33% gc time)

    repetition 3
  0.000023 seconds (22 allocations: 42.813 KiB)
  2.424744 seconds (2.14 M allocations: 319.396 MiB, 1.63% gc time)

 n = 100

    repetition 1
  0.000111 seconds (22 allocations: 162.000 KiB)
 13.783388 seconds (7.96 M allocations: 1.989 GiB, 2.39% gc time)

    repetition 2
  0.000113 seconds (22 allocations: 162.000 KiB)
 13.411636 seconds (7.97 M allocations: 1.991 GiB, 2.40% gc time)

    repetition 3
  0.000094 seconds (22 allocations: 162.000 KiB)
  5.622417 seconds (3.67 M allocations: 937.869 MiB, 2.41% gc time)

 n = 500

    repetition 1
  0.002350 seconds (22 allocations: 3.839 MiB)
211.746992 seconds (47.80 M allocations: 41.283 GiB, 2.34% gc time)

    repetition 2
  0.001325 seconds (22 allocations: 3.839 MiB)
205.161758 seconds (47.66 M allocations: 41.162 GiB, 2.36% gc time)

    repetition 3
  0.001032 seconds (22 allocations: 3.839 MiB)
217.512224 seconds (47.81 M allocations: 41.291 GiB, 3.32% gc time)

Conclusion

The first reported time lapse is the initialization step (building the lazy set) and the second step is actually calculating the support function of Pk. The lazy set is built fast, but the support function of nested intersections becomes harder in dimensions > 100, say. These observations are to be expected.

The current implementation of halfspace vs. compact set intersection (see this function) is not too bad, since it already uses the reduction to one univariate optimization problem that is easily solved with a line search. However, i think that there is room for improvement (there was an attempt in https://github.com/JuliaReach/LazySets.jl/pull/793 and i'd be happy to follow it up or think on alternatives!). Essentially one would like the line search to converge faster to the optimum, such that the number of iterations are reduced. Note that number of calls to the supp func for a nested intersection increases like k!, so a reduction in the convergence can have a high impact.

mforets commented 5 years ago

At the end, there must be a subset check to determine whether properties of the NN are satisfied or not. For this operation the vertices list will be necessary (due to the definition of issubset on polytopes).

It depends on the property to be checked. It is easy if the property is a linear constraint, for example. There are other special cases available in is_subset.jl, and maybe others can be added. For example, containment check of a compact set X in a hyperrectangular set can be done with the interval hull approximation of X, and there is no need to calculate the vertices of X.

tomerarnon commented 5 years ago

@mforets I just wanted to respond to make it clear that this is not falling on deaf ears. I really appreciate the thought (and work) you've put into this. I simply need to digest it a bit longer, and rethink a few things as well. And also maybe learn a bit more about support functions... I'll respond in full shortly.

schillic commented 5 years ago

my application (verification of neural networks)

The world is small. I am also working in such a project (a bit).

Cross-referencing: #769, #882

@mforets: Do you want to send your code for #882? Should we also add an AffineMap type for convenience?

given an initial polytope P₀, the application of a layer transforms it into the set P₁ = H ∩ M(v ⊕ P₀)

I would rather understand an affine map as MP₀ ⊕ v.

is there also a way to lazily add a constraint to the resulting polytope after transformation?

Yes, using a lazy Intersection with the constraint (see the code sample by @mforets above). But in general this is not very efficient if deeply nested (as we saw in the evaluation). You can also flatten the set representation again by concretizing (using overapproximate, e.g., with template directions). (As the name suggests, this generally gives an overapproximation.) Then you can use the concrete intersection (currently only if the other set is an HPolytope, but more methods can be added).

Let me also make clear again that our lazy Intersection operation is conservative, which means you do not get a decision algorithm (which you might expect from (in principle computable) polytopes). (Still, in your application "conservative" is appropriate for verification, just not for falsification).

subset check [...] For this operation the vertices list will be necessary (due to the definition of issubset on polytopes).

At least if you can live with a conservative subset check, you do not need the vertices (as @mforets already mentioned). For inclusion in a polyhedron, I just created #972. Just let us know which set type you need and we can think about it.

mforets commented 5 years ago

Let me also make clear again that our lazy Intersection operation is conservative, ...

Or the support function of Intersection is conservative in general (see this function), although for halfspace vs. lazyset, ρ(d, Pi) uses _line_search, which on convergence finds the global minimum of f(λ) = ρ(ℓ - λa) + λb hence it does not overapproximate.

schillic commented 5 years ago

980 brought `linear_map(M, P)` for invertible `M` and polyhedron/polytope `P`.

(For polytope P and singular M the method falls back to vertex representation.)

mforets commented 5 years ago

@tomerarnon it may be worth giving a try to linear_map(M, P) of the new release (for the old behavior use cond_tol=0); a further improvement in this direction is proposed in https://github.com/JuliaReach/LazySets.jl/issues/1034.

tomerarnon commented 5 years ago

@mforets, wow! Will do! It's hard to believe it's been almost 3 weeks since I had to set this aside. I'll try to get back in to it as soon as I can, including checking out the new release. Thanks for the heads up!

@schillic:

The world is small. I am also working in such a project (a bit).

Indeed it is! Here is the project we are working on, in case you want to check it out. Recently made public, not yet made functional 😜. If you do take a look you may note that we're not making adequate use of the "lazy" properties of this packages. This is very true, but don't be alarmed, more of that will come!

Then you can use the concrete intersection (currently only if the other set is an HPolytope, but more methods can be added).

I didn't know about this! Will definitely use it, since we deal primarily with HPolytopes.

At least if you can live with a conservative subset check, you do not need the vertices (as @mforets already mentioned). For inclusion in a polyhedron, I just created #972. Just let us know which set type you need and we can think about it.

I can most definitely live with conservative checks. We are working almost entirely with HPolytopes, HalfSpaces, and Hyperrectangles. We resort to using VPolytopes where necessary. So it sounds like you already have all of the correct machinery.

mforets commented 5 years ago

Is it possible to do a linear mapping of a polytope without converting to V-representation?

A fourth idea that can be added to my second comment in this thread is to change the default behavior of linear_map such that it converts the given polytope to a zonotope and the linear map is applied to it (which is fast). We are actually considering doing the change these days.

The drawback is that is not stable with respect to the input representation (the returned set is a zonotope), but conversion back to HPolytope is available if needed.

Note that this change, however, improves the case of rectangular linear maps (see https://github.com/JuliaReach/LazySets.jl/issues/1169) since one does no longer need to pass to v-rep.

schillic commented 5 years ago

converts the given polytope to a zonotope

In general this conversion is not exact. So I would say that this is rather a candidate for overapproximate(::LinearMap{<:AbstractPolytope}, ::Type{Zonotope}).

mforets commented 5 years ago

That's correct -- i was implicitly assuming that the input set is hyperrectangular as it was the test case in this thread; with that type the exact conversion is fast because the generators matrix is diagonal. But for a general polytope we should use overapproximate.

schillic commented 5 years ago

Meanwhile we have added some options;

Use left division.
Use the inverse if the matrix is invertible.
Hyperrectangles are automatically handled as zonotopes. If that is not desired, the method for general polyhedra can be invoked as usual.

Due to lack of further ideas, I close this one.

JuliaReach / LazySets.jl

linear_map in H-representation (question/feature) #966

Test code

Results

k = 1

k = 2

Conclusion

980 brought linear_map(M, P) for invertible M and polyhedron/polytope P.

980 brought `linear_map(M, P)` for invertible `M` and polyhedron/polytope `P`.