Unify boxes in result by default?

dpsanders commented 4 years ago

There's an implementation using LightGraphs in https://github.com/JuliaIntervals/IntervalOptimisation.jl/pull/5

It doesn't seem so easy to write an implementation by hand. It's basically a union-find algorithm

dpsanders commented 4 years ago

There should be a unify=true keyword argument (true by default)

Suavesito-Olimpiada commented 3 years ago

Hi! I decided to take a chance into this and found there is something weird with the implementation described above for 2 dimensions and above. The weird case is easily described with the next example

       ┌───────────┐
       │           │
       │           │
       │      B₁   │
┌──────┼───┐       │
│      │   │       │
│      └───┼───────┘
│          │
│    B₂    │ ┌────┐
│          │ │ B₃ │
│          │ └────┘
└──────────┘

with the above algorithm the boxes are reduced to this

┌──────────────────┐
│                  │
│                  │
│                  │
│                  │
│                  │
│       B₁∪B₂      │
│                  │
│           ┌────┐ │
│           │ B₃ │ │
│           └────┘ │
└──────────────────┘

But I think thay should be reduced to just B₁∪B₂∪B₃.

In #64 I add the unify keyword with a small but fast implementation of union-find (weighted and with path compression, reference on the algorithm here).

The implementation is quite fast, two examples are the next ones (first from the tutorial)

julia> using IntervalArithmetic, IntervalOptimisation, BenchmarkTools

julia> f(x) = x^2 - 2x + 1
f (generic function with 1 method)

julia> @time minval, minimisers = minimise(f, 0..2; tol=1e-6, unify=false);
  3.626429 seconds (15.91 M allocations: 1.621 GiB, 8.09% gc time)

julia> @show length(minimisers)
length(minimisers) = 2990
2990

julia> @btime unified_minimisers = IntervalOptimisation.unify_boxes(minimisers)
  29.142 ms (75 allocations: 117.45 KiB)
14-element Vector{Interval{Float64}}:
 [0.998687, 1.00136]
 [0.998642, 0.998687]
 [0.998635, 0.998642]
 [0.998623, 0.998627]
 [1.00136, 1.00137]
 [1.00135, 1.00137]
 [1.00138, 1.00139]
 [1.00137, 1.00138]
 [1.00135, 1.00136]
 [0.998632, 0.998634]
 [0.998621, 0.998623]
 [1.00137, 1.00138]
 [1.00137, 1.00138]
 [0.998617, 0.998619]

the other mine

julia> g(x) = abs(x[1]-x[2]) + (x[1]+x[2])^2
g (generic function with 1 method)

julia> @time minval, minimisers = minimise(g, (-1+1e-10..1)×(-1..1); tol=1e-10, unify=false); # initialize with these give a special bad behaviour
  1.815546 seconds (8.21 M allocations: 747.165 MiB, 18.85% gc time)

julia> @show length(minimisers)
length(minimisers) = 27670
27670

julia> @btime unified_minimisers = IntervalOptimisation.unify_boxes(minimisers)
  2.666 s (28 allocations: 946.11 KiB)
1-element Vector{IntervalBox{2, Float64}}:
 [-3.80622e-07, 3.80622e-07] × [-3.80672e-07, 3.80625e-07]

One thing is that if the implementation is changed so that the example showed at the begginig returns B₁∪B₂∪B₃, unify_boxes can be written to be faster (now is cuadratic on the number of minimisers, but it can be O(n log(n))) and avoid the union-find algorithm altogether. (edit: typos and wording)

I'm open to thoughts and ideas. Hope is useful.

P.D. Kudos to asciiflow for the ascii drawing. :stuck_out_tongue:

dpsanders commented 3 years ago

I don't understand why the algorithm doesn't reduce your example to B₁ ∪ B₂ ∪ B₃.

Suavesito-Olimpiada commented 3 years ago

Is because the B₁∩B₃ and B₂∩B₃ is empty, even if (B₁∪B₂)∩B₃ is not, and the algorithm is just testing pairwise the intersections.

dpsanders commented 3 years ago

Nice catch! Does your version give the correct result?

dpsanders commented 3 years ago

A solution seems to be to modify the list of intervals as you go by taking the union (hull) when two intervals are combined, and checking the intersection with this modified list instead of the original list.

dpsanders commented 3 years ago

Hmmm. But actually in your original example we would prefer not to unify (B1 ∪ B2) with B3, since they are actually disjoint. Maybe taking the hull is actually not a good idea.

dpsanders commented 3 years ago

We can just return separate lists for each connected component and the user can take the hull if they prefer; or we can give a convenience function to do so.

Suavesito-Olimpiada commented 3 years ago

Maybe we can add three options for unify, :hull, :connected and :none?

Suavesito-Olimpiada commented 3 years ago

I can add the algorithm for the hull too. I'll add it to the PR and make the changes on the documentation and add tests.

dpsanders commented 3 years ago

@Suavesito-Olimpiada: Sorry for the delay in replying. I think it would be great to have these different options. Is there any progress on this?

dpsanders commented 3 years ago

Note that there is also an implementation of disjoint sets in DataStructures.jl. I'm not sure how it compares to yours.

Suavesito-Olimpiada commented 3 years ago

I can check the differences between my inplementation and theirs, also see performan differences and reply back later.

Suavesito-Olimpiada commented 3 years ago

On the hull algorithm, I still don't have progress. But once I finish my semester I'll do it. :)

dpsanders commented 3 years ago

I think there should be a way to use interval trees for this. There is an implementation in Julia.

dpsanders commented 3 years ago

Or rather R-trees

Suavesito-Olimpiada commented 3 years ago

I'll check out both R-trees and Interval Trees, so I can understand better both structures. But as I see (just over-looking both articles) I think a variant of R-trees is closer to the hull idea (that is, so we can get the maximum number of non-intersecting hulls of intersecting intervals).

dpsanders commented 3 years ago

I think the idea of R-trees is to make it more efficient to check which other boxes a given box intersects.

Suavesito-Olimpiada commented 3 years ago

From the R-Tree article

The key idea of the data structure is to group nearby objects and represent them with their minimum bounding rectangle in the next higher level of the tree

and from the Interval Tree article

...it allows one to efficiently find all intervals that overlap with any given interval or point

dpsanders commented 3 years ago

And then

The key idea is to use the bounding boxes to decide whether or not to search inside a subtree. In this way, most of the nodes in the tree are never read during a search

The bounding boxes are used to narrow down the set of existing boxes that a new box must be checked against, as far as I understand.

dpsanders commented 3 years ago

See Samet, Foundations of Multidimensional and Metric Data Structures, chap. 3

Suavesito-Olimpiada commented 3 years ago

Ok, ok I think both structures are used to optimise the search of intervals that overlaps other intervals, but I need to check the differences. Thanks for the information and all the references, I'll check them out. :smiley:

JuliaIntervals / IntervalOptimisation.jl

Unify boxes in result by default? #48