cheind / py-lapsolver

Fast linear assignment problem (LAP) solvers for Python based on c-extensions
MIT License

Extend Benchmarks #1

Open cheind opened 6 years ago

cheind commented 6 years ago

More libraries to test against: pymatgen, centrosome. See https://github.com/gatagat/lap/issues/9
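
For the dense side, a minimal timing harness could look like the sketch below. This is only a sketch: it assumes lap, lapsolver and scipy are installed, and pymatgen/centrosome would be slotted in the same way once their entry points are checked.

import time
import numpy as np
from lap import lapjv                      # gatagat/lap
from lapsolver import solve_dense          # this repository
from scipy.optimize import linear_sum_assignment

def bench(name, solve, costs, repeats=3):
    # Best-of-N wall-clock time; solve() returns the total assignment cost.
    best, total = float('inf'), None
    for _ in range(repeats):
        t0 = time.perf_counter()
        total = solve(costs)
        best = min(best, time.perf_counter() - t0)
    print('%12s: %.3fs  cost=%.3f' % (name, best, total))

def via_scipy(costs):
    r, c = linear_sum_assignment(costs)
    return costs[r, c].sum()

def via_lapsolver(costs):
    r, c = solve_dense(costs)
    return costs[r, c].sum()

def via_lapjv(costs):
    total, _, _ = lapjv(costs)
    return total

costs = np.random.uniform(size=(1000, 1000))
for name, solve in [('scipy', via_scipy), ('lapsolver', via_lapsolver), ('lap.lapjv', via_lapjv)]:
    bench(name, solve, costs)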

sschnug commented 6 years ago

In terms of benchmarks, I don't know what to think about random instances. For many problems they are completely irrelevant (and they might reward low-level performance rather than algorithmics). I would focus on finding real-world problems to test against!

Additionally: try to beat

I would not be surprised to see that those are faster (despite theoretically slower algorithms).

cheind commented 6 years ago

Yes, you might be right that random instances are rather made up, but at least the dense case is very relevant to me. In motmetrics we globally match detections made by a predictor to ground-truth detections. Usually humans are visible in 80-90 percent of the frames, so nearly all predictions made by the tracker can be matched. This leads to a nearly dense matrix (albeit with some cost structure...). I'll persist such a matrix and let you know.
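
To give a rough idea of the structure (this is only an illustration put together here, not the actual motmetrics code; the sizes, noise level and gating rule are made up):

import numpy as np

np.random.seed(0)
n_gt, n_pred = 50, 45                     # ground-truth objects vs. tracker hypotheses

# Ground-truth positions and noisy predictions close to most of them.
gt = np.random.uniform(0, 100, size=(n_gt, 2))
pred = gt[:n_pred] + np.random.normal(scale=2.0, size=(n_pred, 2))

# Pairwise Euclidean distances form a fully dense cost matrix.
costs = np.linalg.norm(gt[:, None, :] - pred[None, :, :], axis=-1)

# Gate out only the clearly implausible pairs, so the matrix stays nearly dense.
gate = np.percentile(costs, 95)
costs[costs > gate] = np.nan              # unassignable pairs
print('finite entries:', np.isfinite(costs).mean())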

Thanks for the links, will give them a try!

kylemcdonald commented 5 years ago

Random cost matrices are definitely not representative of many applications. But I wanted to toss in another toy problem that generates a more difficult cost matrix:

https://gist.github.com/kylemcdonald/ac9e2e84552e60543071c33a250111c9

import numpy as np
import lap
from scipy.spatial.distance import cdist
import matplotlib.pyplot as plt

side = 32
n = side * side

# Random walk in 2D, normalized to the unit square: a clustered point cloud.
lhs = np.random.uniform(low=-1, high=+1, size=(n, 2))
lhs = np.cumsum(lhs, axis=0)
lhs -= lhs.min(axis=0)
lhs /= lhs.max(axis=0)

# Regular side x side grid covering the unit square.
xv, yv = np.meshgrid(np.linspace(0, 1, side), np.linspace(0, 1, side))
rhs = np.dstack((xv, yv)).reshape(-1, 2)

# Dense squared-Euclidean cost matrix and its optimal assignment.
cost = cdist(rhs, lhs, 'sqeuclidean')
%time min_cost, row_assigns, col_assigns = lap.lapjv(cost)

# Draw an arrow from every cloud point to its assigned grid point.
rhs_lhs = rhs[col_assigns]
plt.figure(figsize=(8, 8), facecolor='white')
plt.scatter(lhs[:, 0], lhs[:, 1])
for start, end in zip(lhs, rhs_lhs):
    dx, dy = end - start
    plt.arrow(start[0], start[1], dx, dy,
              head_length=0.01, head_width=0.01, lw=0.5, alpha=0.5)
plt.xlim(-0.1, 1.1)
plt.ylim(-0.1, 1.1)
plt.show()

[Image: arrows mapping each cloud point to its assigned grid point]

I use this technique a lot for making grids from point clouds. This kind of Euclidean bipartite matching is similar to motmetrics in theory, but in practice I would expect far more long-distance matches in this toy problem. In my experience, random cost matrices are better served by sparse solvers when n is very large (e.g. the 5000x5000 case).
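
To sketch what I mean by the sparse route (assuming SciPy >= 1.4; the k-nearest-neighbour pruning is a simplification of my own, and a full matching is not guaranteed to exist after pruning, so k may need to grow):

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import min_weight_full_bipartite_matching
from scipy.spatial import cKDTree

n, k = 5000, 32                          # problem size, candidates kept per point
lhs = np.random.uniform(size=(n, 2))
rhs = np.random.uniform(size=(n, 2))

# Keep only the k nearest right-hand candidates for every left-hand point.
dists, cols = cKDTree(rhs).query(lhs, k=k)
rows = np.repeat(np.arange(n), k)

# A small offset keeps stored weights strictly positive, since zero weights are
# easy to confuse with missing edges in a sparse cost matrix.
costs = csr_matrix((dists.ravel() + 1e-9, (rows, cols.ravel())), shape=(n, n))

row_ind, col_ind = min_weight_full_bipartite_matching(costs)  # raises if no full matching exists
print('total cost:', costs[row_ind, col_ind].sum())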

cheind commented 5 years ago

Thanks for the suggestion @kylemcdonald. This looks like a promising benchmark to add. Hopefully I'll find some time over the upcoming holidays to add it.

To make sure I understand this fully, let me summarize what you are doing: you generate two sets of points, one regularly spaced, the other more or less randomly positioned. The cost is the squared Euclidean distance between each pair of points from the two sets. The cumsum operation clusters the points a bit more than a uniform distribution would, and the min and max operations normalize them to the [0,1] range. The image shows the assignment of the random points to their closest possible grid points.

kylemcdonald commented 5 years ago

@cheind that's exactly it. I use it for this sort of thing. A friend has a heuristic for this that provides poor solutions for non-uniform data.

For comparison, on my machine, solving a 10,000x10,000 LAP (i.e., matching a cloud of 10,000 points to a grid of 100x100 points):

Solving a 4096x4096 LAP (matching a cloud of 4096 points to a grid of 64x64 points):

If you match two point sets that are more uniform you get closer to a random cost matrix:

lhs = np.random.uniform(low=-1, high=+1, size=(n, 2))
rhs = np.random.uniform(low=-1, high=+1, size=(n, 2))

For 4096 points I see:

I should also note that I can only get this performance out of pylapsolver and lapjv if I convert the cost to uint32:

import numpy as np
from lap import lapjv
from lapsolver import solve_dense
from scipy.spatial.distance import cdist

dist = cdist(lhs, rhs, 'euclidean')

# Rescale the float costs to the full uint32 range and quantize.
dtype = np.uint32
dtype_max = np.iinfo(dtype).max
dist_max = dist.max()
dist = (dist * dtype_max) / dist_max
dist = dist.astype(dtype)

%time cost_jv, rows_jv, cols_jv = lapjv(dist)
%time rows_ls, cols_ls = solve_dense(dist)

# Undo the scaling to recover the cost in the original units.
cost_jv = (cost_jv * dist_max) / dtype_max
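
For completeness, since solve_dense returns only the indices, the corresponding cost can be read off the matrix and rescaled the same way (my addition, not part of the original snippet):

cost_ls = dist[rows_ls, cols_ls].astype(np.float64).sum()
cost_ls = (cost_ls * dist_max) / dtype_max   # undo the uint32 scaling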

cheind commented 5 years ago

@kylemcdonald thanks for the feedback. I'm a bit disappointed by the performance of lapsolver in this respect. Did you also run lapsolver with the int32 version? It does have some type overloading internally, but I suspect it would help anyway.

With or-tools I always had issues because creating the arcs took an overwhelming amount of time, especially for dense problems. I'm surprised by its performance, I have to admit.

I wanted to convert the backend of lapsolver to the Eigen C++ template library for SIMD support and the like, but I haven't yet found the time to do so.

kylemcdonald commented 5 years ago

I just changed to uint32 for pylapsolver and I saw a huge speed increase for the random cost matrix. For the cloud-to-grid problem I saw it drop from 40s to 33s.

I am familiar with the or-tools issue, seeing your comment on that repo is how I found your work here :)

I've heard that Cython can generate code that will be optimized to SIMD instructions by the compiler, but haven't confirmed this myself yet. If it's true, one other approach could be to try rewriting lapsolver in Cython.

If you're interested in lapcsa, here is my initial work: https://gist.github.com/kylemcdonald/b92377c98e427eabf3726a1e121b05d8

The original code is designed to only use double for cost, but my next step will be to swap it out for uint32 because I'm hitting memory limits rather than CPU limits (40,000x40,000 in 3m26s, but using 96GB of memory...).

cheind commented 2 years ago

@kylemcdonald hope you are doing well!

I recently stumbled upon IsoMatch, an algorithm for arranging a collection of objects into a target layout while trying to preserve distances between objects in the collection. The algorithm reminded me very much of what you suggested. If I am not mistaken, step 3 in their paper matches the grid approach you mentioned above.

Only now (after skimming the paper) do I see why computing such an assignment is useful :)