cheind / py-lapsolver

Fast linear assignment problem (LAP) solvers for Python based on c-extensions
MIT License
154 stars 24 forks source link

Found run times vary drastically for different distance matrices with same sizes. #9

Open guoyangqin opened 4 years ago

guoyangqin commented 4 years ago

As the author put it, this algorithm is much faster. For a randomly generated 3000x3000 cost matrix, it'll take a fraction of seconds to solve.

N_a = 3000
N_b = 3000

# Random costs
dist_mat_rand = np.random.uniform(0, 400, (N_a, N_b))

start_time = time.time()
ind_1, ind_2 = solve_dense(dist_mat_rand)
print(time.time() - start_time)

0.8244 sec

However, when it comes to a distance matrix between two sets of points, the run time increases drastically, the same sizes though.

# Manhattan distance bipartite matching
loc_a = np.random.uniform(0, 100, (N_a, 2))
loc_b = np.random.uniform(0, 200, (N_b, 2))
dist_mat = get_dist_mat(loc_a, loc_b)

start_time = time.time()
ind_1, ind_2 = solve_dense(dist_mat)
print(time.time() - start_time)

30.2019 sec

If adding the above two matrix, the run time will be between the two above.

start_time = time.time()
ind_1, ind_2 = solve_dense(dist_mat+dist_mat_rand)
print(time.time() - start_time)

17.5623 sec

The only difference I can think of the two distance matrices is whether they have correlations. Was wondering why the run times are so hugely different?

cheind commented 2 years ago

@guoyangqin sorry I missed your question for more than 2 years :(. Yes, random cost matrices are probably not representative for a lot of real-world problems like you mention. More info in #1