PyVRP / VRPLIB

Python package to read and write vehicle routing problem instances.
https://github.com/PyVRP/VRPLIB
MIT License

Parse large instances with 15K or 30K nodes #117

Open N-Wouda opened 3 months ago

N-Wouda commented 3 months ago

The large instances of Arnold, Gendreau and Sörensen (2017) in CVRPLIB currently hang on the edge weight calculations. These instances are very large, ranging between 6-30K nodes. Let's see if we can support such large instances as well.
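A rough back-of-the-envelope sketch of why these instances hang (the exact intermediates depend on the implementation, so treat the numbers as estimates): a broadcasting-based pairwise computation materialises an (n, n, 2) float64 difference array before reducing it, on top of the (n, n) result matrix.

```python
# Memory estimate for a full pairwise distance computation at n = 30,000.
# The broadcasting approach allocates an (n, n, 2) float64 intermediate
# (the coordinate differences) plus the (n, n) float64 result matrix.
n = 30_000

diff_bytes = n * n * 2 * 8  # (n, n, 2) float64 intermediate
result_bytes = n * n * 8    # (n, n) float64 distance matrix

print(f"intermediate: {diff_bytes / 2**30:.1f} GiB")  # ~13.4 GiB
print(f"result:       {result_bytes / 2**30:.1f} GiB")  # ~6.7 GiB
```

At that scale the intermediate alone can exceed available RAM, which would explain the hang (swapping) rather than a crash.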

leonlan commented 2 weeks ago

I performed a quick benchmark of the current distance computation and four alternatives:

Script

```python
import numpy as np
import timeit
from tabulate import tabulate
import numba


def current(coords: np.ndarray) -> np.ndarray:
    diff = coords[:, np.newaxis, :] - coords
    square_diff = diff**2
    square_dist = np.sum(square_diff, axis=-1)
    return np.sqrt(square_dist)


def faster_current(pts: np.ndarray) -> np.ndarray:
    sq_sum = np.sum(pts**2, axis=1)
    sq_diff = np.add.outer(sq_sum, sq_sum) - 2 * np.dot(pts, pts.T)
    np.fill_diagonal(sq_diff, 0)  # avoids minor numerical issues
    return np.sqrt(sq_diff)


def numpy_linalg(X: np.ndarray) -> np.ndarray:
    return np.linalg.norm(X[:, np.newaxis] - X, axis=2)


@numba.jit(nopython=True, fastmath=True)
def variant_numba(a: np.ndarray) -> np.ndarray:
    n = a.shape[0]
    out = np.zeros((n, n))

    for i in range(n):
        for j in range(i):
            out[i][j] = euclidean(a[i], a[j])

    return out + out.T


@numba.jit(nopython=True, fastmath=True)
def euclidean(u, v):
    n = len(u)
    dist = 0

    for i in range(n):
        dist += abs(u[i] - v[i]) ** 2

    return dist ** (1 / 2)


def variant_scipy(pts: np.ndarray) -> np.ndarray:
    from scipy.spatial import distance_matrix

    return distance_matrix(pts, pts)


if __name__ == "__main__":
    np.random.seed(1)
    headers = ["# of coords", "Function", "Time (s)"]
    NUMBER = 3

    for num_coords in [5000, 10000, 15000]:
        coords = np.random.randn(num_coords, 2)
        funcs = [
            current,
            faster_current,
            numpy_linalg,
            variant_numba,
            variant_scipy,
        ]

        results = []
        for func in funcs:
            elapsed = timeit.timeit(
                f"{func.__name__}(coords)", globals=globals(), number=NUMBER
            )
            avg = round(elapsed / NUMBER, 2)
            results.append((num_coords, func.__name__, avg))

        print(tabulate(results, headers=headers))

    # current and numpy_linalg are too slow
    for num_coords in [20000, 25000, 30000]:
        coords = np.random.randn(num_coords, 2)
        funcs = [
            faster_current,
            variant_numba,
            variant_scipy,
        ]

        results = []
        for func in funcs:
            elapsed = timeit.timeit(
                f"{func.__name__}(coords)", globals=globals(), number=NUMBER
            )
            avg = round(elapsed / NUMBER, 2)
            results.append((num_coords, func.__name__, avg))

        print(tabulate(results, headers=headers))
```
Results:

```
  # of coords  Function          Time (s)
-------------  --------------  ----------
         5000  current               0.4
         5000  faster_current        0.14
         5000  numpy_linalg          0.41
         5000  variant_numba         0.21
         5000  variant_scipy         0.33

  # of coords  Function          Time (s)
-------------  --------------  ----------
        10000  current               1.58
        10000  faster_current        0.44
        10000  numpy_linalg          1.58
        10000  variant_numba         0.36
        10000  variant_scipy         1.19

  # of coords  Function          Time (s)
-------------  --------------  ----------
        15000  current               4.29
        15000  faster_current        0.89
        15000  numpy_linalg          4.94
        15000  variant_numba         0.76
        15000  variant_scipy         2.62

  # of coords  Function          Time (s)
-------------  --------------  ----------
        20000  faster_current        1.54
        20000  variant_numba         1.39
        20000  variant_scipy         4.61

  # of coords  Function          Time (s)
-------------  --------------  ----------
        25000  faster_current        2.37
        25000  variant_numba         2.16
        25000  variant_scipy         7.2

  # of coords  Function          Time (s)
-------------  --------------  ----------
        30000  faster_current       66.12
        30000  variant_numba         6.8
        30000  variant_scipy        10.86
```

For instances up to 25K nodes, the faster_current function is quite fast. At 30K it becomes much slower than the numba and scipy variants; this could be due to memory pressure, but I'm not sure. In any case, most users don't solve 30K instances, and a runtime of around a minute is acceptable there. I also don't want vrplib to rely on heavy dependencies like numba or scipy, so sticking to the pure numpy version is preferred.
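For reference, the speed-up in faster_current comes from the expansion ‖a − b‖² = ‖a‖² + ‖b‖² − 2 a·b, which replaces the (n, n, 2) broadcast intermediate with a single matrix product. A minimal sketch checking that this agrees with the direct computation (the `np.maximum` clamp is an extra safety against tiny negative values from floating-point cancellation, not part of the original function):

```python
import numpy as np

rng = np.random.default_rng(1)
pts = rng.standard_normal((100, 2))

# Direct pairwise distances via broadcasting, as in the `current` variant.
direct = np.sqrt(np.sum((pts[:, None, :] - pts) ** 2, axis=-1))

# Expansion ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, computed with one
# matrix product instead of an (n, n, 2) intermediate.
sq_sum = np.sum(pts**2, axis=1)
sq_diff = np.add.outer(sq_sum, sq_sum) - 2 * pts @ pts.T
np.fill_diagonal(sq_diff, 0)  # the diagonal is exactly zero by definition
expanded = np.sqrt(np.maximum(sq_diff, 0))  # clamp small negative residues

assert np.allclose(direct, expanded)
```

The trade-off is numerical: the expanded form cancels large terms, so it is slightly less accurate for nearly coincident points than the direct difference.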

I'll implement the faster_current version soon.


Another option is to write some C++ code.

See this for setting up Poetry with a custom build.