Closed Huite closed 2 months ago
With some 1.3 million points:
def locate_scipy(geometry, max_distance):
xy = shapely.get_coordinates(geometry)
tree = KDTree(xy)
return tree.sparse_distance_matrix(tree, max_distance=max_distance).tocsr()
def locate_shapely(geometry, max_distance):
tree = shapely.STRtree(geometry)
i, j = tree.query(geometry, distance=max_distance, predicate="dwithin")
coords = shapely.get_coordinates(geometry)
dxy = coords[i] - coords[j]
distance = np.linalg.norm(dxy, axis=1)
return i, j, distance
matrix = locate_scipy(geometry, 2.0)
i, j, distance = locate_shapely(geometry, 2.0)
coo = sparse.coo_matrix((distance, (i, j)))
csr = coo.tocsr()
assert np.array_equal(csr.indptr, matrix.indptr)
assert np.array_equal(csr.indices, matrix.indices)
assert np.allclose(csr.data, matrix.data)
Of course, this still requires scipy sparse. But we can easily construct the indptr ourselves:
def locate_shapely(geometry, max_distance):
tree = shapely.STRtree(geometry)
i, j = tree.query(geometry, distance=max_distance, predicate="dwithin")
coords = shapely.get_coordinates(geometry)
dxy = coords[i] - coords[j]
distance = np.linalg.norm(dxy, axis=1)
indptr = np.empty(len(geometry) + 1)
indptr[0] = 0
indptr[1:] = np.cumsum(np.bincount(i))
return indptr, j, distance
%timeit matrix = locate_scipy(geometry, 2.0)
%timeit indptr, indices, distance = locate_shapely(geometry, 2.0)
1.81 s ± 101 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
3.13 s ± 199 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
So it's only a factor 1.7 slower. That doesn't make scipy as a dependency worthwhile.
The scipy CSR matrix will have sorted indices, whereas the shapely constructed one won't, but for the snapping case, the order is arbitrary anyway.
I've currently used the scipy KDTree with a sparse distance matrix to get snapping candidates, but this obviously requires scipy.
It might be worthwhile to use the shapely STRtree instead; we just have to build the CSR form ourselves.