boreholes.find_duplicates() is very slow for large fields with thousands of boreholes. This is likely due to the double for loop used to compute pairwise distances, which can be vectorized using numpy and scipy.spatial.
Timing results
Current implementation
>>> import pygfunction as gt
>>> field = gt.boreholes.rectangle_field(100, 100, 7.5, 7.5, 150., 4., 0.075)
>>> %timeit gt.boreholes.find_duplicates(field)
1min 34s ± 318 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
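For context, the quadratic cost comes from comparing every pair of boreholes one at a time in pure Python. A minimal sketch of that kind of double loop (illustrative only, not the actual pygfunction source):
>>> def find_duplicates_loop(field):
...     # O(n^2) pure-Python pairwise comparison: each distance is evaluated
...     # one pair at a time, so 10,000 boreholes means ~5e7 iterations.
...     duplicate_pairs = []
...     for i, b1 in enumerate(field):
...         for j, b2 in enumerate(field[i+1:], start=i+1):
...             if ((b1.x - b2.x)**2 + (b1.y - b2.y)**2)**0.5 < b1.r_b:
...                 duplicate_pairs.append((i, j))
...     return duplicate_pairs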
Using scipy.spatial
>>> import pygfunction as gt
>>> import numpy as np
>>> from scipy import spatial
>>> N_1 = 100; N_2 = 100; H = 150.0; D = 4.0; B = 7.5; r_b = 0.075
>>> field = gt.boreholes.rectangle_field(N_1, N_2, B, B, H, D, r_b)
>>> nBoreholes = len(field)
>>> coordinates = np.array([[b.x, b.y] for b in field])
>>> distances = spatial.distance.pdist(coordinates, 'euclidean')  # condensed upper-triangle distances
>>> duplicate_index = np.argwhere(distances < r_b)  # condensed indices of near-coincident pairs
>>> duplicate_i = nBoreholes - 2 - np.floor(np.sqrt(-8*duplicate_index + 4*nBoreholes*(nBoreholes-1)-7)/2 - 0.5)
>>> duplicate_j = duplicate_index + duplicate_i + 1 - nBoreholes*(nBoreholes-1)/2 + (nBoreholes-duplicate_i)*((nBoreholes-duplicate_i)-1)/2
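The two expressions above invert pdist's condensed indexing: pdist returns only the upper triangle of the distance matrix, flattened row by row, so each flat index has to be mapped back to its pair (i, j). A quick check (illustrative, small n) that the closed-form inversion reproduces numpy's upper-triangle ordering:
>>> n = 10
>>> k = np.arange(n*(n - 1)//2)  # all condensed indices for n points
>>> i = n - 2 - np.floor(np.sqrt(-8*k + 4*n*(n - 1) - 7)/2 - 0.5)
>>> j = k + i + 1 - n*(n - 1)/2 + (n - i)*((n - i) - 1)/2
>>> i_ref, j_ref = np.triu_indices(n, k=1)
>>> assert np.array_equal(i.astype(int), i_ref) and np.array_equal(j.astype(int), j_ref)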
>>> %timeit nBoreholes = len(field); coordinates = np.array([[b.x, b.y] for b in field]); distances = spatial.distance.pdist(coordinates, 'euclidean'); duplicate_index = np.argwhere(distances < r_b); duplicate_i = nBoreholes - 2 - np.floor(np.sqrt(-8*duplicate_index + 4*nBoreholes*(nBoreholes-1)-7)/2 - 0.5); duplicate_j = duplicate_index + duplicate_i + 1 - nBoreholes*(nBoreholes-1)/2 + (nBoreholes-duplicate_i)*((nBoreholes-duplicate_i)-1)/2
357 ms ± 5.64 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
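Wrapped as a function, a possible vectorized replacement could look like the sketch below (the function name and tolerance argument are hypothetical, not existing pygfunction API). np.flatnonzero is used instead of np.argwhere so the index arrays stay one-dimensional:
>>> def find_duplicates_vectorized(field, tol):
...     # Hypothetical helper: vectorized pairwise distances via pdist,
...     # followed by closed-form inversion of the condensed indices.
...     n = len(field)
...     coordinates = np.array([[b.x, b.y] for b in field])
...     distances = spatial.distance.pdist(coordinates, 'euclidean')
...     k = np.flatnonzero(distances < tol)  # 1-D condensed duplicate indices
...     i = n - 2 - np.floor(np.sqrt(-8*k + 4*n*(n - 1) - 7)/2 - 0.5)
...     j = k + i + 1 - n*(n - 1)/2 + (n - i)*((n - i) - 1)/2
...     return list(zip(i.astype(int), j.astype(int)))
On a regularly spaced field such as the one above, this returns an empty list, since no two boreholes lie within r_b of each other.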