MassimoCimmino / pygfunction

An open-source toolbox for the evaluation of thermal response factors (g-functions) of geothermal borehole fields.
BSD 3-Clause "New" or "Revised" License
46 stars 21 forks source link

boreholes.find_duplicates slow for large fields #206

Closed MassimoCimmino closed 2 years ago

MassimoCimmino commented 2 years ago

boreholes.find_duplicates() is very slow for large fields with thousands of boreholes. This is likely due to the double for loop used to compute distances. This can be vectorized using numpy.

Timing results

Current implementation

>>> import pygfunction as gt
>>> field = gt.boreholes.rectangle_field(100, 100, 7.5, 7.5, 150., 4., 0.075)
>>> %timeit gt.boreholes.find_duplicates(field)
1min 34s ± 318 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Using scipy.spatial

>>> import pygfunction as gt
>>> import numpy as np
>>> from scipy import spatial
>>> N_1 = 100; N_2 = 100; H = 150.0; D = 4.0; B = 7.5; r_b = 0.075
>>> field = gt.boreholes.rectangle_field(N_1, N_2, B, B, H, D, r_b)
>>> nBoreholes = len(field)
>>> coordinates = np.array([[b.x, b.y] for b in field])
>>> distances = spatial.distance.pdist(coordinates, 'euclidean')
>>> duplicate_index = np.argwhere(distances < r_b)
>>> duplicate_i = nBoreholes - 2 - np.floor(np.sqrt(-8*duplicate_index + 4*nBoreholes*(nBoreholes-1)-7)/2 - 0.5)
>>> duplicate_j = duplicate_index + duplicate_i + 1 - nBoreholes*(nBoreholes-1)/2 + (nBoreholes-duplicate_i)*((nBoreholes-duplicate_i)-1)/2
>>> %timeit nBoreholes = len(field); coordinates = np.array([[b.x, b.y] for b in field]); distances = spatial.distance.pdist(coordinates, 'euclidean'); duplicate_index = np.argwhere(distances < r_b); duplicate_i = nBoreholes - 2 - np.floor(np.sqrt(-8*duplicate_index + 4*nBoreholes*(nBoreholes-1)-7)/2 - 0.5); duplicate_j = duplicate_index + duplicate_i + 1 - nBoreholes*(nBoreholes-1)/2 + (nBoreholes-duplicate_i)*((nBoreholes-duplicate_i)-1)/2
357 ms ± 5.64 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)