Open mathause opened 3 years ago
And here is the fast and exact solution:
import numpy as np
import pyproj


def geodist_exact_fast(lon, lat):
    """Pairwise geodesic distances (in km) between 1D arrays of grid points."""
    lon, lat = np.asarray(lon), np.asarray(lat)
    if lon.shape != lat.shape:
        raise ValueError("lon and lat need to have the same shape")

    geod = pyproj.Geod(ellps="WGS84")
    n_points = len(lon)
    geodist = np.zeros([n_points, n_points])

    # calculate only the upper half of the triangle (row i, columns > i)
    for i in range(n_points):
        # need to duplicate gridpoint (required by geod.inv)
        lt = np.tile(lat[i], n_points - (i + 1))
        ln = np.tile(lon[i], n_points - (i + 1))
        # geod.inv returns (forward azimuth, back azimuth, distance in m)
        geodist[i, i + 1:] = geod.inv(ln, lt, lon[i + 1:], lat[i + 1:])[2]

    # convert m to km
    geodist /= 1000
    # fill the lower half of the triangle (in-place); the diagonal stays 0
    geodist += np.transpose(geodist)
    return geodist
Uses proj (via pyproj) under the hood. It takes 1 s for the 25 * 53 grid points above, and the result is allclose to that of mesmer.io.load_constant_files.calc_geodist_exact. The current solution takes 2.5 min, so this is certainly worth it. For the 2.5° x 2.5° land-only grid (360 / 2.5 * 180 / 2.5 * 0.3 grid points) it takes about 5 s (in contrast to about 13 min).
So it's still too slow to drop saving & loading the data, and my question still stands: could we switch to the approximation?
Currently we use geopy.distance.distance to calculate distances between gridpoints. This uses an exact algorithm, assuming an ellipsoid. For an example dataset of 25 * 53 gridpoints it takes 5 min 30 s to calculate all distances (on my laptop). I think we could reduce this to about 2 min. However, if we assume a sphere and use the Haversine formula we can get it down to << 1 s. The error should generally be below 0.5 % (to be confirmed). An additional advantage is that we no longer need to save the distance and ghi_phi matrices.
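For illustration, a minimal sketch of what a vectorized Haversine calculation could look like. This is not the implementation proposed in this issue: the function name `geodist_haversine` and the mean Earth radius of 6371 km are assumptions made for the example, and the error relative to the ellipsoidal result still needs to be checked.

```python
import numpy as np

# mean Earth radius in km; an assumed value for this sketch
EARTH_RADIUS_KM = 6371.0


def geodist_haversine(lon, lat, radius=EARTH_RADIUS_KM):
    """Pairwise great-circle distances (km) on a sphere (hypothetical helper)."""
    lon, lat = np.asarray(lon, dtype=float), np.asarray(lat, dtype=float)
    if lon.shape != lat.shape:
        raise ValueError("lon and lat need to have the same shape")

    lon, lat = np.deg2rad(lon), np.deg2rad(lat)

    # broadcast the 1D arrays against each other to get all point pairs
    dlon = lon[:, np.newaxis] - lon[np.newaxis, :]
    dlat = lat[:, np.newaxis] - lat[np.newaxis, :]

    # Haversine formula
    a = (
        np.sin(dlat / 2) ** 2
        + np.cos(lat[:, np.newaxis]) * np.cos(lat[np.newaxis, :]) * np.sin(dlon / 2) ** 2
    )
    # clip guards against rounding errors pushing a slightly outside [0, 1]
    return 2 * radius * np.arcsin(np.sqrt(np.clip(a, 0, 1)))
```

Because everything is computed in one broadcast operation, there is no Python loop over grid points; the trade-off is that several n x n temporary arrays are held in memory at once.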
To be checked and cleaned:
Current implementation:
Example: