OpenDrift / opendrift

Open source framework for ocean trajectory modelling
https://opendrift.github.io
GNU General Public License v2.0
245 stars 120 forks source link

The difference between scipy.spatial.cKDTree and pygeos.STRtree #889

Open lyingTree opened 2 years ago

lyingTree commented 2 years ago

In the opendrift/readers/basereader/unstructured.py ,I found the method _buildckdtree and _nearestckdtree take a long time when running the openoil, so I made the following changes in the method which find the nearest point:

# ----------------------------unstructured.py----------------------------
# def _build_ckdtree_(self, x, y):
#     from scipy.spatial import cKDTree
#     P = np.vstack((x, y)).T
#     return cKDTree(P)
# Replace _build_ckdtree_ with _build_strtree_
def _build_strtree_(self, x, y):
    from pygeos import STRtree, points
    return STRtree(points(x,y))

#def __nearest_ckdtree__(self, idx, x, y):
#    q = np.vstack((x, y)).T
#    return idx.query(q, k = 1, workers = self.PARALLEL_WORKERS)[1]
# Replace __nearest_ckdtree__ with __nearest_strtree__
def __nearest_strtree__(self, idx, x, y):
    from pygeos import points
    return idx.nearest(pygeos.points(x,y))[1]

# line 180
# return self.__nearest_ckdtree__(self.nodes_idx, x, y)
return self.__nearest_strtree__(self.nodes_idx, x, y)

# line 186
# return self.__nearest_ckdtree__(self.nodes_idx, x, y)
return self.__nearest_strtree__(self.nodes_idx, x, y)
# ----------------------------reader_netCDF_CF_unstructured.py----------------------------
# line 173
# self.nodes_idx = self._build_ckdtree_(self.x, self.y)
self.nodes_idx = self._build_strtree_(self.x, self.y)

# line 176
# self.nodes_idx = self._build_ckdtree_(self.x, self.y)
self.faces_idx = self._build_strtree_(self.x, self.y)

Here I use the current data from FVCOM and the constant wind to drive openoil, so I only modified the reader_netCDF_CF_unstructured.py and unstructured.py, but the graph of tracks is different from before the modification. I wonder if this modification is correct? If it's correct, STRtree may be a better choice than cKDTree, beacauce it can increase the operating speed of opendrift by more than 10%.

image

knutfrode commented 2 years ago

If you are getting different results, then probably something is wrong. You can run pytest to check if tests are passing.

gauteh commented 2 years ago

Hi, it looks like pygeos is going to be merged with Shapely and be released as part of Shapely 2.0: https://pygeos.readthedocs.io/en/stable/, it seems that shapely 1.8 is a preview of 2.0. If this code is already part of shapely it would be great to use shapely in stead - one less dependency. If the STRtree works as well as ckdtree I think we should consider swapping it out, and actually reduce our dependencies, since we need shapely anyway.

https://shapely.readthedocs.io/en/stable/manual.html#str-packed-r-tree

If you submit your code as a pull request it will be easy to review and test for us.

lyingTree commented 2 years ago

If you are getting different results, then probably something is wrong. You can run pytest to check if tests are passing.

Yes, I have something wrong in operation. The results are same. The line 176 in reader_netCDF_CF_unstructured.py is error after my modification. The right modification is showed below:

# ----------------------------unstructured.py----------------------------
# def _build_ckdtree_(self, x, y):
#     from scipy.spatial import cKDTree
#     P = np.vstack((x, y)).T
#     return cKDTree(P)
# Replace _build_ckdtree_ with _build_strtree_
def _build_strtree_(self, x, y):
    from pygeos import STRtree, points
    return STRtree(points(x,y))

#def __nearest_ckdtree__(self, idx, x, y):
#    q = np.vstack((x, y)).T
#    return idx.query(q, k = 1, workers = self.PARALLEL_WORKERS)[1]
# Replace __nearest_ckdtree__ with __nearest_strtree__
def __nearest_strtree__(self, idx, x, y):
    from pygeos import points
    return idx.nearest(points(x,y))[1]

# line 180
# return self.__nearest_ckdtree__(self.nodes_idx, x, y)
return self.__nearest_strtree__(self.nodes_idx, x, y)

# line 186
# return self.__nearest_ckdtree__(self.faces_idx, xc, yc)
return self.__nearest_strtree__(self.faces_idx, xc, yc)
# ----------------------------reader_netCDF_CF_unstructured.py----------------------------
# line 173
# self.nodes_idx = self._build_ckdtree_(self.x, self.y)
self.nodes_idx = self._build_strtree_(self.x, self.y)

# line 176
# self.faces_idx = self._build_ckdtree_(self.xc, self.yc)
self.faces_idx = self._build_strtree_(self.xc, self.yc)

After my test, the efficiency has increased by more than 30%.

So It may be a nice choice to complete this function.

Thanks!

lyingTree commented 2 years ago

Hi, it looks like pygeos is going to be merged with Shapely and be released as part of Shapely 2.0: https://pygeos.readthedocs.io/en/stable/, it seems that shapely 1.8 is a preview of 2.0. If this code is already part of shapely it would be great to use shapely in stead - one less dependency. If the STRtree works as well as ckdtree I think we should consider swapping it out, and actually reduce our dependencies, since we need shapely anyway.

https://shapely.readthedocs.io/en/stable/manual.html#str-packed-r-tree

If you submit your code as a pull request it will be easy to review and test for us.

Thanks, it will be a nice improvement. I will submit my code after