apache / sedona

A cluster computing framework for processing large-scale geospatial data
https://sedona.apache.org/
Apache License 2.0
1.97k stars 692 forks

How to perform a KNN search (find the K nearest neighbours of a query point) in GeoSpark? #279

Closed: sparkcassuser closed this 1 year ago

sparkcassuser commented 6 years ago

Expected behavior

Actual behavior

Steps to reproduce the problem

Settings

GeoSpark version = ?

Apache Spark version = ?

JRE version = 1.8?

API type = Scala or Java?

ValdarT commented 4 years ago

This is specified here: https://datasystemslab.github.io/GeoSpark/tutorial/sql/#knn-query

That said, I think it would be nice to also introduce the ST_Neighbors convenience function that's mentioned in some of the GeoSpark papers and presentations.

ValdarT commented 4 years ago

Actually, I'm also unable to achieve this nicely with GeoSpark.

The following query works nicely for finding points within a specified distance, because it is performed as a DistanceJoin:

select *
from points1, points2
where ST_Distance(points1.coordinates, points2.coordinates) < 100

For the K nearest neighbours, I could not find a better solution than to apply the distance filter and then rank the distances with a window function, as usual in SQL. GeoSpark could definitely use something like ST_Neighbors for better efficiency. Or am I missing something here?
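For anyone landing here, the "distance filter, then rank with a window function" workaround can be sketched roughly as below. This is not GeoSpark code: it runs the same SQL shape on an in-memory SQLite database so it can be tried anywhere. The `planar_distance` function, the `x`/`y` columns, the `id` column, and the sample rows are all assumptions made for illustration. In GeoSpark SQL the distance expression would presumably be `ST_Distance(points1.coordinates, points2.coordinates)` as in the query above. It also assumes a SQLite build with window function support (3.25 or later).

```python
import math
import sqlite3


def planar_distance(x1, y1, x2, y2):
    """Hypothetical stand-in for ST_Distance: plain Euclidean distance."""
    return math.hypot(x1 - x2, y1 - y2)


# Step 1: the join condition keeps only pairs closer than :radius.
# Step 2: ROW_NUMBER ranks the surviving pairs per query point by distance.
# Step 3: the outer filter keeps the :k best-ranked rows per query point.
KNN_SQL = """
SELECT query_id, neighbour_id, distance
FROM (
    SELECT q.id AS query_id,
           p.id AS neighbour_id,
           planar_distance(q.x, q.y, p.x, p.y) AS distance,
           ROW_NUMBER() OVER (
               PARTITION BY q.id
               ORDER BY planar_distance(q.x, q.y, p.x, p.y), p.id
           ) AS rn
    FROM points1 AS q
    JOIN points2 AS p
      ON planar_distance(q.x, q.y, p.x, p.y) < :radius
) AS ranked
WHERE rn <= :k
ORDER BY query_id, rn
"""


def k_nearest(conn, k, radius):
    """Return (query_id, neighbour_id, distance) rows, at most k per query point."""
    return conn.execute(KNN_SQL, {"k": k, "radius": radius}).fetchall()


# Made-up sample data: one query point and four candidate points.
conn = sqlite3.connect(":memory:")
conn.create_function("planar_distance", 4, planar_distance)
conn.executescript("""
CREATE TABLE points1 (id INTEGER, x REAL, y REAL);
CREATE TABLE points2 (id INTEGER, x REAL, y REAL);
INSERT INTO points1 VALUES (1, 0, 0);
INSERT INTO points2 VALUES (10, 3, 4), (11, 1, 0), (12, 0, 2), (13, 500, 500);
""")

neighbours = k_nearest(conn, k=2, radius=100)
for row in neighbours:
    print(row)
```

One caveat with this approach: the radius has to be guessed up front. If fewer than K points fall within it, the query returns fewer than K neighbours for that query point, and a very generous radius makes the join more expensive.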