astrolabsoftware / spark3D

Spark extension for processing large-scale 3D data sets: Astrophysics, High Energy Physics, Meteorology, …
https://astrolabsoftware.github.io/spark3D/
Apache License 2.0

Introducing KNN + DataFrame API #110

Closed JulienPeloton closed 5 years ago

JulienPeloton commented 5 years ago

This PR re-introduces the KNN method, but with the DataFrame API.

The core of the routine remains untouched (developed during GSoC 2018 by Mayur, and using RDDs), but the I/O is different: it takes a DataFrame as input and returns a DataFrame containing the coordinates of the K nearest neighbours.
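To make the expected input and output concrete, here is a minimal brute-force sketch in plain Python. It is not the spark3D implementation and does not use its API: the function name `knn`, the sample points, and the use of Euclidean distance are all assumptions made for illustration. It only shows the shape of the result, namely the coordinates of the K points closest to a query point.

```python
import heapq
import math


def knn(points, query, k):
    """Return the k points closest to `query` (Euclidean distance).

    Hypothetical helper for illustration; not part of spark3D.
    """
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, query))


# Toy 3D data set standing in for the rows of an input DataFrame.
points = [
    (0.0, 0.0, 0.0),
    (1.0, 1.0, 1.0),
    (2.0, 2.0, 2.0),
    (5.0, 5.0, 5.0),
]

# Coordinates of the 2 nearest neighbours, closest first.
neighbours = knn(points, query=(0.9, 0.9, 0.9), k=2)
print(neighbours)
```

A reference like this can serve as a ground truth when checking the output of a distributed version on a small sample.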

Note: the option unique seems bugged (i.e. using KNN with unique=true produces wrong results). I know that ordering.leastOf recently changed its interface (from Scala Iterable to Java Iterable), so I wouldn't be surprised if the meaning changed as well.
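One way to pin down what a correct result should look like is a small reference check in plain Python. The sketch below assumes that unique=true is meant to return each distinct coordinate at most once; that reading is an assumption and is not confirmed in this PR. The helper names `knn_plain` and `knn_unique` are hypothetical.

```python
import heapq
import math


def knn_plain(points, query, k):
    """k nearest points, duplicates allowed (hypothetical reference)."""
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, query))


def knn_unique(points, query, k):
    """k nearest *distinct* points (assumed meaning of unique=true)."""
    distinct = list(dict.fromkeys(points))  # drop duplicates, keep order
    return heapq.nsmallest(k, distinct, key=lambda p: math.dist(p, query))


# The point (1, 1, 1) appears twice in the input.
points = [
    (1.0, 1.0, 1.0),
    (1.0, 1.0, 1.0),
    (2.0, 2.0, 2.0),
    (3.0, 3.0, 3.0),
]
query = (0.0, 0.0, 0.0)

with_duplicates = knn_plain(points, query, 2)
without_duplicates = knn_unique(points, query, 2)
print(with_duplicates)
print(without_duplicates)
```

Comparing the library output against such a reference on a small data set containing duplicates would show whether the unique code path deviates from the expected behaviour.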

codecov-io commented 5 years ago

Codecov Report

Merging #110 into master will increase coverage by 2.8%. The diff coverage is 92.85%.


@@            Coverage Diff            @@
##           master     #110     +/-   ##
=========================================
+ Coverage   86.84%   89.64%   +2.8%     
=========================================
  Files          29       32      +3     
  Lines        1140     1178     +38     
  Branches      201      194      -7     
=========================================
+ Hits          990     1056     +66     
+ Misses        150      122     -28

| Flag | Coverage Δ |
|---|---|
| #python | 91.24% <ø> (-2.4%) :arrow_down: |
| #scala | 88.95% <92.85%> (+4.88%) :arrow_up: |

| Impacted Files | Coverage Δ |
|---|---|
| src/main/scala/com/spark3d/Repartitioning.scala | 80.43% <ø> (ø) :arrow_up: |
| ...main/scala/com/spark3d/python/PythonClassTag.scala | 100% <100%> (ø) :arrow_up: |
| ...com/spark3d/utils/BoundedUniquePriorityQueue.scala | 90.47% <100%> (+90.47%) :arrow_up: |
| ...c/main/scala/com/spark3d/spatialOperator/KNN.scala | 100% <100%> (ø) |
| src/main/scala/com/spark3d/utils/Utils.scala | 94.11% <88.88%> (+38.85%) :arrow_up: |
| src/main/scala/com/spark3d/Queries.scala | 93.33% <93.33%> (ø) |
| checkers.py | 100% <0%> (ø) :arrow_up: |
| queries.py | 41.66% <0%> (ø) |

... and 4 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 19873df...155392a.