astrolabsoftware / spark3D

Spark extension for processing large-scale 3D data sets: Astrophysics, High Energy Physics, Meteorology, …
https://astrolabsoftware.github.io/spark3D/
Apache License 2.0

Failing KNN test #79

Closed JulienPeloton closed 6 years ago

JulienPeloton commented 6 years ago

Looking at Travis, there is something weird. Starting from commit cfc7a5f, the build sometimes fails and sometimes succeeds. Since cfc7a5f I have only made documentation commits (no code change), so I am wondering where this behaviour comes from. It is the same on my laptop: sometimes it fails, sometimes it succeeds.

Looking at the failing test (SpatialQueryTest.scala:Can you find the K nearest neighbours correctly?), it seems there is a problem with the way unique elements are returned...

Return unique elements

scala> // Run many times the same query
scala> for (i <- 0 to 10) {
     |   val knn = SpatialQuery.KNN(queryObject, sphereRDD_part, 3, true)
     |   println(knn.map(x => x.center.getCoordinate))
     | }
List(List(2.0, 2.0, 2.0), List(1.0, 1.0, 3.0), List(1.0, 1.0, 1.0))
List(List(2.0, 2.0, 2.0), List(1.0, 1.0, 3.0), List(3.0, 2.0, 1.0))
List(List(2.0, 2.0, 2.0), List(1.0, 1.0, 3.0), List(1.0, 1.0, 1.0))
List(List(2.0, 2.0, 2.0), List(1.0, 1.0, 3.0), List(1.0, 3.0, 0.7))
List(List(2.0, 2.0, 2.0), List(1.0, 1.0, 3.0), List(1.0, 3.0, 0.7))
List(List(2.0, 2.0, 2.0), List(1.0, 3.0, 0.7), List(3.0, 2.0, 1.0))
List(List(2.0, 2.0, 2.0), List(1.0, 1.0, 3.0), List(1.0, 3.0, 0.7))
List(List(2.0, 2.0, 2.0), List(1.0, 1.0, 3.0), List(1.0, 3.0, 0.7))
List(List(2.0, 2.0, 2.0), List(1.0, 1.0, 3.0), List(1.0, 3.0, 0.7))
List(List(2.0, 2.0, 2.0), List(1.0, 3.0, 0.7), List(3.0, 2.0, 1.0))
List(List(2.0, 2.0, 2.0), List(1.0, 1.0, 3.0), List(1.0, 3.0, 0.7))

The 2nd and 3rd elements are not always the same (and it is not just a matter of ordering)! That is why the test sometimes fails and sometimes passes. This looks like a bug to fix... @mayurdb any ideas?
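
To make the symptom concrete, here is a minimal, self-contained sketch (plain Scala with made-up points and helpers, not the actual spark3D code) of one way such run-to-run differences can appear: if the nearest neighbours are collected into a structure keyed by distance, two distinct points at the same distance overwrite each other, and the survivor depends on the order in which candidates arrive, which Spark does not guarantee across partitions. Whether SpatialQuery.KNN does something equivalent still needs to be checked in the code.

import scala.collection.immutable.TreeMap

object KnnTieSketch {
  type Point = (Double, Double, Double)

  def dist(a: Point, b: Point): Double =
    math.sqrt(math.pow(a._1 - b._1, 2) + math.pow(a._2 - b._2, 2) + math.pow(a._3 - b._3, 2))

  // Keep the k smallest distances; a distance collision silently replaces
  // the point stored earlier under the same key.
  def topKByDistance(query: Point, candidates: Seq[Point], k: Int): Seq[Point] = {
    val byDist = candidates.foldLeft(TreeMap.empty[Double, Point]) { (acc, p) =>
      acc + (dist(query, p) -> p)
    }
    byDist.take(k).values.toSeq
  }

  def main(args: Array[String]): Unit = {
    val query: Point = (0.0, 0.0, 0.0)
    // (0,3,0) and (3,0,0) are distinct but equidistant from the query.
    val candidates = Seq[Point]((1.0, 0.0, 0.0), (0.0, 3.0, 0.0), (3.0, 0.0, 0.0), (0.0, 0.0, 2.0))

    // Same candidates, two arrival orders, two different 3rd neighbours.
    println(topKByDistance(query, candidates, 3))
    println(topKByDistance(query, candidates.reverse, 3))
  }
}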

mayurdb commented 6 years ago

Yeah, the test case does look flaky. I'll take a look!

mayurdb commented 6 years ago

While debugging this, we found another issue caused by the duplicates: they can lead to fewer than k elements being returned, even though the RDD has more than k elements.
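
Here is a small, self-contained illustration of that situation (hypothetical data and helpers in plain Scala, not the actual spark3D code): each partition returns its k local nearest neighbours, but when a partition is dominated by copies of the same object, the union of the local results can contain fewer than k distinct points even though the full data set holds more than k.

object DuplicateKnnSketch {
  type Point = (Double, Double, Double)

  def dist(a: Point, b: Point): Double =
    math.sqrt(math.pow(a._1 - b._1, 2) + math.pow(a._2 - b._2, 2) + math.pow(a._3 - b._3, 2))

  // k nearest neighbours of `query` inside a single partition
  def localKnn(query: Point, partition: Seq[Point], k: Int): Seq[Point] =
    partition.sortBy(dist(query, _)).take(k)

  def main(args: Array[String]): Unit = {
    val query: Point = (0.0, 0.0, 0.0)
    val k = 3

    // Two partitions; each holds repeated copies of an object close to the
    // query, plus one farther (distinct) object.
    val partitions = Seq(
      Seq[Point]((1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)),
      Seq[Point]((2.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 0.0, 0.0), (6.0, 0.0, 0.0))
    )

    // Local k nearest per partition, then the k nearest distinct points of the union.
    val candidates = partitions.flatMap(localKnn(query, _, k))
    val result = candidates.distinct.sortBy(dist(query, _)).take(k)

    // Only 2 neighbours come back although the data set has 4 distinct
    // points: (5,0,0) and (6,0,0) were crowded out by the duplicates.
    println(result)
  }
}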

Resolving this would require us to maintain a single priority queue across all partitions, which would destroy the parallelism and, at the same time, result in a big list being shuffled across the network.
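
For reference, a rough sketch of that single-queue idea (again hypothetical plain Scala, ignoring the Spark plumbing): every candidate from every partition is funnelled into one bounded max-heap that keeps the k closest distinct points, which handles duplicates correctly but forces the whole candidate stream through a single place.

import scala.collection.mutable

object SingleQueueKnnSketch {
  type Point = (Double, Double, Double)

  def dist(a: Point, b: Point): Double =
    math.sqrt(math.pow(a._1 - b._1, 2) + math.pow(a._2 - b._2, 2) + math.pow(a._3 - b._3, 2))

  // Keep the k closest distinct points seen so far in one bounded max-heap.
  def knnDistinct(query: Point, candidates: Iterator[Point], k: Int): List[Point] = {
    val farthestFirst = Ordering.by((p: Point) => dist(query, p))
    val heap = mutable.PriorityQueue.empty[Point](farthestFirst)
    val seen = mutable.Set.empty[Point]        // exact duplicates are skipped
    candidates.foreach { p =>
      if (seen.add(p)) {
        heap.enqueue(p)
        if (heap.size > k) heap.dequeue()      // evict the farthest kept point
      }
    }
    heap.toList.sortBy(p => dist(query, p))    // closest first
  }

  def main(args: Array[String]): Unit = {
    val query: Point = (0.0, 0.0, 0.0)
    // Candidates as they would arrive from all partitions, duplicates included.
    val candidates = Iterator[Point](
      (1.0, 1.0, 1.0), (1.0, 1.0, 1.0), (2.0, 2.0, 2.0), (1.0, 1.0, 3.0), (3.0, 2.0, 1.0))
    println(knnDistinct(query, candidates, 3))
  }
}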

mayurdb commented 6 years ago

I have created #80 with a fix for the inconsistency in the results.