Closed JulienPeloton closed 6 years ago
Yeah, the test case does look flaky. I'll take a look!
While debugging this, we found another issue caused by the duplicates: fewer than k elements might be returned even though the RDD has more than k elements (.coalesce; so, users should be careful to use .coalesce.distinct). Resolving this would require us to maintain a single priority queue for all partitions, which would destroy the parallelism and, at the same time, result in a big list being shuffled across the network.
I have created #80 with the fix for the inconsistency in the results.
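To make the duplicates problem concrete, here is a minimal sketch in plain Python (no Spark). It is not the spark3D implementation: the partitions are ordinary lists, the points are 1D numbers, and the helper names `knn_per_partition`, `unique_in_order` and `distinct_across_partitions` are made up for illustration. It assumes each partition keeps its own k nearest candidates, the candidates are merged, and duplicates are only dropped at the end.

```python
import heapq


def knn_per_partition(partitions, target, k):
    """Keep the k nearest points of each partition, then the k nearest of the merged candidates."""
    def distance(p):
        return abs(p - target)

    candidates = []
    for part in partitions:
        # Each partition is handled independently (this is the parallel step).
        candidates.extend(heapq.nsmallest(k, part, key=distance))
    return heapq.nsmallest(k, candidates, key=distance)


def unique_in_order(points):
    """Drop duplicates while keeping first-seen order."""
    return list(dict.fromkeys(points))


def distinct_across_partitions(partitions):
    """Toy stand-in for a global de-duplication: each value survives in one partition only."""
    seen = set()
    result = []
    for part in partitions:
        kept = []
        for p in part:
            if p not in seen:
                seen.add(p)
                kept.append(p)
        result.append(kept)
    return result


# Four distinct values overall (1.0, 2.0, 5.0, 9.0), with 1.0 duplicated across partitions.
partitions = [[1.0, 1.0, 5.0], [1.0, 2.0, 9.0]]
target = 0.0
k = 3

# Duplicates fill all k slots, so de-duplicating afterwards leaves fewer than k elements.
naive = unique_in_order(knn_per_partition(partitions, target, k))
print(naive)  # [1.0]

# De-duplicating the whole data set first gives k distinct neighbours.
deduplicated = knn_per_partition(distinct_across_partitions(partitions), target, k)
print(deduplicated)  # [1.0, 2.0, 5.0]
```

Note that the global de-duplication in this toy needs a `seen` set shared by all partitions. In a distributed setting that kind of shared state is what would cost a shuffle, which is in line with the trade-off described above.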
Looking at Travis, there is something weird. Starting from commit cfc7a5f, the build sometimes fails and sometimes succeeds. Since cfc7a5f, I have only made commits related to documentation (no code changes), so I'm wondering what causes this behaviour. It is the same on my laptop: sometimes it fails, sometimes it succeeds.
Looking at the failing test (SpatialQueryTest.scala: Can you find the K nearest neighbours correctly?), it seems that there is a little problem when looking at unique elements...
Return unique elements
The 2nd and 3rd elements are not always the same (and it is not just a matter of ordering)! Hence why the test is sometimes failing, sometimes passing. This looks like a bug to correct... @mayurdb any ideas?