Bersaelor / KDTree

Swift implementation of a k-dimensional binary space partitioning tree.
MIT License
153 stars 26 forks source link

Built large trees faster with quickselect #38

Closed hsnetzer closed 5 years ago

hsnetzer commented 5 years ago

Instead of recursively sorting the array to find the median elements, we save time using quickselect.

I had to think about whether I could "ensure left subtree contains currentSplittingDimension-coordinate strictly less than its parent node." Because of the Lomuto partition algorithm, the answer is yes. I can go into greater detail but test cases speak for themselves.

I was trigger happy submitting the PR. I have some changes to make.

Bersaelor commented 5 years ago

Oh wow, thank you for the PR :)

I'll have a deep dive later today but so far it looks great.

Did you run some performance comparison before/after already? You can uncomment lines in the OSX Test example to get a nice testset over multiple array sizes.

hsnetzer commented 5 years ago

I need to go to work, but I noticed this morning that my algorithm runs into trouble with the stars dataset. Getting a bad access code. For what its worth, the program crashes with the original algorithm on my machine due to some multithreading issue..

All other tests so far have worked. Yes I am seeing performance improvements. I will get back to debugging tonight.

Bersaelor commented 5 years ago

Mhmm, I found a minor UI issue when running KDTree_OSX_Example on my mac, and made a change: https://github.com/Bersaelor/KDTree/pull/39 .

Otherwise the tests and the program is working fine, except when I check out your branch I get the following crash:

screen shot 2018-09-18 at 22 14 24

I also once did some investigation into parallelizing the tree building algorithm (here is the branch: https://github.com/Bersaelor/KDTree/commit/a2a0cea695cac978e0da954f7daa62d70b281bc5 ). But the performance boost was not big and I didn't feel like making the code more complicated and potentially more buggy with it. So I never merged those linked changes.

hsnetzer commented 5 years ago

Thank you Bersaelor, its a cool repo. The test cases and examples make it easy to work and your commitment to good style is invaluable as I get back to programming in Swift.

Indeed, good variable names change the way I look at the code. I'll read up more on Swift style.

The solution to the crash was a slightly different partition algorithm. Since the star array is sorted, Lomuto is really bad space and time complexity and I guess necessary memory was deallocated, or something.

Bersaelor commented 5 years ago

So I finally managed to run some performance testing today (very easy by uncommenting those lines, copy the print into a csv file and open with numbers or excel) Here's the result:

EDIT: Sorry, the results probably were wrong due to not cleaning the entire project folder.

hsnetzer commented 5 years ago

Hmm. I’m away from my computer and won’t be able to look for a couple days. I’m a bit surprised that these test results are so close. Are you sure that this was a clean test?

I understand that it’s hard to improve on the standard library but quickselect is doing something different from quicksort and under normal conditions should have a better time complexity.

Bersaelor commented 5 years ago

Good point, maybe I did forget to clean the folder entirely and xcode didn't rebuild?

Today I tried again and I can't get the tests to work anymore, always with the same error when testing on the array with 75000 points:

screen shot 2018-09-23 at 11 04 06

😔

hsnetzer commented 5 years ago

Weird, I could have sworn I solved that problem with the median of three algorithm before swift 4.2. Lucky there's another option for picking the pivot which makes use of the Int.random nicely. Not seeing the same performance boost as before though. Still a bit faster than the original.

I have the two branches in separate subdirectories so I can test side by side without worrying about Xcode getting confused about names being the same.

hsnetzer commented 5 years ago

I was able to run the previous Median of Three version with the Swift 4.1.3 toolchain. I don't know why Swift 4.2 breaks the Mo3, but it does.

screen shot 2018-09-27 at 3 29 49 pm

Here are the results of my tests. The old Mo3 is best, but the random pivot selection scheme is not so bad. Up to you whether the speedup is worth the complexity. How was the performance improvement of the multithreaded tree building?

By the way, the whole idea of building the tree with quickselect comes from https://github.com/mourner/kdbush