Open choni opened 11 years ago
Hi.
I dodged the bottleneck by brute changes. And, I updated pull request.
Kind regards, SHIINA, Hideaki
Wow. I am not quite sure if I should find the static initialization of heap pointers excellent or bad. I tend to like it as it seems to speed the search up.
Btw:
int myArray[10] = { 0 }; //all elements 0
Oh, I didn't know a notation like
int myArray[10] = { 0 }; //all elements 0
I changed and committed it. Thanks.
In case of Linux, "delete heap" instruction don't waste time so much like case of mac. Even, in that case, Heap pool makes searching fast a little.
[ new & delete heap ] cores = 2, Linux, query=512bits * 500, knnTime = 0.0728729 sec time[0] = 19006 us // new heap time[1] = 78815 us // first findNN loop time[2] = 418 us // second findNN loop time[3] = 18888 us // delete heap; time[4] = 117326 us // whole
[ using heap pool ] cores = 2, Linux, query=512bits * 500 knnTime = 0.0455761 sec time[0] = 91 us // get heap from pool time[1] = 57978 us // first findNN loop time[2] = 383 us // second findNN loop time[3] = 122 us // put back heap time[4] = 58688 us // whole
The timings look very good. But what matters at the end is the overall time consumption. I guess you should run a at least one real application that heavily depends on approximate nearest neighbor search (e.g. approximate k-means, vector distance computations, etc) and measure the wall time to get the overall speedup.
Hello, sromberg. Thanks for reply. We always tested real application (web API). It is good feeling on the changes. But certainly, time wastes at "delete heap" are large in particular on MacOS. In another environments, speed up rates is lower than in MacOS.
I want to know there is a bad effects for processing speed or not in a particular environment.
Hi, Marius.
I got another big fish!! I found a time wasting line "DynamicBitset checked(size_);" in HierarchicalClusteringIndex::findNeighborsWithRemoved.
On a trial, I wrote BitsetPool it made faster than current. I'll request for pull, soon.
My results are below:
Environment: MS Azure 8cores
// only with HeapPool knnTime = 2.52116sec, cores = 1 knnTime = 1.90988sec, cores = 2 knnTime = 1.54542sec, cores = 3 knnTime = 1.3012sec, cores = 4 knnTime = 1.18767sec, cores = 5 knnTime = 1.0411sec, cores = 6 knnTime = 1.03601sec, cores = 7 knnTime = 1.05012sec, cores = 8
// with HeapPool & BitsetPool knnTime = 0.136957sec, cores = 1 knnTime = 0.0742822sec, cores = 2 knnTime = 0.0746491sec, cores = 3 knnTime = 0.0554869sec, cores = 4 knnTime = 0.0553958sec, cores = 5 knnTime = 0.0496221sec, cores = 6 knnTime = 0.0551631sec, cores = 7 knnTime = 0.04017sec, cores = 8
Hello, Marius.
I use HierarchicalClusteringIndex::knnSearch. Performance bottleneck was found at the point that is not essential for searching.
This is the line "delete heap;" in HierarchicalClusteringIndex::findNeighborsWithRemoved.
In my case, about 85 to 90 percent time of searching was wasted at this line.
So, I commented out "delete heap;". It brings extreme fast results and huge memory leaks. :)
But, This pointed out potential for performance up of flann library, too.
Thanks.
SHIINA, Hideaki
Detailed results is below:
Environments: MacBook Pro Retina, 2.7 GHz Intel Core i7 4cores. DataSizes: Stock = 512 bits * 7961495, Query = 512bits * 500
code of measuring time is below: