dhoule opened this issue 4 years ago
The current problem is "over-pruning":
mpiexec -n 16 ./mpi_dbscan -i clus50k.bin -b -m 5 -e 25
Dataset used: clus50k.bin
Number of process cores 16
Epsilon: 25 MinPts: 5
Dimensions of each point: 10
Init time 0.00420398
Local computation took 2.3343
Merging took 0.0646524
Parallel DBSCAN (init, local computation, and merging) took 2.45995 seconds
Points in clusters 46787 Noise 3213 Total points 50000
Total number of clusters 566
The number of clusters is supposed to be 51. Like before, this is just fine-tuning the code...
Well, the latest test gave accurate results, but the time was HORRIBLE:
mpiexec -n 16 ./mpi_dbscan -i clus50k.bin -b -m 5 -e 25
Dataset used: clus50k.bin
Number of process cores 16
Epsilon: 25 MinPts: 5
Dimensions of each point: 10
Init time 0.00422827
Local computation took 105.643
Merging took 0.058158
Parallel DBSCAN (init, local computation, and merging) took 105.856 seconds
Points in clusters 46914 Noise 3086 Total points 50000
Total number of clusters 51
There are 2 things I can think of. I'm using the C++ standard library functions `set_intersection` and `copy`. The use of the standard functions seems to add a lot of time to the code. As Josh did, and as I did before with binary search, I'm going to have to write my own "intersection" function to speed things up. For the `copy` function, I'm just going to use a "for loop."
This will be done after I've committed the code. I've met the accuracy requirement. Now, I have to meet the time requirement.
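A custom "intersection" along those lines might look like this minimal sketch (hypothetical names; the actual in-repo function may differ) — binary-searching each element of the smaller sorted list in the larger one, instead of the linear two-pointer merge that `std::set_intersection` performs:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical sketch: intersect two sorted index lists by binary-searching
// each element of the smaller list in the larger one. This can beat
// std::set_intersection when one list is much smaller than the other
// (O(m log n) vs O(m + n)).
std::vector<int> intersect_sorted(const std::vector<int>& small_list,
                                  const std::vector<int>& big_list) {
    std::vector<int> out;
    out.reserve(small_list.size());
    for (int v : small_list)
        if (std::binary_search(big_list.begin(), big_list.end(), v))
            out.push_back(v);  // present in both lists
    return out;
}
```

Both inputs must already be sorted, which holds for neighbor-index lists built in order.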
I reworked the code. I got the timing down to a more "reasonable" area, but the accuracy is off again:
mpiexec -n 16 ./mpi_dbscan -i clus50k.bin -b -m 5 -e 25
Dataset used: clus50k.bin
Number of process cores 16
Epsilon: 25 MinPts: 5
Dimensions of each point: 10
Init time 0.000541579
Local computation took 1.6466
Merging took 0.155283
Parallel DBSCAN (init, local computation, and merging) took 1.80828 seconds
Points in clusters 38250 Noise 11750 Total points 50000
Total number of clusters 4630
Some really great news! I used Josh's `kdtree_set_difference` function, and got a massive increase in speed! After moving some code around, I got the accuracy back again. Will be committing soon.
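I won't reproduce `kdtree_set_difference` here, but the underlying idea — a fast set difference over sorted index lists — can be sketched like this (a hypothetical stand-in, not the actual code):

```cpp
#include <vector>

// Hypothetical sketch of a sorted set difference: return the elements of a
// that are not in b, walking both sorted lists once (O(|a| + |b|)).
std::vector<int> sorted_difference(const std::vector<int>& a,
                                   const std::vector<int>& b) {
    std::vector<int> out;
    std::size_t i = 0, j = 0;
    while (i < a.size()) {
        if (j == b.size() || a[i] < b[j]) {
            out.push_back(a[i]);  // a[i] is not in b
            ++i;
        } else if (b[j] < a[i]) {
            ++j;                  // b[j] can't match any remaining a[i]
        } else {
            ++i; ++j;             // common element: drop it
        }
    }
    return out;
}
```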
The latest test results:
mpiexec -n 16 ./mpi_dbscan -i clus50k.bin -b -m 5 -e 25
Dataset used: clus50k.bin
Number of process cores 16
Epsilon: 25 MinPts: 5
Dimensions of each point: 10
Init time 0.00283968
Local computation took 30.2367
Merging took 0.055811
Parallel DBSCAN (init, local computation, and merging) took 30.3308 seconds
Points in clusters 45795 Noise 4205 Total points 50000
Total number of clusters 51
The next move needs to be getting rid of the `std::copy` function. It is used twice in the `dbscan.cpp` file. ~105 sec --> ~30 sec with just 1 custom function. Can't wait to see what happens with this one!
There was no improvement from getting rid of the `std::copy()` function.
My next plan is to add "core points" if they don't exist, and remove culled points, from the main loop vector. I need to figure out how to keep the `dbs.neededIndices` vector sorted while doing this. If the points have been seen not to be "core points," there is no sense in revisiting them.
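One way to keep a vector like `dbs.neededIndices` sorted under insertions and removals (a sketch, assuming it holds integer point indices) is `std::lower_bound` plus `insert`/`erase`:

```cpp
#include <algorithm>
#include <vector>

// Sketch: maintain a sorted vector of point indices under insert/remove,
// so binary searches over it stay valid.
void insert_sorted(std::vector<int>& v, int idx) {
    auto it = std::lower_bound(v.begin(), v.end(), idx);
    if (it == v.end() || *it != idx)
        v.insert(it, idx);  // keep the vector sorted, skip duplicates
}

void erase_sorted(std::vector<int>& v, int idx) {
    auto it = std::lower_bound(v.begin(), v.end(), idx);
    if (it != v.end() && *it == idx)
        v.erase(it);        // remove a culled point if present
}
```

Each call is O(log n) to locate plus O(n) to shift elements, which is usually fine for occasional updates; a `std::set` avoids the shifting but adds pointer-chasing overhead.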
I got the time down, but the accuracy is off again:
mpiexec -n 16 ./mpi_dbscan -i clus50k.bin -b -m 5 -e 25
Dataset used: clus50k.bin
Number of process cores 16
Epsilon: 25 MinPts: 5
Dimensions of each point: 10
Init time 0.00460396
Local computation took 4.86409
Merging took 0.0447575
Parallel DBSCAN (init, local computation, and merging) took 4.91621 seconds
Points in clusters 41450 Noise 8550 Total points 50000
Total number of clusters 198
Making a new checkpoint. I find it odd that the time increased when going from 2 nodes to 16 nodes. It might be the need for more message passing.
mpiexec -n 2 ./mpi_dbscan -i clus50k.bin -b -m 5 -e 25
Dataset used: clus50k.bin
Number of process cores 2
Epsilon: 25 MinPts: 5
Dimensions of each point: 10
Init time 0.000403266
Local computation took 24.4247
Merging took 0.00231618
Parallel DBSCAN (init, local computation, and merging) took 24.4278 seconds
Points in clusters 45735 Noise 4265 Total points 50000
Total number of clusters 51
mpiexec -n 16 ./mpi_dbscan -i clus50k.bin -b -m 5 -e 25
Dataset used: clus50k.bin
Number of process cores 16
Epsilon: 25 MinPts: 5
Dimensions of each point: 10
Init time 0.00824141
Local computation took 30.9175
Merging took 0.0582847
Parallel DBSCAN (init, local computation, and merging) took 30.9877 seconds
Points in clusters 45795 Noise 4205 Total points 50000
Total number of clusters 51
There was a problem rebuilding the `ne` vector. My last report is totally wrong. This is the latest report, as of the current corrections:
mpiexec -n 16 ./mpi_dbscan -i clus50k.bin -b -m 5 -e 25
Dataset used: clus50k.bin
Number of process cores 16
Epsilon: 25 MinPts: 5
Dimensions of each point: 10
Init time 0.00673148
Local computation took 61.8609
Merging took 0.0613414
Parallel DBSCAN (init, local computation, and merging) took 61.9743 seconds
Points in clusters 35729 Noise 14271 Total points 50000
Total number of clusters 58
I need to work on the timing, and the accuracy, again. But it's down from the ~120 sec range.
The same `npid` is being checked numerous times for different `pid` values. This causes redundant searches, massively slowing down the code.
Maybe a way to skip these points is to keep track of them? If an `npid` value exists in some kind of tracking vector, there won't be a need to check it again, because it is a known core point.
But it's from a different `pid`, though. That could mean that overlap points might be missed.
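The tracking idea could be sketched with a hash set keyed on `npid` (a hypothetical helper, not the actual code); keying on the `(pid, npid)` pair instead would sidestep the overlap-point worry, at the cost of more entries:

```cpp
#include <unordered_set>

// Sketch: remember npid values already confirmed as core points so repeated
// region queries for them can be skipped. Caveat: if the same npid must still
// be examined for a different pid (overlap points), key on the (pid, npid)
// pair instead.
struct SeenTracker {
    std::unordered_set<long> seen;
    // Returns true the first time an npid is reported; false afterwards,
    // meaning the caller can skip the redundant search.
    bool first_visit(long npid) { return seen.insert(npid).second; }
};
```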
The latest benchmark has taken the time down to ~7 sec, but the cluster count is hanging around 52-53 when it's supposed to be 51.
The latest push has led to over pruning.
Still getting 51-52 clusters with the `clus50k.bin` dataset...
This will take some time. Have to look into exactly where to do it.
Will update accordingly.