gagolews / genieclust

Genie: Fast and Robust Hierarchical Clustering with Noise Point Detection - in Python and R
https://genieclust.gagolewski.com
Other

Jarník's MST with query indexes kept sorted more cache-friendly? #62

gagolews closed this issue 4 years ago
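The title asks whether Jarník's (Prim's) minimum spanning tree algorithm becomes more cache-friendly when the indexes of the not-yet-connected points are kept sorted. Below is a minimal Python/NumPy sketch of that idea. It assumes a dense complete graph with Euclidean distances. The function name `mst_jarnik_sorted` is hypothetical, and this is not genieclust's actual C++ implementation.

```python
import numpy as np

def mst_jarnik_sorted(X):
    """Jarník (Prim) MST on the complete Euclidean graph over the rows of X.

    Hypothetical sketch: the unvisited vertices live in a contiguous index
    array `ind` that is kept in increasing order, so every sweep reads the
    rows of X (and the entries of `dist`) front to back.
    Returns a list of (tree_vertex, new_vertex, edge_length) tuples.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    dist = np.full(n, np.inf)   # dist[j]: current distance from j to the tree
    nearest = np.full(n, -1)    # nearest[j]: tree vertex realising dist[j]
    ind = np.arange(1, n)       # unvisited vertices, sorted; vertex 0 starts the tree
    edges = []
    last = 0                    # vertex most recently added to the tree
    while ind.size > 0:
        # Update distances from the newly added vertex to all unvisited ones.
        d = np.sqrt(((X[ind] - X[last]) ** 2).sum(axis=1))
        better = d < dist[ind]
        dist[ind[better]] = d[better]
        nearest[ind[better]] = last
        # Select the unvisited vertex closest to the tree.
        k = int(np.argmin(dist[ind]))
        last = int(ind[k])
        edges.append((int(nearest[last]), last, float(dist[last])))
        # Remove position k by shifting, which preserves the sorted order.
        ind = np.delete(ind, k)
    return edges
```

Removing the chosen index by shifting costs O(n) per step, but each step already computes O(n·d) distances, so the overall O(n²·d) cost is unchanged. The cheaper swap-with-last removal would avoid the shift but scramble `ind`, making later sweeps jump around in memory.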

gagolews commented 4 years ago

Before the change:

6
d=10, n=50000, iter=0
   random_state     timestamp     dataset      n   d              method  n_clusters n_threads  elapsed_time       ar        r       fm      afm       mi       nmi       ami     nacc      psi
0             1  1.594542e+09  g2mg_10_20  50000  10  Genie_0.3_nomlpack           2         6      3.901408  0.99904  0.99952  0.99952  0.99904  0.69091  0.996773  0.996773  0.99952  0.99944
d=100, n=50000, iter=0
   random_state     timestamp      dataset      n    d              method  n_clusters n_threads  elapsed_time            ar        r       fm           afm        mi       nmi       ami    nacc      psi
0             1  1.594542e+09  g2mg_100_20  50000  100  Genie_0.3_nomlpack           2         6     26.401537  3.264181e-07  0.49999  0.70693  3.938901e-07  0.000166  0.000479  0.000448  0.0006  0.00024
d=10, n=50000, iter=1
   random_state     timestamp     dataset      n   d              method  n_clusters n_threads  elapsed_time      ar       r      fm     afm        mi       nmi       ami    nacc     psi
0             2  1.594542e+09  g2mg_10_20  50000  10  Genie_0.3_nomlpack           2         6      4.014916  0.9992  0.9996  0.9996  0.9992  0.691244  0.997254  0.997254  0.9996  0.9996
d=100, n=50000, iter=1
   random_state     timestamp      dataset      n    d              method  n_clusters n_threads  elapsed_time            ar        r        fm           afm        mi       nmi       ami     nacc      psi
0             2  1.594542e+09  g2mg_100_20  50000  100  Genie_0.3_nomlpack           2         6     26.566503  4.832190e-07  0.49999  0.706845  5.830038e-07  0.000172  0.000495  0.000465  0.00072  0.00032
d=10, n=50000, iter=2
   random_state     timestamp     dataset      n   d              method  n_clusters n_threads  elapsed_time       ar        r       fm      afm        mi       nmi       ami     nacc      psi
0             3  1.594542e+09  g2mg_10_20  50000  10  Genie_0.3_nomlpack           2         6      3.968966  0.99912  0.99956  0.99956  0.99912  0.691075  0.997011  0.997011  0.99956  0.99952
d=100, n=50000, iter=2
   random_state     timestamp      dataset      n    d              method  n_clusters n_threads  elapsed_time            ar        r       fm           afm        mi       nmi       ami    nacc      psi
0             3  1.594542e+09  g2mg_100_20  50000  100  Genie_0.3_nomlpack           2         6     26.292954  5.984283e-07  0.49999  0.70686  7.220242e-07  0.000236  0.000677  0.000648  0.0008  0.00034
gagolews commented 4 years ago

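Yep. After the change (query indexes kept sorted), every run is faster, while all the partition similarity scores (ar, r, fm, afm, mi, nmi, ami, nacc, psi) are identical to the ones above: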

d=10, n=50000, iter=0
   random_state     timestamp     dataset      n   d              method  n_clusters n_threads  elapsed_time       ar        r       fm      afm       mi       nmi       ami     nacc      psi
0             1  1.594543e+09  g2mg_10_20  50000  10  Genie_0.3_nomlpack           2         6        3.3671  0.99904  0.99952  0.99952  0.99904  0.69091  0.996773  0.996773  0.99952  0.99944
d=100, n=50000, iter=0
   random_state     timestamp      dataset      n    d              method  n_clusters n_threads  elapsed_time            ar        r       fm           afm        mi       nmi       ami    nacc      psi
0             1  1.594543e+09  g2mg_100_20  50000  100  Genie_0.3_nomlpack           2         6     24.093857  3.264181e-07  0.49999  0.70693  3.938901e-07  0.000166  0.000479  0.000448  0.0006  0.00024
d=10, n=50000, iter=1
   random_state     timestamp     dataset      n   d              method  n_clusters n_threads  elapsed_time      ar       r      fm     afm        mi       nmi       ami    nacc     psi
0             2  1.594543e+09  g2mg_10_20  50000  10  Genie_0.3_nomlpack           2         6      3.933692  0.9992  0.9996  0.9996  0.9992  0.691244  0.997254  0.997254  0.9996  0.9996
d=100, n=50000, iter=1
   random_state     timestamp      dataset      n    d              method  n_clusters n_threads  elapsed_time            ar        r        fm           afm        mi       nmi       ami     nacc      psi
0             2  1.594543e+09  g2mg_100_20  50000  100  Genie_0.3_nomlpack           2         6     26.052527  4.832190e-07  0.49999  0.706845  5.830038e-07  0.000172  0.000495  0.000465  0.00072  0.00032
d=10, n=50000, iter=2
   random_state     timestamp     dataset      n   d              method  n_clusters n_threads  elapsed_time       ar        r       fm      afm        mi       nmi       ami     nacc      psi
0             3  1.594543e+09  g2mg_10_20  50000  10  Genie_0.3_nomlpack           2         6      3.782147  0.99912  0.99956  0.99956  0.99912  0.691075  0.997011  0.997011  0.99956  0.99952
d=100, n=50000, iter=2
   random_state     timestamp      dataset      n    d              method  n_clusters n_threads  elapsed_time            ar        r       fm           afm        mi       nmi       ami    nacc      psi
0             3  1.594543e+09  g2mg_100_20  50000  100  Genie_0.3_nomlpack           2         6     26.088192  5.984283e-07  0.49999  0.70686  7.220242e-07  0.000236  0.000677  0.000648  0.0008  0.00034
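To quantify the "Yep.", the short script below computes the relative time saved per run from the `elapsed_time` columns of the two comments. The dictionary and function names are illustrative only; the numbers are copied from the tables above.

```python
# Elapsed times in seconds, copied from the "before" and "after" tables
# above, keyed by (d, iter).
BEFORE = {(10, 0): 3.901408, (100, 0): 26.401537,
          (10, 1): 4.014916, (100, 1): 26.566503,
          (10, 2): 3.968966, (100, 2): 26.292954}
AFTER = {(10, 0): 3.3671, (100, 0): 24.093857,
         (10, 1): 3.933692, (100, 1): 26.052527,
         (10, 2): 3.782147, (100, 2): 26.088192}

def relative_savings(before, after):
    """Percentage of elapsed time saved for each (d, iter) run."""
    return {key: 100.0 * (before[key] - after[key]) / before[key]
            for key in before}

if __name__ == "__main__":
    for (d, it), pct in sorted(relative_savings(BEFORE, AFTER).items()):
        print(f"d={d:3d} iter={it}: {BEFORE[(d, it)]:9.3f}s -> "
              f"{AFTER[(d, it)]:9.3f}s ({pct:4.1f}% less time)")
```

Every run is faster after the change; the saving ranges from under 1% (d=100, iter=2) to about 14% (d=10, iter=0). With only three runs per setting, the exact size of the gain remains uncertain.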