elki-project / elki

ELKI Data Mining Toolkit
https://elki-project.github.io/
GNU Affero General Public License v3.0
785 stars 323 forks source link

Hysortod #89

Closed BraulioSanchez closed 3 years ago

codecov[bot] commented 3 years ago

Codecov Report

Merging #89 (caccbc6) into master (0082485) will increase coverage by 0.07%. The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master      #89      +/-   ##
============================================
+ Coverage     50.55%   50.63%   +0.07%     
- Complexity    12041    12052      +11     
============================================
  Files          1723     1724       +1     
  Lines         85493    85673     +180     
  Branches      15713    15746      +33     
============================================
+ Hits          43225    43382     +157     
- Misses        38208    38215       +7     
- Partials       4060     4076      +16     
Impacted Files Coverage Δ
...r/src/main/java/elki/outlier/density/HySortOD.java 93.67% <100.00%> (ø)
...ava/elki/data/synthetic/bymodel/GeneratorMain.java 39.41% <0.00%> (-2.92%) :arrow_down:
...data/synthetic/bymodel/GeneratorSingleCluster.java 48.86% <0.00%> (-2.28%) :arrow_down:
...-various/src/main/java/elki/index/laesa/LAESA.java 90.55% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 0082485...caccbc6. Read the comment docs.

kno10 commented 3 years ago

Why should this be more efficient? Do you have benchmark measurements?

BraulioSanchez commented 3 years ago

Yes, we have a benchmark considering all the data sets used in the article. The efficiency of this algorithm is sensitive to the number of dimensions. We use a data set of 25 dimensions and 25k instances, we compare the execution time of this implementation (14 sec) with the original one (4 sec). The bottleneck is in the getDensities method of the TreeStrategy class.

kno10 commented 3 years ago

Sorry that I have not been more responsive. Too many duties. Your patch undoes two changes: (i) lazy initialization of the children map, (ii) null checks location in the density function. Did you benchmark both separately? It's always a bit tricky to predict Hotspot performance. Often the shorter code yields shorter bytecode, which may end up being optimized better; that is why I did this one-liner for the recursive calls.

Can you try 00824854dc7a4daa0489ce3513434a6cd202a57c with your benchmark suite? Thank you.