elki-project / elki

ELKI Data Mining Toolkit
https://elki-project.github.io/
GNU Affero General Public License v3.0

Outlierness score considering a single cluster or noise #103

Closed BraulioSanchez closed 2 years ago

BraulioSanchez commented 2 years ago

This commit attempts to correct how outlier scores are assigned when the clustering consists of only a single cluster or of noise.

codecov[bot] commented 2 years ago

Codecov Report

Merging #103 (fe902d1) into master (a60e18c) will increase coverage by 0.00%. The diff coverage is 60.00%.

@@            Coverage Diff            @@
##             master     #103   +/-   ##
=========================================
  Coverage     51.84%   51.84%           
- Complexity    12540    12542    +2     
=========================================
  Files          1807     1807           
  Lines         90484    90496   +12     
  Branches      16714    16720    +6     
=========================================
+ Hits          46909    46917    +8     
- Misses        39196    39200    +4     
  Partials       4379     4379           
Impacted Files                                           Coverage Δ
...outlier/clustering/SilhouetteOutlierDetection.java    66.29% <60.00%> (-2.54%)
...i-core-api/src/main/java/elki/result/Metadata.java    65.10% <0.00%> (-0.68%)
...ain/java/elki/clustering/kmeans/HamerlyKMeans.java    91.73% <0.00%> (+0.82%)
...rc/main/java/elki/clustering/kmedoids/FastPAM.java    87.62% <0.00%> (+3.09%)


kno10 commented 2 years ago

Solved this slightly differently, because the problem affects not only outlier detection: regular Silhouette evaluation also needs to handle the case of a single cluster. But thank you for the report and the proposed fix!
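
For readers unfamiliar with why a single cluster is a special case: the silhouette of a point compares its mean distance to its own cluster (a) with its mean distance to the nearest other cluster (b). With only one cluster, b does not exist. The following is a minimal, self-contained sketch of that edge case, not the code merged into ELKI. The class name `SilhouetteSketch`, the method signatures, and the choice to return 0 when no second cluster exists are all assumptions made for illustration; noise handling is left out entirely.

```java
// Illustrative sketch only; names and conventions here are hypothetical,
// not ELKI's API.
class SilhouetteSketch {
  /** Euclidean distance between two points of equal dimension. */
  static double distance(double[] p, double[] q) {
    double sum = 0;
    for (int d = 0; d < p.length; d++) {
      double diff = p[d] - q[d];
      sum += diff * diff;
    }
    return Math.sqrt(sum);
  }

  /**
   * Per-point silhouette values for labels in 0..k-1.
   * Assumption: if fewer than two clusters are non-empty, every point gets 0,
   * because "nearest other cluster" is undefined.
   */
  static double[] silhouette(double[][] points, int[] labels, int k) {
    int n = points.length;
    int[] sizes = new int[k];
    for (int label : labels) {
      sizes[label]++;
    }
    int nonEmpty = 0;
    for (int size : sizes) {
      if (size > 0) {
        nonEmpty++;
      }
    }
    double[] scores = new double[n]; // initialized to 0.0
    if (nonEmpty < 2) {
      return scores; // single cluster: no second cluster to compare against
    }
    for (int i = 0; i < n; i++) {
      int own = labels[i];
      if (sizes[own] == 1) {
        continue; // singleton cluster: score stays 0 by the usual convention
      }
      // Sum of distances from point i to the members of each cluster.
      double[] sums = new double[k];
      for (int j = 0; j < n; j++) {
        if (j != i) {
          sums[labels[j]] += distance(points[i], points[j]);
        }
      }
      double a = sums[own] / (sizes[own] - 1);
      double b = Double.POSITIVE_INFINITY;
      for (int c = 0; c < k; c++) {
        if (c != own && sizes[c] > 0) {
          b = Math.min(b, sums[c] / sizes[c]);
        }
      }
      double max = Math.max(a, b);
      scores[i] = max > 0 ? (b - a) / max : 0;
    }
    return scores;
  }

  public static void main(String[] args) {
    double[][] points = { { 0 }, { 1 }, { 10 }, { 11 } };
    // Two well-separated clusters, then everything in one cluster.
    System.out.println(java.util.Arrays.toString(silhouette(points, new int[] { 0, 0, 1, 1 }, 2)));
    System.out.println(java.util.Arrays.toString(silhouette(points, new int[] { 0, 0, 0, 0 }, 1)));
  }
}
```

Returning 0 here is just one possible convention; the point of the sketch is that the guard belongs in the silhouette computation itself, so that both evaluation and any outlier score derived from it see the same behavior.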