Closed BraulioSanchez closed 2 years ago
Merging #94 (255127c) into master (2af96da) will decrease coverage by 1.00%. The diff coverage is 96.87%.

:exclamation: Current head 255127c differs from pull request most recent head b83dde7. Consider uploading reports for the commit b83dde7 to get more accurate results.
```diff
@@             Coverage Diff              @@
##             master      #94      +/-   ##
============================================
- Coverage     51.88%   50.88%   -1.01%
+ Complexity    12563    12110     -453
============================================
  Files          1808     1727      -81
  Lines         90538    86216    -4322
  Branches      16726    15869     -857
============================================
- Hits          46977    43870    -3107
+ Misses        39172    38225     -947
+ Partials       4389     4121     -268
```
Impacted Files | Coverage Δ | |
---|---|---|
.../java/elki/clustering/kmeans/KMeansMinusMinus.java | 91.66% <83.33%> (+3.00%) | :arrow_up: |
...r/clustering/KMeansMinusMinusOutlierDetection.java | 100.00% <100.00%> (ø) | |
.../utilities/datastructures/iterator/FilteredIt.java | 0.00% <0.00%> (-65.00%) | :arrow_down: |
...rc/main/java/elki/data/model/CoreObjectsModel.java | 0.00% <0.00%> (-40.00%) | :arrow_down: |
...ionhandling/parameterization/Parameterization.java | 58.33% <0.00%> (-33.34%) | :arrow_down: |
...lities/datastructures/unionfind/UnionFindUtil.java | 0.00% <0.00%> (-33.34%) | :arrow_down: |
...asource/filter/AbstractStreamConversionFilter.java | 72.72% <0.00%> (-22.73%) | :arrow_down: |
...java/elki/database/ids/integer/IntegerDBIDVar.java | 29.16% <0.00%> (-14.59%) | :arrow_down: |
...ical/extraction/SimplifiedHierarchyExtraction.java | 77.30% <0.00%> (-10.63%) | :arrow_down: |
...i/utilities/optionhandling/ParameterException.java | 63.15% <0.00%> (-10.53%) | :arrow_down: |
... and 219 more | | |
I'd rather keep the noise flag option, but make the default behavior follow the original publication. For many clustering evaluation cases, it will be necessary to assign the noise points to the nearest cluster.
As for the current code, I wonder if we should solve this differently: right now, the code produces a binary outlier label, which is 1 exactly for objects in the noise cluster. We could write a "NoiseAsOutliers" class that performs this transformation and works with both k-means-- and DBSCAN. But it would likely be more in line with k-means-- (which ranks objects by their distance to the nearest cluster center) to produce a score based on the distance to the cluster center, i.e., use KMeansOutlierDetection with k-means-- and assign "noise" points to the nearest cluster (i.e., without kmeansmm.noisecluster). That would also allow comparing regular k-means and k-means-- consistently.
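To illustrate the distance-based scoring suggested above, here is a minimal standalone sketch (not ELKI's actual API; the class and method names are hypothetical) of scoring each object by its Euclidean distance to the nearest cluster center, instead of emitting a binary noise label:

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: continuous outlier scores from a k-means result,
 * where each object's score is its distance to the nearest cluster center.
 * Noise points are thereby implicitly "assigned" to their nearest cluster,
 * and regular k-means and k-means-- results become directly comparable.
 */
public class NearestCenterScore {
  /** Euclidean distance between two vectors of equal dimension. */
  static double distance(double[] a, double[] b) {
    double sum = 0;
    for(int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /** Outlier score per point: distance to the nearest cluster center. */
  static double[] scores(double[][] points, double[][] centers) {
    double[] s = new double[points.length];
    for(int i = 0; i < points.length; i++) {
      double best = Double.POSITIVE_INFINITY;
      for(double[] c : centers) {
        best = Math.min(best, distance(points[i], c));
      }
      s[i] = best;
    }
    return s;
  }

  public static void main(String[] args) {
    double[][] centers = { { 0, 0 }, { 10, 10 } };
    double[][] points = { { 0, 1 }, { 10, 9 }, { 5, 5 } };
    // The point midway between the centers gets the largest score.
    System.out.println(Arrays.toString(scores(points, centers)));
  }
}
```

Unlike a binary label, these scores yield a ranking, so standard outlier evaluation (e.g., top-k or ROC-based measures) applies directly.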
Source code of KMeansMinusMinusOutlierDetection for identifying noise points as outliers; the noise flag was removed from KMeansMinusMinus to keep the original publication's proposal.