elki-project / elki

ELKI Data Mining Toolkit
https://elki-project.github.io/
GNU Affero General Public License v3.0

Added CBLOF algorithm #24

Status: Closed (patrickkostjens closed 8 years ago)

patrickkostjens commented 8 years ago

This is my implementation of the CBLOF algorithm by He et al. I used the LOF and KMeansOutlier algorithms as the main examples for the implementation. Furthermore, I put 0.7.2 as the @since version in the comments, but I am unsure whether that version is correct.
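For readers unfamiliar with CBLOF, the scoring idea can be sketched roughly as follows. This is not the code from this pull request and does not use ELKI's API: the class name `CblofSketch`, the plain `double[]` points, the Euclidean distance, and the `alpha`/`beta` parameter names are all assumptions made for illustration. Clusters are sorted by size and split into "large" and "small" at a boundary; each point's score is its cluster's size multiplied by its distance to its own cluster center (if that cluster is large) or to the nearest large cluster center (if it is small).

```java
import java.util.Arrays;
import java.util.Comparator;

public class CblofSketch {
  /** Euclidean distance between two points of equal dimensionality. */
  static double distance(double[] a, double[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /**
   * Number of "large" clusters, given cluster sizes sorted in descending order.
   * The boundary is the first position where the large clusters cover at least
   * alpha of the data, or where the next cluster is smaller by a factor of beta.
   */
  static int clusterBoundary(int[] sortedSizes, double alpha, double beta) {
    int total = Arrays.stream(sortedSizes).sum();
    int covered = 0;
    for (int i = 0; i < sortedSizes.length - 1; i++) {
      covered += sortedSizes[i];
      if (covered >= alpha * total || sortedSizes[i] >= beta * sortedSizes[i + 1]) {
        return i + 1;
      }
    }
    return sortedSizes.length;
  }

  /** Computes one score per point; assignment[p] is the cluster index of point p. */
  static double[] scores(double[][] points, int[] assignment, double[][] centers,
      double alpha, double beta) {
    int k = centers.length;
    int[] sizes = new int[k];
    for (int a : assignment) {
      sizes[a]++;
    }
    // Order cluster indices by descending size.
    Integer[] order = new Integer[k];
    for (int i = 0; i < k; i++) {
      order[i] = i;
    }
    Arrays.sort(order, Comparator.comparingInt((Integer c) -> -sizes[c]));
    int[] sortedSizes = new int[k];
    for (int i = 0; i < k; i++) {
      sortedSizes[i] = sizes[order[i]];
    }
    int boundary = clusterBoundary(sortedSizes, alpha, beta);
    boolean[] large = new boolean[k];
    for (int i = 0; i < boundary; i++) {
      large[order[i]] = true;
    }
    double[] result = new double[points.length];
    for (int p = 0; p < points.length; p++) {
      int c = assignment[p];
      double dist;
      if (large[c]) {
        dist = distance(points[p], centers[c]);
      } else {
        // Small cluster: use the nearest large cluster center instead.
        dist = Double.POSITIVE_INFINITY;
        for (int j = 0; j < k; j++) {
          if (large[j]) {
            dist = Math.min(dist, distance(points[p], centers[j]));
          }
        }
      }
      result[p] = sizes[c] * dist;
    }
    return result;
  }
}
```

Note that `clusterBoundary` returns an `int` here, since it is a count of clusters. The original paper by He et al. is formulated with a similarity measure, so how to interpret the scores of this distance-based variant is a separate question.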

codecov-io commented 8 years ago

Current coverage is 33.11%

Merging #24 into master will increase coverage by +<.01%

  1. 2 files (not in diff) in .../de/lmu/ifi/dbs/elki were modified.
@@             master        #24   diff @@
==========================================
  Files          1332       1333     +1   
  Lines         66147      66233    +86   
  Methods           0          0          
  Messages          0          0          
  Branches      14255      14269    +14   
==========================================
+ Hits          21871      21930    +59   
- Misses        41739      41759    +20   
- Partials       2537       2544     +7   

Powered by Codecov. Last updated by fd16d3a...69de830

kno10 commented 8 years ago

I think this could be modified to use PrototypeModel instead of MeanModel (to allow PAM etc. instead of k-means, and arbitrary distance functions); otherwise it should accept only NumberVectorDistanceFunction instead of arbitrary distances.

@since should be added when preparing a new release, on everything new in that release. @Alias is for backwards compatibility with old class names, so it is also not needed on new functionality. clusterBoundary should probably be an int.

But this looks pretty good; I will merge it when I have time (I'm busy with other projects right now, sorry).

patrickkostjens commented 8 years ago

I incorporated all of your comments except the first, since I am not entirely sure how to make the modifications for that one. I agree with you that it would be better to use PrototypeModel instead of MeanModel. However, PrototypeModel requires a type parameter that differs from the type parameter shared by the database and the CBLOF algorithm as it currently stands. To use the distance function of the database, the type of the PrototypeModel has to match that of the DistanceQuery, but I do not think that is possible, since the DistanceQuery type is fixed to be the same as the database and algorithm parameter.

I hope this makes sense. What it comes down to is that I do not know a way to perform distance queries on an arbitrary output type returned as the prototype by PrototypeModel.
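The typing problem described above can be illustrated with a small, self-contained sketch. The interfaces `Distance` and `Prototype` below are hypothetical stand-ins, not ELKI's DistanceQuery or PrototypeModel, and the bounded type parameter is only one conceivable way to express the constraint, not necessarily what ELKI does. The point is that a distance defined over the database type `O` can only be applied to a prototype if the prototype type `P` is known to be compatible with `O`.

```java
public class PrototypeTypingSketch {
  /** Hypothetical stand-in for a distance over objects of type O. */
  interface Distance<O> {
    double distance(O a, O b);
  }

  /** Hypothetical stand-in for a cluster model exposing a prototype of type P. */
  interface Prototype<P> {
    P getPrototype();
  }

  /**
   * Compiles only because P is declared as a subtype of O. With an unrelated
   * prototype type, the call to dist.distance(...) would be rejected by the
   * compiler, which is the mismatch described in the comment above.
   */
  static <O, P extends O> double distanceToPrototype(
      Distance<? super O> dist, O object, Prototype<P> model) {
    return dist.distance(object, model.getPrototype());
  }
}
```

A k-means style model whose prototype is a mean vector fits this constraint only if the database objects are themselves vectors, which is consistent with the suggestion earlier in the thread to restrict the algorithm to number-vector distances when MeanModel is used.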

kno10 commented 8 years ago

Merged in 512af8454e031339eee04193c6ab0b4769203bb3, thank you!