Waikato / moa

MOA is an open source framework for Big Data stream mining. It includes a collection of machine learning algorithms (classification, regression, clustering, outlier detection, concept drift detection and recommender systems) and tools for evaluation.
http://moa.cms.waikato.ac.nz/
GNU General Public License v3.0

getCoresetFromManager of BucketManager #70

Closed pcandido closed 7 years ago

pcandido commented 7 years ago

The getCoresetFromManager() function of the BucketManager class is responsible for retrieving the coreset summarized in the buckets.

Why does the function return only the last bucket when it is full? What about new objects? The last bucket holds the oldest objects of the stream, and new objects can take a long time to reach it.

Note that once the last bucket is full, the next (2^(L-1))*m objects make no difference to the clustering, since only the last bucket is returned.
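To put a number on that expression, here is a minimal sketch. It assumes L is the number of buckets and m is the bucket size, and the values L = 5 and m = 100 are picked purely for illustration (they are not MOA defaults). The class and method names are hypothetical and do not exist in MOA.

```java
// Illustrative only: evaluates the (2^(L-1))*m expression from the comment above.
// StaleWindowSketch and staleWindow are hypothetical names, not part of MOA.
class StaleWindowSketch {

    // Returns (2^(numBuckets-1)) * bucketSize, failing loudly on overflow.
    static long staleWindow(int numBuckets, int bucketSize) {
        if (numBuckets < 1 || numBuckets > 62 || bucketSize < 0) {
            throw new IllegalArgumentException("unsupported L or m");
        }
        return Math.multiplyExact(1L << (numBuckets - 1), (long) bucketSize);
    }

    public static void main(String[] args) {
        // Assumed example values: L = 5 buckets, m = 100 points per bucket.
        System.out.println(staleWindow(5, 100)); // 2^4 * 100 = 1600
    }
}
```

With these assumed values, 1600 new objects would arrive without changing the returned coreset.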

richard-moulton commented 7 years ago

This part of the getCoresetFromManager() function does not make sense to me either. Having looked a bit deeper, I am not sure that this behaviour is consistent with the original paper. It seems to me that the coreset should be computed from all of the non-empty buckets whenever it is needed so that the clustering produced makes use of the most recent instances.

This is one of the modifications I have made to MOA's StreamKM++ algorithm and made available on GitHub here.
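As a rough illustration of the difference between the two behaviours discussed in this thread, here is a simplified model. It is not MOA's BucketManager and not the code from the modification mentioned above: all names are hypothetical, points are one-dimensional, and the reduction step that a real implementation would apply to shrink the union back to m points is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the two behaviours discussed in this thread.
// All names are hypothetical; this is not MOA's BucketManager.
class CoresetSketch {

    // Behaviour questioned in the issue: a full last bucket hides every other bucket.
    static List<double[]> lastBucketIfFull(List<List<double[]>> buckets, int m) {
        List<double[]> last = buckets.get(buckets.size() - 1);
        if (last.size() == m) {
            return new ArrayList<>(last);
        }
        return allNonEmptyBuckets(buckets);
    }

    // Behaviour suggested above: gather the points of every non-empty bucket.
    // A real implementation would reduce this union to m points; omitted here.
    static List<double[]> allNonEmptyBuckets(List<List<double[]>> buckets) {
        List<double[]> union = new ArrayList<>();
        for (List<double[]> bucket : buckets) {
            union.addAll(bucket); // empty buckets contribute nothing
        }
        return union;
    }

    // Helper that builds a bucket of one-dimensional points.
    static List<double[]> bucketOf(double... values) {
        List<double[]> bucket = new ArrayList<>();
        for (double v : values) {
            bucket.add(new double[] {v});
        }
        return bucket;
    }

    public static void main(String[] args) {
        int m = 2; // assumed bucket size for the example
        List<List<double[]>> buckets = new ArrayList<>();
        buckets.add(bucketOf(9.0));      // first bucket: the newest point
        buckets.add(bucketOf());         // an empty intermediate bucket
        buckets.add(bucketOf(1.0, 2.0)); // last bucket: full, oldest points

        System.out.println(lastBucketIfFull(buckets, m).size());   // 2: newest point ignored
        System.out.println(allNonEmptyBuckets(buckets).size());    // 3: newest point included
    }
}
```

In this toy setup the point 9.0 only influences the result under the second strategy, which is the concern raised in the original question.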

richard-moulton commented 7 years ago

This modification was part of pull request #100, which has been merged into MOA's master branch.

Recommend closing this issue.