Waikato / moa

MOA is an open source framework for Big Data stream mining. It includes a collection of machine learning algorithms (classification, regression, clustering, outlier detection, concept drift detection and recommender systems) and tools for evaluation.
http://moa.cms.waikato.ac.nz/
GNU General Public License v3.0

Sparse ARFF not Supported #139

Closed Humni closed 3 years ago

Humni commented 6 years ago

I was attempting to use a sparse ARFF file as per the published specification, but it seems that sparse files are not read correctly by the moa.streams.clustering.FileStream file reader. I discovered this while implementing a custom clustering outlier. Below are the example ARFF files and the output I get from printing the instances.

Sparse ARFF - This one doesn't work

@RELATION navigationsequences 

@ATTRIBUTE "GET /~scottp/publish.html" NUMERIC
@ATTRIBUTE "GET /~lowey/kevin.gif" NUMERIC
@ATTRIBUTE "GET /~ladd/ostriches.html" NUMERIC
@ATTRIBUTE "GET /~lowey/" NUMERIC

@DATA
{0 1}
{2 1}
{0 1}
{3 1, 1 1}

Sparse ARFF Output

instance: [ ] 
instance: [ ] 
instance: [ ] 
instance: [ 0.00 ]

Normal ARFF - This one works

@RELATION navigationsequences 

@ATTRIBUTE "GET /~ladd/ostriches.html" NUMERIC
@ATTRIBUTE "GET /~scottp/publish.html" NUMERIC
@ATTRIBUTE "GET /~lowey/kevin.gif" NUMERIC
@ATTRIBUTE "GET /~lowey/" NUMERIC

@DATA
0, 1, 0, 0
1, 0, 0, 0
0, 1, 0, 0
0, 0, 1, 1

Normal ARFF Output

instance: [ 0.00 1.00 0.00 ] 
instance: [ 1.00 0.00 0.00 ] 
instance: [ 0.00 1.00 0.00 ] 
instance: [ 0.00 0.00 1.00 ] 

abifet commented 3 years ago

Clustering in MOA only works with normalised dense instances.
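
Given that limitation, one possible workaround is to expand the sparse rows into dense rows before handing the file to MOA. The following is a minimal sketch, not part of MOA: the function names `sparse_row_to_dense` and `densify_arff` are hypothetical, and it assumes every attribute is NUMERIC (so an omitted value means 0) and that no value contains a comma. It does not normalise the values.

```python
def sparse_row_to_dense(line, num_attributes, default="0"):
    """Expand one sparse ARFF data row, e.g. '{3 1, 1 1}', into a dense row.

    Hypothetical helper: assumes numeric attributes and no commas inside values.
    """
    body = line.strip()
    if not (body.startswith("{") and body.endswith("}")):
        return body  # already dense; leave it untouched
    values = [default] * num_attributes
    inner = body[1:-1].strip()
    if inner:
        for pair in inner.split(","):
            # Each pair is "<zero-based attribute index> <value>".
            index, value = pair.strip().split(None, 1)
            values[int(index)] = value.strip()
    return ", ".join(values)


def densify_arff(text):
    """Return ARFF text with every sparse data row rewritten as a dense row."""
    out = []
    num_attributes = 0
    in_data = False
    for line in text.splitlines():
        stripped = line.strip()
        if not in_data:
            # Count attribute declarations so we know the dense row width.
            if stripped.lower().startswith("@attribute"):
                num_attributes += 1
            elif stripped.lower() == "@data":
                in_data = True
            out.append(line)
        elif stripped and not stripped.startswith("%"):
            out.append(sparse_row_to_dense(stripped, num_attributes))
        else:
            out.append(line)  # blank lines and % comments pass through
    return "\n".join(out)
```

Applied to the sparse file in the report, the columns follow that file's own @ATTRIBUTE order, which differs from the order used in the dense file, so the resulting rows describe the same data but are not character-for-character identical to the dense example.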