Waikato / moa

MOA is an open source framework for Big Data stream mining. It includes a collection of machine learning algorithms (classification, regression, clustering, outlier detection, concept drift detection and recommender systems) and tools for evaluation.
GNU General Public License v3.0

Generating Multi-Label Synthetic Data Stream gives a NullPointerException #137

Open abuyukcakir opened 6 years ago

abuyukcakir commented 6 years ago

Hey There,

The issue I describe below has been discussed here before: https://groups.google.com/forum/#!topic/moa-development/ho-_Z22k1-E

The task WriteStreamToARFFFile does not work properly. Some initial statistics on the distribution of the label sets are printed to the terminal, but then the process terminates with a NullPointerException.

Another user in the MOA Development Google Group has reproduced the error as well.

The error is similar to this:

Failure reason: Failed writing to file /home/****/Synth.arff
*** STACK TRACE ***
java.lang.RuntimeException: Failed writing to file /home/****/Synth.arff
    at moa.tasks.WriteStreamToARFFFile.doMainTask(WriteStreamToARFFFile.java:86)
    at moa.tasks.MainTask.doTaskImpl(MainTask.java:50)
    at moa.tasks.AbstractTask.doTask(AbstractTask.java:57)
    at moa.tasks.TaskThread.run(TaskThread.java:76)
Caused by: java.lang.NullPointerException
    at com.yahoo.labs.samoa.instances.SparseInstanceData.locateIndex(SparseInstanceData.java:237)
    at com.yahoo.labs.samoa.instances.SparseInstanceData.setValue(SparseInstanceData.java:220)
    at com.yahoo.labs.samoa.instances.InstanceImpl.setValue(InstanceImpl.java:269)
    at moa.streams.generators.multilabel.MetaMultilabelGenerator.generateMLInstance(MetaMultilabelGenerator.java:274)
    at moa.streams.generators.multilabel.MetaMultilabelGenerator.nextInstance(MetaMultilabelGenerator.java:228)
    at moa.streams.generators.multilabel.MetaMultilabelGenerator.nextInstance(MetaMultilabelGenerator.java:46)
    at moa.tasks.WriteStreamToARFFFile.doMainTask(WriteStreamToARFFFile.java:80)
    ... 3 more
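The innermost frames show the failure path: InstanceImpl.setValue delegates to SparseInstanceData.setValue, which calls locateIndex. The SparseInstance constructor quoted further down in this thread passes null for its array arguments (super(1, null, null, (int) numberAttributes)), which suggests the sparse instance starts with no index array to search. The Java sketch below is a simplified, hypothetical model of that path, not MOA's actual source: ToySparseData, its fields, and its binary-search locateIndex are assumptions chosen only to reproduce the same kind of NullPointerException, and to show that the same code works once the arrays are non-null.

```java
/**
 * Hypothetical, simplified model of a sparse instance container.
 * NOT MOA's SparseInstanceData: names and logic are illustrative assumptions.
 */
public class SparseNpeDemo {

    static class ToySparseData {
        double[] attributeValues; // values of the stored (non-zero) attributes
        int[] indexValues;        // sorted attribute indices, parallel to attributeValues

        ToySparseData(double[] attributeValues, int[] indexValues) {
            this.attributeValues = attributeValues;
            this.indexValues = indexValues;
        }

        /** Binary search; returns the position, or -(insertionPoint) - 1 like Arrays.binarySearch. */
        int locateIndex(int index) {
            int lo = 0;
            int hi = indexValues.length - 1; // throws NullPointerException if indexValues == null
            while (lo <= hi) {
                int mid = (lo + hi) >>> 1;
                if (indexValues[mid] < index) lo = mid + 1;
                else if (indexValues[mid] > index) hi = mid - 1;
                else return mid;
            }
            return -(lo + 1);
        }

        /** Updates an existing entry or inserts a new one, keeping indexValues sorted. */
        void setValue(int index, double value) {
            int pos = locateIndex(index);
            if (pos >= 0) {
                attributeValues[pos] = value;
                return;
            }
            int insertAt = -(pos + 1);
            int n = indexValues.length;
            int[] newIdx = new int[n + 1];
            double[] newVals = new double[n + 1];
            System.arraycopy(indexValues, 0, newIdx, 0, insertAt);
            System.arraycopy(attributeValues, 0, newVals, 0, insertAt);
            newIdx[insertAt] = index;
            newVals[insertAt] = value;
            System.arraycopy(indexValues, insertAt, newIdx, insertAt + 1, n - insertAt);
            System.arraycopy(attributeValues, insertAt, newVals, insertAt + 1, n - insertAt);
            indexValues = newIdx;
            attributeValues = newVals;
        }

        /** Returns the stored value, or 0.0 for attributes not present in the sparse arrays. */
        double value(int index) {
            int pos = locateIndex(index);
            return pos >= 0 ? attributeValues[pos] : 0.0;
        }
    }

    /** Returns true if setValue throws a NullPointerException on the given container. */
    static boolean throwsNpe(ToySparseData data) {
        try {
            data.setValue(3, 1.0);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Mirrors the quoted constructor: null value and index arrays.
        System.out.println("null arrays  -> NPE: " + throwsNpe(new ToySparseData(null, null)));
        // Same container with empty, non-null arrays: setValue succeeds.
        System.out.println("empty arrays -> NPE: " + throwsNpe(new ToySparseData(new double[0], new int[0])));
    }
}
```

In this model the exception comes from dereferencing indexValues before any search happens, which matches locateIndex being the top frame in the trace; whether MOA's real class fails on exactly the same expression is an assumption.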

The setting which results in the error is as follows:

  1. Select the 'WriteStreamToARFFFile' task with the following options:

    • stream: generators.multilabel.MetaMultilabelGenerator (with default values; I also tried changing some of its options, such as NumLabels and LabelCardinality)
    • arffFile: an empty file that I specified, with proper read/write permissions.
    • maxInstances: 100,000 (or any other value).
    • taskResultFile: left blank, since it only stores results on the generated data (most common label set, etc.)

Juancard commented 4 years ago

Yes, the same happens to me whenever I run WriteStreamToARFFFile with the MetaMultilabelGenerator.

Juancard commented 4 years ago

I finally solved it by forcing the generator to use a dense representation instead of a sparse one. In the method generateMLInstance, I replaced the line

Instance x_ml = new SparseInstance(this.multilabelStreamTemplate.numAttributes());

with

Instance x_ml = new DenseInstance(this.multilabelStreamTemplate.numAttributes());

After that, it works. Don't forget to add the corresponding import.
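A plausible reason the switch to DenseInstance sidesteps this particular exception is that a dense container allocates one slot per attribute up front and writes by position, so there is no index array to search. The sketch below is a hypothetical, simplified model of that idea, not MOA's DenseInstanceData; ToyDenseData and its methods are illustrative names only. It does not claim the change fixes every setup, and a later comment in this thread reports that it did not help in one case.

```java
/**
 * Hypothetical, simplified model of a dense instance container,
 * mirroring the switch to DenseInstance described in this comment.
 * NOT MOA's DenseInstanceData: names and logic are illustrative assumptions.
 */
public class DenseWorkaroundDemo {

    static class ToyDenseData {
        final double[] attributeValues; // one slot per attribute, allocated in the constructor

        ToyDenseData(int numAttributes) {
            this.attributeValues = new double[numAttributes];
        }

        // Direct positional write: no index lookup, so no null index array can be dereferenced.
        void setValue(int index, double value) {
            attributeValues[index] = value;
        }

        double value(int index) {
            return attributeValues[index];
        }
    }

    public static void main(String[] args) {
        ToyDenseData x = new ToyDenseData(10);
        x.setValue(7, 1.0); // e.g. setting one attribute; every other slot stays 0.0
        System.out.println(x.value(7) + " " + x.value(0));
    }
}
```

The trade-off is memory: a dense instance stores every attribute, including zeros, whereas a sparse one stores only the non-zero entries, which matters for streams with many mostly-zero label attributes.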

JayKumarr commented 4 years ago

I changed line 49 of SparseInstance.java from

super(1, null, null, (int) numberAttributes);

to

super((int) numberAttributes);

This makes SparseInstance call a different superclass constructor.

An example setting:


ospanbatyr commented 2 years ago

This issue still persists. I have tried every solution here with @JayKumarr's setup, but multi-label stream generation still does not work.

As you can see in the screenshot, I applied @JayKumarr's suggestion and used his setup, but it did not solve my problem. I also saved the modified file into the JAR package, so I am sure this is the code being run.


Edit: I have also tried @juancard's suggestion, but it did not solve my problem either.