Waikato / moa

MOA is an open source framework for Big Data stream mining. It includes a collection of machine learning algorithms (classification, regression, clustering, outlier detection, concept drift detection and recommender systems) and tools for evaluation.
http://moa.cms.waikato.ac.nz/
GNU General Public License v3.0

Generating Multi-Label Synthetic Data Stream gives a NullPointerException #137

Open abuyukcakir opened 6 years ago

abuyukcakir commented 6 years ago

Hey there,

This issue was discussed previously here: https://groups.google.com/forum/#!topic/moa-development/ho-_Z22k1-E

The WriteStreamToARFFFile task does not work properly. Some initial statistics on the distribution of the label sets are printed to the terminal, but the process then terminates with a NullPointerException.

Another user in the MOA Development Google Group has reproduced the error as well.

The error looks like this:

Failure reason: Failed writing to file /home/****/Synth.arff
*** STACK TRACE ***
java.lang.RuntimeException: Failed writing to file /home/****/Synth.arff
    at moa.tasks.WriteStreamToARFFFile.doMainTask(WriteStreamToARFFFile.java:86)
    at moa.tasks.MainTask.doTaskImpl(MainTask.java:50)
    at moa.tasks.AbstractTask.doTask(AbstractTask.java:57)
    at moa.tasks.TaskThread.run(TaskThread.java:76)
Caused by: java.lang.NullPointerException
    at com.yahoo.labs.samoa.instances.SparseInstanceData.locateIndex(SparseInstanceData.java:237)
    at com.yahoo.labs.samoa.instances.SparseInstanceData.setValue(SparseInstanceData.java:220)
    at com.yahoo.labs.samoa.instances.InstanceImpl.setValue(InstanceImpl.java:269)
    at moa.streams.generators.multilabel.MetaMultilabelGenerator.generateMLInstance(MetaMultilabelGenerator.java:274)
    at moa.streams.generators.multilabel.MetaMultilabelGenerator.nextInstance(MetaMultilabelGenerator.java:228)
    at moa.streams.generators.multilabel.MetaMultilabelGenerator.nextInstance(MetaMultilabelGenerator.java:46)
    at moa.tasks.WriteStreamToARFFFile.doMainTask(WriteStreamToARFFFile.java:80)
    ... 3 more

The following setup produces the error:

  1. Pick the 'WriteStreamToARFFFile' task and set its options as follows:

    • stream: generators.multilabel.MetaMultilabelGenerator (with default values; I also tried changing some of its options, such as NumLabels and LabelCardinality)
    • arffFile: an empty file that I created with proper read and write permissions
    • maxInstances: 100,000, or any other value
    • taskResultFile: left blank, as it is only for the results on the generated data (most common labelset, etc.)

Juancard commented 4 years ago

Yes, the same happens to me whenever I run WriteStreamToARFFFile with the MetaMultilabelGenerator.

Juancard commented 4 years ago

I finally solved it by forcing the generator to use a dense representation instead of a sparse one. That is, in the method generateMLInstance I changed the following line:

    Instance x_ml = new SparseInstance(this.multilabelStreamTemplate.numAttributes());

to:

    Instance x_ml = new DenseInstance(this.multilabelStreamTemplate.numAttributes());

And then it works. Don't forget to add the corresponding import.
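To illustrate why a dense representation can sidestep the failing call, here is a minimal toy sketch. It is not MOA's DenseInstance: the class ToyDenseData and all of its methods are hypothetical names invented for this illustration. The assumption is that dense storage keeps one slot per attribute, so setting a value is a direct array write and needs no index lookup such as the locateIndex step seen in the stack trace.

```java
/** Toy model for illustration only; this is NOT MOA's DenseInstance. */
public class DenseStorageSketch {

    /** Hypothetical dense storage: one slot per attribute, zero by default. */
    static final class ToyDenseData {
        private final double[] values;

        ToyDenseData(int numberAttributes) {
            this.values = new double[numberAttributes];
        }

        /** Direct write: no search for the attribute's position is needed. */
        void setValue(int index, double value) {
            values[index] = value;
        }

        double value(int index) {
            return values[index];
        }

        /** Dense storage always holds every attribute, set or not. */
        int storedSlots() {
            return values.length;
        }
    }

    /** Builds a 6-attribute instance with attributes 1 and 4 switched on. */
    static ToyDenseData buildExample() {
        ToyDenseData x = new ToyDenseData(6);
        x.setValue(1, 1.0);
        x.setValue(4, 1.0);
        return x;
    }

    public static void main(String[] args) {
        ToyDenseData x = buildExample();
        System.out.println("value(1)=" + x.value(1)
                + " value(0)=" + x.value(0)
                + " storedSlots=" + x.storedSlots());
    }
}
```

The trade-off in this toy model is memory: every attribute occupies a slot even when it is zero, which may matter for streams with many attributes or labels.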

JayKumarr commented 4 years ago

I changed line 49 of SparseInstance.java from

    super(1, null, null, (int) numberAttributes);

to

    super((int) numberAttributes);

This changes which superclass constructor is called.
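One plausible reading of why the constructor choice matters, sketched as a toy model: the original call passes null for two arguments, and the stack trace fails inside SparseInstanceData.locateIndex. If those nulls end up as the sparse index and value arrays (an assumption, not checked against the MOA source), the first lookup would dereference a null array. ToySparseData below is a hypothetical class written for this illustration, not MOA code.

```java
import java.util.Arrays;

/** Toy model for illustration only; this is NOT MOA's SparseInstanceData. */
public class SparseConstructorSketch {

    /** Hypothetical sparse storage: sorted attribute indices plus their values. */
    static final class ToySparseData {
        private int[] indexValues;
        private double[] attributeValues;
        private final int numberAttributes;

        /** Caller-supplied arrays; mirrors the shape of a call that passes null, null. */
        ToySparseData(double[] attributeValues, int[] indexValues, int numberAttributes) {
            this.attributeValues = attributeValues;
            this.indexValues = indexValues;
            this.numberAttributes = numberAttributes;
        }

        /** Allocates empty storage itself; mirrors the shape of the single-argument call. */
        ToySparseData(int numberAttributes) {
            this(new double[0], new int[0], numberAttributes);
        }

        int numAttributes() {
            return numberAttributes;
        }

        /** Binary search over the stored indices; throws NullPointerException if indexValues is null. */
        int locateIndex(int index) {
            return Arrays.binarySearch(indexValues, index);
        }

        void setValue(int index, double value) {
            int pos = locateIndex(index);
            if (pos >= 0) {            // attribute already stored: overwrite
                attributeValues[pos] = value;
                return;
            }
            int ins = -(pos + 1);      // insertion point that keeps indices sorted
            int n = indexValues.length;
            int[] newIdx = new int[n + 1];
            double[] newVal = new double[n + 1];
            System.arraycopy(indexValues, 0, newIdx, 0, ins);
            System.arraycopy(attributeValues, 0, newVal, 0, ins);
            newIdx[ins] = index;
            newVal[ins] = value;
            System.arraycopy(indexValues, ins, newIdx, ins + 1, n - ins);
            System.arraycopy(attributeValues, ins, newVal, ins + 1, n - ins);
            indexValues = newIdx;
            attributeValues = newVal;
        }

        /** Attributes that are not stored are treated as zero. */
        double value(int index) {
            int pos = locateIndex(index);
            return pos >= 0 ? attributeValues[pos] : 0.0;
        }
    }

    /** Returns true if setting a value on the given storage throws a NullPointerException. */
    static boolean setValueThrowsNpe(ToySparseData data) {
        try {
            data.setValue(2, 1.0);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("null arrays fail: " + setValueThrowsNpe(new ToySparseData(null, null, 5)));
        System.out.println("allocated arrays fail: " + setValueThrowsNpe(new ToySparseData(5)));
    }
}
```

In this toy model the failure depends only on whether the arrays were allocated before the first setValue call. Whether the same holds for the real class would need to be confirmed in the MOA source.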

An example setting:

[Screenshot: MOA_successfully_text_generation]

ospanbatyr commented 2 years ago

This issue still persists. I have tried every solution here with @JayKumarr's setup, but multi-label stream generation still does not work.

As you can see in the screenshot, I used @JayKumarr's suggestion and his setup, but it did not solve my problem. I also saved this file into the JAR package, so I'm sure this is the code being run.

[Screenshot]

Edit: I have also tried @juancard's suggestion, but it did not solve my problem either.