fangfangli / cleartk

Automatically exported from code.google.com/p/cleartk

Known problem saving Weka SparseInstance objects from datasets that have string attributes #339

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
Use the WekaStringOutcomeDataWriter to write at least one SparseInstance with 
at least one string attribute.

What is the expected output?
The output training-data.arff should contain all instances and all values for 
each instance.

What do you see instead?
The output training-data.arff is almost correct, except that the first instance 
in the file is missing the value for each string attribute.

What version of the product are you using? On what operating system?
I have a Maven dependency on cleartk-ml-weka 0.1.0 and am running on Windows 7 
64-bit.

This is a known problem with Weka [1][2]: the sparse format does not store 
values whose internal index is 0, and the first string added to a string 
attribute gets index 0. Unfortunately I couldn't see how to fix it in their 
code. Until they decide how they want to handle this, I thought it would be 
better to share my simple, pragmatic cleartk robustness fix (see attached Git 
patch) with the community. I think WekaStringOutcomeDataWriter is a good place 
to make this fix because it matches the steps to reproduce exactly: it is 
hardcoded to generate sparse .arff data, and as far as I can tell its unit 
tests do not cover this issue yet.

[1] http://weka.wikispaces.com/Why+am+I+missing+certain+nominal+or+string+values+from+sparse+instances%3F
[2] http://weka.wikispaces.com/ARFF+%28stable+version%29#Sparse%20ARFF%20files
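
For readers without Weka at hand, the behaviour described in [1] can be illustrated with a small toy model. This is only a sketch: SparseStringDemo, StringAttribute and toSparseRow are hypothetical names invented for the illustration, not Weka or cleartk classes, and the placeholder approach shown is the general workaround of reserving index 0, which is not necessarily what the attached patch does.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (NOT Weka's API) of how a sparse ARFF row drops index-0 values. */
public class SparseStringDemo {

    /** Hypothetical stand-in for a string attribute: values are stored by index. */
    static class StringAttribute {
        private final List<String> values = new ArrayList<>();

        /** Returns the index of the value, adding it first if it is new. */
        int addStringValue(String value) {
            int index = values.indexOf(value);
            if (index < 0) {
                values.add(value);
                index = values.size() - 1;
            }
            return index;
        }

        String value(int index) {
            return values.get(index);
        }
    }

    /** Writes one sparse row, skipping every cell whose stored index is 0. */
    static String toSparseRow(StringAttribute[] attributes, int[] storedIndices) {
        StringBuilder row = new StringBuilder("{");
        for (int i = 0; i < storedIndices.length; i++) {
            if (storedIndices[i] == 0) {
                continue; // index 0 is treated as the implicit default and is not written
            }
            if (row.length() > 1) {
                row.append(", ");
            }
            row.append(i).append(" '").append(attributes[i].value(storedIndices[i])).append("'");
        }
        return row.append("}").toString();
    }

    public static void main(String[] args) {
        // Without a placeholder, the first real string gets index 0 and is lost.
        StringAttribute plain = new StringAttribute();
        int first = plain.addStringValue("hello");
        System.out.println(toSparseRow(new StringAttribute[] {plain}, new int[] {first}));

        // Workaround: reserve index 0 with a placeholder before adding real data.
        StringAttribute padded = new StringAttribute();
        padded.addStringValue("dummy"); // hypothetical placeholder value
        int shifted = padded.addStringValue("hello");
        System.out.println(toSparseRow(new StringAttribute[] {padded}, new int[] {shifted}));
    }
}
```

In this model the first row is written as an empty sparse row, while the second keeps its string value because the real data no longer sits at index 0.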

Original issue reported on code.google.com by fergalmo...@gmail.com on 22 Oct 2012 at 1:08

GoogleCodeExporter commented 9 years ago

Original comment by fergalmo...@gmail.com on 22 Oct 2012 at 1:12

GoogleCodeExporter commented 9 years ago

Original comment by steven.b...@gmail.com on 28 Mar 2013 at 1:34

GoogleCodeExporter commented 9 years ago
This issue was closed by revision 0fb023b83a9a.

Original comment by steven.b...@gmail.com on 3 May 2013 at 8:07

GoogleCodeExporter commented 9 years ago
Thanks for the patch, and sorry for the long delay in applying it. The Weka 
wrappers are still far from stable, but I agree that your workaround is the 
best thing we can do in the current situation to make sure that all the 
instances and attributes get written out to the ARFF file.

Original comment by steven.b...@gmail.com on 3 May 2013 at 8:08