fangfangli / cleartk

Automatically exported from code.google.com/p/cleartk

Weka wrapper for ClearTK #274

Closed. GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
[from Torsten]

I want to use ClearTK's nice feature extraction capabilities to create
ARFF files (to be used with Weka).
There seems to be no DataWriter for ARFF in ClearTK.
Has someone already written such a writer, and would they share the code?

While we are talking about Weka:
Are there any specific obstacles to adding Weka support to ClearTK, or
has simply nobody found the time to do it yet?
(I am thinking about trying to implement the necessary parts and don't
want to run into obvious traps.)

[from Philip]

The GPL is fine with our new project structure so long as we keep the 
dependency isolated.  I think this would be a really important contribution.

I have written a nearly working Weka wrapper for ClearTK which I never polished 
up and released. There are a few difficulties with it that I can't remember 
off the top of my head. I will go dig around for the code and see if I can 
find anything that would help you get started.

[from Philip]

It looks like I wrote the training data writer part but didn't do the 
classifier, though I am certain that at one point I figured out the 
necessary Weka APIs for this.

Original issue reported on code.google.com by phi...@ogren.info on 29 Jan 2012 at 9:28

GoogleCodeExporter commented 9 years ago
I checked in a bit of code for the data writer. It doesn't work. I am about 
to refactor the code and wanted what I have so far checked in as a 
backup.

Original comment by phi...@ogren.info on 1 Feb 2012 at 4:21

GoogleCodeExporter commented 9 years ago
I've checked in a bit of code that looks good but doesn't work yet. The basic 
idea is that the features encoder collects all of the information needed for 
the attributes but doesn't actually "encode the features" - it just returns 
them as they were given. The data writer collects all the features and 
outcomes, and when finish() is called, it creates the weka.core.Instances 
object and asks it to write itself. This last call throws an 
IndexOutOfBoundsException and I didn't have time to figure out why this 
evening. One useful resource can be found here:

http://weka.wikispaces.com/Creating+an+ARFF+file

I've tried to model my code on this example - but I'm obviously doing 
something wrong somewhere. To test my code as it is, look at 
WekaDataWriterTest and remove the @Ignore annotation. It doesn't actually test 
anything; it just tries to run a simple pipeline and generate a file.
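
For readers following along, here is a minimal sketch of the collect-then-write-on-finish idea described above. It is not the code that was checked in, and it deliberately avoids weka.core.Instances so that it runs without Weka on the classpath: it builds the ARFF text by hand in plain Java. The class name ArffSketchWriter, the class attribute name "outcome", and the Map-based feature representation are all hypothetical. It also assumes each feature name is consistently either numeric or nominal.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: not ClearTK's WekaDataWriter and not the Weka API.
final class ArffSketchWriter {
  private final String relation;
  // Attribute name -> nominal values seen so far; an empty set marks a numeric attribute.
  private final Map<String, Set<String>> attributes = new LinkedHashMap<>();
  private final List<Map<String, Object>> rows = new ArrayList<>();
  private final List<String> rowOutcomes = new ArrayList<>();
  private final Set<String> outcomes = new TreeSet<>();

  ArffSketchWriter(String relation) {
    this.relation = relation;
  }

  // Buffer one instance and record its attribute information; nothing is written yet.
  void write(Map<String, Object> features, String outcome) {
    for (Map.Entry<String, Object> e : features.entrySet()) {
      Set<String> values = attributes.computeIfAbsent(e.getKey(), k -> new TreeSet<>());
      if (!(e.getValue() instanceof Number)) {
        values.add(String.valueOf(e.getValue()));
      }
    }
    rows.add(new LinkedHashMap<>(features));
    rowOutcomes.add(outcome);
    outcomes.add(outcome);
  }

  // Build the complete ARFF text: header first, then one data row per buffered instance.
  String finish() {
    StringBuilder sb = new StringBuilder();
    sb.append("@relation ").append(quote(relation)).append("\n\n");
    for (Map.Entry<String, Set<String>> a : attributes.entrySet()) {
      sb.append("@attribute ").append(quote(a.getKey())).append(' ')
          .append(a.getValue().isEmpty() ? "numeric" : nominal(a.getValue()))
          .append('\n');
    }
    // Assumption: the class attribute is called "outcome" and comes last.
    sb.append("@attribute outcome ").append(nominal(outcomes)).append("\n\n@data\n");
    for (int i = 0; i < rows.size(); i++) {
      for (String name : attributes.keySet()) {
        Object v = rows.get(i).get(name);
        // "?" is ARFF's marker for a missing value.
        String cell = v == null ? "?" : (v instanceof Number ? v.toString() : quote(v.toString()));
        sb.append(cell).append(',');
      }
      sb.append(quote(rowOutcomes.get(i))).append('\n');
    }
    return sb.toString();
  }

  // Render a nominal specification such as {NN,VB}.
  private static String nominal(Set<String> values) {
    List<String> quoted = new ArrayList<>();
    for (String v : values) {
      quoted.add(quote(v));
    }
    return "{" + String.join(",", quoted) + "}";
  }

  // Single-quote anything that is not a plain token, escaping backslashes and quotes.
  private static String quote(String s) {
    if (s.matches("[A-Za-z0-9_.-]+")) {
      return s;
    }
    return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'";
  }
}
```

Buffering until finish() is needed because the ARFF header has to declare every nominal value before the first data row, and those values are only known once all instances have been seen. A feature that is absent from an instance is written as "?".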

Original comment by phi...@ogren.info on 1 Feb 2012 at 5:19

GoogleCodeExporter commented 9 years ago
I've committed a fix for the data writer and it seems to be working now - 
though it is barely tested.  Please give it a try and see if it works.  

I have also found the code I wrote years ago (pre-ClearTK) that has the 
necessary API calls for the classifier. If I can't find time to implement the 
ClearTK Classifier wrapper, then I will post the important bits of code here.

Original comment by phi...@ogren.info on 3 Feb 2012 at 5:41

GoogleCodeExporter commented 9 years ago
The initial scaffolding for the data writer is there and works now.  I'm going 
to file separate issues for what remains.

Original comment by phi...@ogren.info on 12 Feb 2012 at 10:04

GoogleCodeExporter commented 9 years ago

Original comment by steven.b...@gmail.com on 5 Aug 2012 at 8:50