DrDub / cleartk

Automatically exported from code.google.com/p/cleartk

Fix performance issues with cleartk-ml-tksvmlight #342

GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
The classifiers in cleartk-ml-tksvmlight currently make a system call to the 
tksvmlight binaries for each instance to be classified. This means the model is 
loaded on every classification, which will lead to unacceptable performance in 
any real use case.
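
The cost described here comes from repeating the model load for every instance. A minimal sketch of the difference, assuming a hypothetical `ToyModel` and a counter that stands in for the expensive load (none of these names come from ClearTK or tksvmlight):

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical stand-in for a trained model; not a ClearTK or tksvmlight type. */
final class ToyModel {
  private final double threshold;

  ToyModel(double threshold) {
    this.threshold = threshold;
  }

  /** Toy decision rule: positive when the score exceeds the threshold. */
  boolean classify(double score) {
    return score > threshold;
  }
}

/** Counts how often the (pretend) expensive model load happens. */
final class CountingLoader implements Supplier<ToyModel> {
  int loads = 0;

  @Override
  public ToyModel get() {
    loads++; // in a real setting the model file would be read and parsed here
    return new ToyModel(0.5);
  }
}

final class LoadOnceDemo {
  /** Mirrors the per-instance system call: the model is reloaded for every instance. */
  static int classifyReloading(Supplier<ToyModel> loader, List<Double> instances) {
    int positives = 0;
    for (double x : instances) {
      if (loader.get().classify(x)) {
        positives++;
      }
    }
    return positives;
  }

  /** Loads the model once and reuses it for all instances. */
  static int classifyCached(Supplier<ToyModel> loader, List<Double> instances) {
    ToyModel model = loader.get();
    int positives = 0;
    for (double x : instances) {
      if (model.classify(x)) {
        positives++;
      }
    }
    return positives;
  }

  public static void main(String[] args) {
    List<Double> instances = Arrays.asList(0.1, 0.7, 0.9, 0.3);
    CountingLoader reloading = new CountingLoader();
    CountingLoader cached = new CountingLoader();
    int a = classifyReloading(reloading, instances);
    int b = classifyCached(cached, instances);
    System.out.println(a + " positives with " + reloading.loads + " model loads");
    System.out.println(b + " positives with " + cached.loads + " model loads");
  }
}
```

Both variants give the same predictions; only the number of model loads differs, growing with the number of instances in the first case and staying at one in the second.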

There are at least two ways to fix this:

(1) Do what was done in cleartk-ml-svmlight, and re-implement the 
classification part of tree-kernel SVMs in Java

(2) Use JNI or JNA to directly invoke the APIs in the tksvmlight C binary.
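
To give a rough idea of what option (1) involves, here is a sketch of a kernel SVM decision function over parse trees. It assumes the subset tree kernel of Collins and Duffy, which is one common tree kernel and not necessarily the one tksvmlight is configured to use. `TreeNode`, `SubsetTreeKernelSketch`, and the `lambda` decay parameter are hypothetical names for illustration, and the sign convention for the bias differs between tools.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical minimal parse-tree node; not the ClearTK or cTAKES representation. */
final class TreeNode {
  final String label;
  final List<TreeNode> children;

  TreeNode(String label, TreeNode... children) {
    this.label = label;
    this.children = Arrays.asList(children);
  }

  boolean isLeaf() {
    return children.isEmpty();
  }

  /** The grammar production at this node, e.g. "NP -> D N". */
  String production() {
    StringBuilder sb = new StringBuilder(label).append(" ->");
    for (TreeNode child : children) {
      sb.append(' ').append(child.label);
    }
    return sb.toString();
  }

  /** Collects every non-leaf node of this subtree into out. */
  void collectInternal(List<TreeNode> out) {
    if (!isLeaf()) {
      out.add(this);
      for (TreeNode child : children) {
        child.collectInternal(out);
      }
    }
  }
}

final class SubsetTreeKernelSketch {
  /** Decayed count of common fragments rooted at both a and b. */
  static double delta(TreeNode a, TreeNode b, double lambda) {
    if (a.children.size() != b.children.size()
        || !a.production().equals(b.production())) {
      return 0.0;
    }
    double product = lambda;
    for (int j = 0; j < a.children.size(); j++) {
      TreeNode ca = a.children.get(j);
      TreeNode cb = b.children.get(j);
      if (!ca.isLeaf() && !cb.isLeaf()) {
        product *= 1.0 + delta(ca, cb, lambda);
      }
    }
    return product;
  }

  /** K(t1, t2): sum of delta over all pairs of internal nodes. */
  static double kernel(TreeNode t1, TreeNode t2, double lambda) {
    List<TreeNode> n1 = new ArrayList<>();
    List<TreeNode> n2 = new ArrayList<>();
    t1.collectInternal(n1);
    t2.collectInternal(n2);
    double sum = 0.0;
    for (TreeNode a : n1) {
      for (TreeNode b : n2) {
        sum += delta(a, b, lambda);
      }
    }
    return sum;
  }

  /** f(x) = sum_i coefs[i] * K(sv_i, x) + bias, with coefs[i] = alpha_i * y_i. */
  static double decision(List<TreeNode> supportVectors, double[] coefs,
      double bias, TreeNode x, double lambda) {
    double score = bias; // assumption: bias is added; some tools subtract a threshold
    for (int i = 0; i < supportVectors.size(); i++) {
      score += coefs[i] * kernel(supportVectors.get(i), x, lambda);
    }
    return score;
  }

  public static void main(String[] args) {
    TreeNode dog = new TreeNode("NP",
        new TreeNode("D", new TreeNode("the")), new TreeNode("N", new TreeNode("dog")));
    TreeNode cat = new TreeNode("NP",
        new TreeNode("D", new TreeNode("the")), new TreeNode("N", new TreeNode("cat")));
    System.out.println(kernel(dog, cat, 1.0));
  }
}
```

With `lambda = 1.0` the kernel is a plain count of shared tree fragments: the two example trees share three of them (`D -> the`, `NP -> D N`, and `NP -> D N` with `D` expanded to `the`). Since the support vectors and coefficients stay in memory, classifying many instances needs only one model load.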

Original issue reported on code.google.com by steven.b...@gmail.com on 3 Dec 2012 at 6:21

GoogleCodeExporter commented 9 years ago
I think (1) is the way to go. My understanding is that with (2), everyone who 
wants to use it would have to compile a special version of the tksvmlight binary 
on their own platform for this to work.

I wrote a TreeKernel class in cTAKES that I'd be happy to adapt. From there it 
would be a matter of implementing a CompositeKernel that combines scores from 
other kernels in some specified way, and modifying the TKSVMLight* classes to 
read in the options that tk-svmlight uses. There are probably some optimizations 
in the C libraries that I'm not an expert in, but it would be hard to be slower 
than the current configuration.
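
A rough sketch of what such a composite kernel could look like, assuming the combination is either a sum or a product of the component scores (both keep the result a valid kernel). The `Kernel` interface, `CompositeKernelSketch`, and the `Mode` enum are hypothetical names, not the classes in cTAKES or ClearTK, and the vector kernels in `main` are only stand-ins for a tree kernel and a feature-vector kernel.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical kernel interface; the actual ClearTK types may look different. */
interface Kernel<T> {
  double evaluate(T a, T b);
}

/** Combines the scores of several kernels defined over the same instance type. */
final class CompositeKernelSketch<T> implements Kernel<T> {
  enum Mode { SUM, PRODUCT }

  private final Mode mode;
  private final List<Kernel<T>> kernels;

  CompositeKernelSketch(Mode mode, List<Kernel<T>> kernels) {
    this.mode = mode;
    this.kernels = new ArrayList<>(kernels); // defensive copy
  }

  @Override
  public double evaluate(T a, T b) {
    // Start from the identity element of the chosen combination.
    double result = (mode == Mode.SUM) ? 0.0 : 1.0;
    for (Kernel<T> kernel : kernels) {
      double score = kernel.evaluate(a, b);
      result = (mode == Mode.SUM) ? result + score : result * score;
    }
    return result;
  }
}

final class CompositeKernelDemo {
  /** Plain dot product, used here as a linear kernel over equal-length vectors. */
  static double dot(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  public static void main(String[] args) {
    Kernel<double[]> linear = CompositeKernelDemo::dot;
    Kernel<double[]> quadratic = (p, q) -> Math.pow(dot(p, q) + 1.0, 2);
    List<Kernel<double[]>> parts = Arrays.asList(linear, quadratic);

    double[] x = {1.0, 2.0};
    double[] y = {3.0, 4.0};
    Kernel<double[]> sum =
        new CompositeKernelSketch<>(CompositeKernelSketch.Mode.SUM, parts);
    Kernel<double[]> product =
        new CompositeKernelSketch<>(CompositeKernelSketch.Mode.PRODUCT, parts);
    System.out.println(sum.evaluate(x, y));
    System.out.println(product.evaluate(x, y));
  }
}
```

Because the composite itself implements `Kernel`, it can be passed to a decision function in place of a single kernel, and composites can be nested.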

Original comment by tim.mil...@gmail.com on 8 Jan 2013 at 1:19

GoogleCodeExporter commented 9 years ago
Yeah, (2) would require a compiled binary for each platform.

So if you're willing to work on (1), that would be wonderful.

Original comment by steven.b...@gmail.com on 8 Jan 2013 at 2:39

GoogleCodeExporter commented 9 years ago
I have completed a patch that implements this, rewritten the tests to use the 
new code, and added a test that verifies the kernel produces the same result 
as the svmlight-tk code. What is the best way to go about getting it added?

Original comment by tim.mil...@gmail.com on 15 Jan 2013 at 9:30

GoogleCodeExporter commented 9 years ago
This issue is fixed in the following repo:
https://github.com/tmills/cleartk-with-tree-kernels

I'm not positive, but I think the last two commits, taken against the main 
cleartk repo, make up the complete change. Let me know if there is a better way 
for me to package up the diff.

Original comment by tim.mil...@gmail.com on 16 Feb 2013 at 9:11

GoogleCodeExporter commented 9 years ago
FWIW, I just did a clean checkout of cleartk, then did a pull from 
git@github.com:tmills/cleartk-with-tree-kernels.git
and everything merged cleanly.

Original comment by tim.mil...@gmail.com on 17 Feb 2013 at 1:29

GoogleCodeExporter commented 9 years ago
Ok, I was able to pick out the cleartk-ml-tksvmlight bits from there and merge 
them with ClearTK. Everything seems to be running now.

Before I can push to the main ClearTK repository, we need you to give us 
permission to include your code in ClearTK. Could you follow the instructions 
here:

https://code.google.com/p/cleartk/wiki/DeveloperFAQ#I_would_like_to_contribute_to_ClearTK.__What_now?

Original comment by steven.b...@gmail.com on 11 Mar 2013 at 4:29

GoogleCodeExporter commented 9 years ago
Ok, I committed changes based on your patch in revision 
da8693d91c75b2f752516ee76c8409525166a48b. They're not identical to your patch 
because some of the tests wouldn't pass with your code as it was. (For example, 
tksvmlight doesn't seem to produce a "mu" parameter for me.)

Thanks again, and please do open a new ticket if you have more TK-svmlight 
changes you'd like to push to the main ClearTK repo.

Original comment by steven.b...@gmail.com on 12 Mar 2013 at 3:37