AnantLabs / dkpro-tc

Automatically exported from code.google.com/p/dkpro-tc

Add adapter for SVMhmm #190

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Request: add adapter to SVMhmm for sequence labeling

SVMhmm - Sequence Tagging with Structural Support Vector Machines, 
http://www.cs.cornell.edu/people/tj/svm_light/svm_hmm.html

licence: "The program is free for scientific use."

Original issue reported on code.google.com by ivan.hab...@gmail.com on 9 Oct 2014 at 12:57

GoogleCodeExporter commented 9 years ago

Original comment by ivan.hab...@gmail.com on 9 Oct 2014 at 1:03

GoogleCodeExporter commented 9 years ago
Updated by r1141

Original comment by ivan.hab...@gmail.com on 10 Oct 2014 at 8:12

GoogleCodeExporter commented 9 years ago
Updated by revision r1142 

Fixed parsing of featureVector files, capturing output of SVMhmm binaries, 
better logging using Apache Commons Logging.

Original comment by ivan.hab...@gmail.com on 10 Oct 2014 at 8:17

GoogleCodeExporter commented 9 years ago
Updated by revisions r1148, r1149, and r1150:

r1148: Parametrization of SVMhmm + example code
r1149: Moving demo to the appropriate package; removing irrelevant files
r1150: Adding license/permission to distribute binaries.

Original comment by ivan.hab...@gmail.com on 10 Oct 2014 at 2:51

GoogleCodeExporter commented 9 years ago
TODO: Writing feature vectors should ignore features with 0 value

Original comment by ivan.hab...@gmail.com on 10 Oct 2014 at 2:52

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r1167.

Omitting zero-valued features.

Original comment by ivan.hab...@gmail.com on 15 Oct 2014 at 2:45

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r1169.

After this commit ( https://code.google.com/p/dkpro-tc/source/detail?r=1167 ), 
null-valued features are also possible and thus ignored.

Original comment by ivan.hab...@gmail.com on 16 Oct 2014 at 7:03
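
For illustration, here is a minimal sketch of what skipping zero-valued and null-valued features could look like when writing one line of a feature vector file. It is not the code from r1167 or r1169. The class and method names are hypothetical, and the line layout (tag, qid, sorted id:value pairs, trailing comment) is an assumption based on the input format described on the SVMhmm page linked above.

```java
import java.util.Map;
import java.util.SortedMap;

// Hypothetical helper, not part of DKPro TC.
final class SvmHmmLineSketch {

    private SvmHmmLineSketch() {
    }

    /**
     * Formats one token as "TAG qid:SEQ ID:VALUE ... # COMMENT".
     * Features are expected to be keyed by numeric feature id; a SortedMap
     * keeps the ids in increasing order. Entries whose value is null or
     * zero are skipped, so only the non-zero (sparse) entries are written.
     */
    static String formatLine(int tag, int sequenceId,
            SortedMap<Integer, Double> features, String comment) {
        StringBuilder sb = new StringBuilder();
        sb.append(tag).append(" qid:").append(sequenceId);
        for (Map.Entry<Integer, Double> entry : features.entrySet()) {
            Double value = entry.getValue();
            if (value == null || value == 0.0) {
                continue; // omit null-valued and zero-valued features
            }
            sb.append(' ').append(entry.getKey()).append(':').append(value.doubleValue());
        }
        if (comment != null && !comment.isEmpty()) {
            sb.append(" # ").append(comment);
        }
        return sb.toString();
    }
}
```

With features {1=1.0, 2=0.0, 3=null, 5=0.5}, tag 2, sequence 1 and comment "token", this sketch returns "2 qid:1 1:1.0 5:0.5 # token".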

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r1177.

Logging SVM output to log.

Original comment by ivan.hab...@gmail.com on 16 Oct 2014 at 1:19
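
As a rough sketch of how the output of an external binary can be captured and forwarded to a log: the example below uses java.util.logging to stay self-contained, whereas the revisions above mention Apache Commons Logging. The class and method names are hypothetical and this is not the adapter's actual code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical helper, not part of DKPro TC.
final class ProcessOutputLoggingSketch {

    private static final Logger LOG =
            Logger.getLogger(ProcessOutputLoggingSketch.class.getName());

    private ProcessOutputLoggingSketch() {
    }

    /** Reads the stream line by line, logs each line, and returns all lines. */
    static List<String> logLines(InputStream processOutput) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(processOutput, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                LOG.info(line);
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    /** Starts the command, logs its merged stdout/stderr, returns the exit code. */
    static int runAndLog(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // merge stderr into stdout
        Process process = builder.start();
        logLines(process.getInputStream());
        return process.waitFor();
    }
}
```

Merging stderr into stdout means a single reader drains everything the process prints, which avoids the process blocking on a full, unread error stream.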

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r1178.

Omitting unnecessary featureId to feature name mapping; fast access to sparse 
features

Original comment by ivan.hab...@gmail.com on 20 Oct 2014 at 8:29

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r1180.

Warning when non-number features are used

Original comment by ivan.hab...@gmail.com on 21 Oct 2014 at 6:47

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r1186.

Fixing multi-line comments

Original comment by ivan.hab...@gmail.com on 27 Oct 2014 at 8:27

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r1188.

Adding random classifier for sequence labeling; allowing further meta-data 
features to be stored in feature vectors files in SVMhmm

Original comment by ivan.hab...@gmail.com on 27 Oct 2014 at 11:38

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r1193.

CV evaluation fails if no test tasks are found; some other minor changes

Original comment by ivan.hab...@gmail.com on 27 Oct 2014 at 2:18

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r1194.

Refactoring batch CV report

Original comment by ivan.hab...@gmail.com on 27 Oct 2014 at 2:50

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r1195.

Fixing bug in double-valued features

Original comment by ivan.hab...@gmail.com on 28 Oct 2014 at 11:32

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r1218.

Reporting gold classes distribution

Original comment by ivan.hab...@gmail.com on 6 Nov 2014 at 8:01

GoogleCodeExporter commented 9 years ago
The svm_hmm compiles fine on OS X, but...

There appears to be a general problem invoking the classifier with long paths. 
Apparently, the internal filename buffers used by the program are only 200 
characters long. I have paths on my system that are quite a bit longer than 
that, which causes the binaries to fail with errors like "file not found".

I tried changing some [200]'s into [4096]'s and recompiled, but that didn't 
seem to help. This needs some more investigation, and it is independent of the 
OS: the same problem occurs on other OSes as well.

Original comment by richard.eckart on 11 Nov 2014 at 4:24

GoogleCodeExporter commented 9 years ago
A workaround for the long-path issue would be to copy the training files to the 
same temporary directory that contains the executable and to use relative 
names. When running the executable, the working directory must be set to that 
temporary directory. The output would also be written to the temporary 
directory and later copied to the task execution context. It's a little extra 
work, but it shouldn't slow things down too much.

Original comment by richard.eckart on 13 Nov 2014 at 9:22
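
The workaround described above could be sketched roughly as follows. This is only an illustration under assumptions, not the implementation that later went into the adapter: the class name, method names, and short file names are hypothetical, and the executable is assumed to already sit in the temporary directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of DKPro TC.
final class ShortPathStagingSketch {

    private ShortPathStagingSketch() {
    }

    /**
     * Copies a file with a possibly very long path into the working directory
     * under a short name and returns that name relative to the working directory.
     */
    static Path stage(Path source, Path workDir, String shortName) throws IOException {
        Path target = workDir.resolve(shortName);
        Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        return workDir.relativize(target);
    }

    /**
     * Prepares a process whose working directory is the temporary directory,
     * so the data file arguments can be short relative names. The executable
     * itself is addressed by its absolute path inside that directory.
     */
    static ProcessBuilder buildCommand(Path workDir, String executableName,
            List<String> relativeArgs) {
        List<String> command = new ArrayList<>();
        command.add(workDir.resolve(executableName).toAbsolutePath().toString());
        command.addAll(relativeArgs);
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.directory(workDir.toFile());
        builder.redirectErrorStream(true);
        return builder;
    }

    /** Copies a result file from the working directory back to its final location. */
    static void collect(Path workDir, String shortName, Path destination) throws IOException {
        Files.copy(workDir.resolve(shortName), destination,
                StandardCopyOption.REPLACE_EXISTING);
    }
}
```

A caller would stage the training file, start the process returned by buildCommand, wait for it to finish, and then use collect to move the model or prediction file into the task execution context.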

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r1230.

Workaround for bug in svm_hmm (file path max. 200 chars) by copying data to 
temp files; extending example with train-test scenario

Original comment by ivan.hab...@gmail.com on 24 Nov 2014 at 10:43

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r1231.

Workaround for bug in svm_hmm (file path max. 200 chars) by copying data to 
temp files - another fix

Original comment by ivan.hab...@gmail.com on 24 Nov 2014 at 11:09

GoogleCodeExporter commented 9 years ago
Is there anything left to do here?

Original comment by daxenber...@gmail.com on 11 Dec 2014 at 3:47

GoogleCodeExporter commented 9 years ago
I think it's done for now.

Original comment by ivan.hab...@gmail.com on 12 Dec 2014 at 9:38