Open Prog19 opened 8 years ago
Thanks Pragati,
I feel the "right thing to do" is to implement it in Medallia's Java implementation, and then add a wrapper for it in this project. Compiling the C code is a good solution for "just making it work". However, it is fragile: different C compilers may produce different errors, and the user's machine may not have a C compiler in the first place.
Agreed, Kiran! This sure is a dirty fix.
A quick workaround for this gap in the Java implementation would be to download this code file (from the original C tool), compile it, and execute the resulting binary from Clojure. The tool marks multi-word phrases in the training text corpus by joining their words with an underscore. (Refer to 'From words to phrases and beyond' from here.)
Below is the code to run the executable in `/resources` in the project directory using a Java Runtime instance and, alternatively, by shelling out in Clojure. Here, the input is placed in `/resources/train.txt`, the output may be found at `/resources/output/out.txt`, and the other parameters to the word2phrase training take default values.
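For anyone who wants to try the Java-side approach described above, here is a minimal sketch of how the invocation might look. It rests on a few assumptions: the compiled binary is named `word2phrase` (a hypothetical name, use whatever your compiler produced) and sits in `resources` relative to the working directory, and only the `-train` and `-output` flags of the C tool are passed so that everything else keeps its default. It uses `ProcessBuilder` rather than `Runtime.exec` so the tool's progress output is forwarded to the console instead of filling an unread pipe.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

class Word2PhraseRunner {

    // Builds the command line for the compiled binary.
    // "word2phrase" is an assumed executable name, not something fixed by the project.
    static List<String> buildCommand(Path resourcesDir) {
        return Arrays.asList(
            resourcesDir.resolve("word2phrase").toString(),
            "-train", resourcesDir.resolve("train.txt").toString(),
            "-output", resourcesDir.resolve("output").resolve("out.txt").toString());
    }

    // Runs the binary and returns its exit code.
    static int run(Path resourcesDir) throws IOException, InterruptedException {
        // Create the output directory if it does not exist yet.
        Files.createDirectories(resourcesDir.resolve("output"));
        Process process = new ProcessBuilder(buildCommand(resourcesDir))
            .inheritIO() // forward the tool's stdout/stderr to this process
            .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        Path resources = Paths.get("resources");
        if (!Files.isExecutable(resources.resolve("word2phrase"))) {
            System.err.println("No executable found in " + resources + "; compile the C file first.");
            return;
        }
        int exitCode = run(resources);
        System.out.println("word2phrase exited with code " + exitCode);
    }
}
```

Keeping `buildCommand` separate from `run` makes it easy to check the arguments without actually launching the binary, and the same argument list can be handed to a Clojure shell call unchanged.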