bnqtoan / clearnlp

Automatically exported from code.google.com/p/clearnlp

Adding personal file for training tokenizer #8

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
The tokenizer model is trained with a set of input dictionary files (abbreviations, compounds, etc.).
Can we use a set of our own dictionary files when training the model?
For example, we have a set of terms from the medical or legal field, and I want each of those terms to be tokenized as a single token, e.g. "law maker".

Could you please suggest the correct process for this?
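
To illustrate the behavior being asked for, here is a minimal sketch of merging dictionary terms after tokenization. It does not use ClearNLP's API or its dictionary file format. The function name `merge_terms` and the in-memory term list are assumptions made for illustration only, and it assumes the tokens come from any whitespace-style tokenizer.

```python
def merge_terms(tokens, terms):
    """Greedy longest-match merge of multi-word dictionary terms.

    Hypothetical helper, not part of ClearNLP: `tokens` is a list of
    already-tokenized words, `terms` is a list of multi-word entries
    such as "law maker" loaded from a custom dictionary.
    """
    # Index each dictionary entry as a tuple of lowercased words.
    entries = {tuple(t.lower().split()) for t in terms}
    max_len = max((len(e) for e in entries), default=1)

    merged, i = [], 0
    while i < len(tokens):
        # Try the longest candidate span first, down to two words.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(w.lower() for w in tokens[i:i + n]) in entries:
                merged.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:
            # No multi-word entry starts here; keep the token as-is.
            merged.append(tokens[i])
            i += 1
    return merged


tokens = "The law maker signed the bill".split()
print(merge_terms(tokens, ["law maker"]))
```

With this approach the custom terms are applied as a post-processing step, so the tokenizer model itself would not need retraining. Whether ClearNLP supports the same effect through its own dictionary files is the open question of this issue.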
What is the expected output? What do you see instead?

What version of the product are you using? On what operating system?

Please provide any additional information below.

Original issue reported on code.google.com by nitesh.n...@gmail.com on 7 Oct 2013 at 1:40