Adding personal file for training tokenizer

What steps will reproduce the problem?
While training the model, you use a set of input files (abbreviations, 
compounds, etc.).  
Can we use our own dictionary files to train the model? 
For example, we have a set of terms from the medical or legal field, and I want each of those terms to be tokenized as a single token, e.g. "law maker". 
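
To make the desired behavior concrete, here is a minimal sketch of the output I am after. It is not ClearNLP code and does not use the tokenizer's training files; it is only a post-processing step over already-tokenized text. The function name `merge_terms` and the dictionary entries are hypothetical, chosen for illustration.

```python
def merge_terms(tokens, terms):
    """Greedily merge the longest run of tokens that matches a dictionary term."""
    # Store each term as a tuple of lowercase words for case-insensitive lookup.
    term_set = {tuple(t.lower().split()) for t in terms}
    max_len = max((len(t) for t in term_set), default=1)
    merged, i = [], 0
    while i < len(tokens):
        # Try the longest candidate span first, down to two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            window = tuple(w.lower() for w in tokens[i:i + n])
            if window in term_set:
                merged.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:
            # No multi-word term starts here; keep the token unchanged.
            merged.append(tokens[i])
            i += 1
    return merged


# Hypothetical dictionary entries from a legal term list.
custom_terms = ["law maker", "power of attorney"]
tokens = "The law maker signed a power of attorney .".split()
result = merge_terms(tokens, custom_terms)
print(result)
```

With the sample sentence, "law maker" and "power of attorney" each come out as one token while every other token is left unchanged. Ideally the tokenizer itself would produce this from a custom dictionary file, without a separate merging pass.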

Could you please suggest the correct process for this? 
What is the expected output? What do you see instead?

What version of the product are you using? On what operating system?

Please provide any additional information below.

Original issue reported on code.google.com by nitesh.n...@gmail.com on 7 Oct 2013 at 1:40

SigmoidFreud / clearnlp

Adding personal file for training tokenizer #8