Open jiangfeng1124 opened 10 years ago
Hi Jiang,
You'll need to compute the words you want to use first and then use the --token-filter option to restrict which words are retained.
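A minimal sketch of that first step, assuming a plain-text corpus that can be tokenized on whitespace and a word list written one word per line. The function names, the file paths, and the one-word-per-line layout are illustrative assumptions, not part of S-Space; check the S-Space documentation for the exact file format the token filter expects.

```python
from collections import Counter


def frequent_words(lines, min_count=50):
    """Return, in sorted order, the words that occur at least min_count times."""
    counts = Counter()
    for line in lines:
        # Assumption: whitespace tokenization is good enough for this corpus.
        counts.update(line.split())
    return sorted(word for word, count in counts.items() if count >= min_count)


def write_word_list(corpus_path, out_path, min_count=50):
    """Write the retained words to out_path, one per line; return how many."""
    with open(corpus_path, encoding="utf-8") as corpus:
        words = frequent_words(corpus, min_count)
    with open(out_path, "w", encoding="utf-8") as out:
        out.write("\n".join(words) + "\n")
    return len(words)
```

Calling `write_word_list("corpus.txt", "keep-words.txt")` (both paths hypothetical) would produce the list of words to retain, which can then be passed to the filter option.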
Also, please use the mailing list for these types of questions rather than opening a new issue on GitHub for each one. The mailing list lets others see the answers in case they have the same question.
Thanks, David
On Wed, Apr 2, 2014 at 11:14 AM, jiangfeng notifications@github.com wrote:
Dear developers,
I did not find an option to limit the vocabulary. For example, I don't want to learn representations for words that occur fewer than 50 times in my corpus. The reason is that if I use all the words (or only exclude the stop words), the vocabulary will be very large, which is undesirable.
I am wondering whether there is a convenient way to do this. Thanks very much, Jiang
Reply to this email directly or view it on GitHub: https://github.com/fozziethebeat/S-Space/issues/51
I see, thanks very much!
Jiang