fozziethebeat / S-Space

The S-Space repository, from the AIrhead-Research group
GNU General Public License v2.0
203 stars, 106 forks

Can I limit the vocabulary? #51

Open jiangfeng1124 opened 10 years ago

jiangfeng1124 commented 10 years ago

Dear developers,

I did not find an option to limit the vocabulary. For example, I don't want to learn representations for words that occur fewer than 50 times in my corpus. If I use all the words (or only exclude the stop words), the vocabulary becomes very large, which is undesirable.

Is there a convenient way to do this? Thanks very much, Jiang

davidjurgens commented 10 years ago

Hi Jiang,

You'll need to compute the words you want to use first and then use the --token-filter option to restrict which words are retained.
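The first step above (computing the words to keep) could be sketched as follows. This is only an illustration, not part of S-Space: it assumes a plain-text corpus that is tokenized by splitting on whitespace, and the function names `frequent_words` and `write_word_list`, the one-word-per-line output layout, and the default threshold of 50 are all hypothetical choices made for the example. The tokenization should match whatever the tool itself applies to the corpus.

```python
from collections import Counter
from pathlib import Path


def frequent_words(lines, min_count=50):
    """Return, in sorted order, the tokens that occur at least min_count times."""
    counts = Counter()
    for line in lines:
        # Assumption: tokens are separated by whitespace, with no case folding.
        counts.update(line.split())
    return sorted(word for word, n in counts.items() if n >= min_count)


def write_word_list(corpus_path, out_path, min_count=50):
    """Write the retained words to out_path, one per line, and return how many."""
    with open(corpus_path, encoding="utf-8") as corpus:
        words = frequent_words(corpus, min_count)
    Path(out_path).write_text("\n".join(words) + "\n", encoding="utf-8")
    return len(words)
```

The resulting file is the kind of word list that the token-filter option is meant to consume; the exact argument syntax and expected file format are not shown in this thread, so check the tool's help output before relying on this layout.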

Also, please use the mailing list for these types of questions, rather than opening a new issue on GitHub for each one. The mailing list helps others find the answers in case they have the same question.

Thanks, David

On Wed, Apr 2, 2014 at 11:14 AM, jiangfeng wrote:

> (original question, quoted in full above)

jiangfeng1124 commented 10 years ago

I see, thanks very much!

Jiang