dileepajayakody / semanticvectors

Automatically exported from code.google.com/p/semanticvectors

Filtering words to be indexed? #72

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. How can I index only those words that occur at least 2 times and have at most 3 non-alphabetic characters, counted across our whole collection of documents?

What is the expected output? What do you see instead?
Only the words that meet the criteria described above should be indexed.

What version of the product are you using? On what operating system?
I am using semanticvectors-4.0 on Ubuntu 12.04.

Please provide any additional information below.

Original issue reported on code.google.com by rohitdee...@gmail.com on 17 Oct 2013 at 6:38

GoogleCodeExporter commented 9 years ago
The flags for this should be -minfrequency 2 -maxnonalphabetchars 3, passed when you build the index.

Does that work for you?

Original comment by dwidd...@gmail.com on 19 Oct 2013 at 4:02
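
For readers who want to see the kind of filtering these two flags describe, here is a minimal Python sketch. It is not the SemanticVectors implementation: the function names (count_non_alphabetic, filter_terms) and the sample documents are hypothetical, and it assumes frequency means the total number of occurrences across the whole collection.

```python
from collections import Counter


def count_non_alphabetic(term):
    """Count the characters in `term` that are not letters."""
    return sum(1 for ch in term if not ch.isalpha())


def filter_terms(documents, min_frequency=2, max_non_alphabet_chars=3):
    """Return the set of terms that pass both filters.

    `documents` is an iterable of token lists, one list per document.
    Frequency is the total count over the whole collection, which is
    an assumption made for this sketch.
    """
    counts = Counter(token for doc in documents for token in doc)
    return {
        term
        for term, freq in counts.items()
        if freq >= min_frequency
        and count_non_alphabetic(term) <= max_non_alphabet_chars
    }


# Hypothetical two-document collection, already tokenized.
docs = [
    ["vector", "space", "model", "x-1-2-3-4"],
    ["vector", "model", "x-1-2-3-4", "rare"],
]
kept = filter_terms(docs)
```

In this sample, "space" and "rare" are dropped because each occurs only once, and "x-1-2-3-4" is dropped because it contains more than 3 non-alphabetic characters even though it occurs twice.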

GoogleCodeExporter commented 9 years ago

Original comment by dwidd...@gmail.com on 9 Feb 2015 at 9:47