barzerman closed 10 years ago
We did this a long time ago, with frequency analysis, didn't we?
I'm not sure we finished it. It's different for Zurgle: in this case we will have a relatively small corpus (hundreds of documents, each only a few hundred bytes at most).
This keyword creation will need to be more heuristic and less statistical. It is certainly related to the ngram analysis. Where's the stuff we did?
Uhm, I'm not sure where it is now. We did it in Feb or March, if not earlier, though you're right: we hardly ever used it, so it's very likely unfinished.
Dunno what heuristics you are talking about.
For example, from this kind of corpus: http://eu.barzer.net/~yanis/gloss_names.txt
ID should have the following format: SEQNO.keyword, where SEQNO is a sequential number within the same user.
The blackbox should be able to: 1) generate keywords from a phrase file; 2) generate additional keywords from existing keywords and new phrases.
It should be possible to automate creation of most of the keywords from the phrase file.
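A rough sketch of what such a blackbox could look like, in Python. None of this is the existing code. The heuristic (word unigrams and bigrams that don't start or end with a stopword), the stopword list, the one-phrase-per-line file format, joining bigrams with "_", and all function names are assumptions for illustration. SEQNO is assumed to start at 1 and to continue from the number of keywords the user already has.

```python
import re

# Hypothetical stopword list; the thread does not specify one.
STOPWORDS = frozenset({"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"})


def load_phrases(path):
    """Read a phrase file, assuming one phrase per line (blank lines skipped)."""
    with open(path, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]


def tokenize(phrase):
    """Lowercase a phrase and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", phrase.lower())


def candidate_keywords(phrase, max_n=2):
    """Yield word n-grams (n <= max_n) that neither start nor end with a stopword."""
    tokens = tokenize(phrase)
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            yield "_".join(gram)  # assumption: multi-word keywords use "_"


def generate_keywords(phrases, existing=None, max_n=2):
    """Return {keyword: "SEQNO.keyword"} for one user.

    `existing` holds that user's current keywords; new ones are appended
    with SEQNO continuing from len(existing) (assumes IDs 1..len(existing)).
    """
    keywords = dict(existing or {})
    seqno = len(keywords)
    for phrase in phrases:
        for kw in candidate_keywords(phrase, max_n):
            if kw not in keywords:
                seqno += 1
                keywords[kw] = f"{seqno}.{kw}"
    return keywords


# 1) keywords from phrases; 2) additional keywords from existing ones plus new phrases
first = generate_keywords(["Glossary of names", "Names of rivers"])
second = generate_keywords(["river names"], existing=first)
```

With a corpus this small, a fixed rule like the stopword filter is easier to tune by hand than frequency thresholds, which is the only reason it is used here.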