barzerman / barzer

barzer engine code
MIT License

AUTOKW: create automated keyword extraction process/script from phrases #630

Closed: barzerman closed this 10 years ago

barzerman commented 11 years ago

It should be possible to automate the creation of most of the keywords from the phrase file.

0xd34df00d commented 11 years ago

We did this a long time ago with frequency analysis, didn't we?

barzerman commented 11 years ago

I'm not sure we finished it. It's different for Zurgle: in this case we will have a relatively small corpus (hundreds of documents, each only a few hundred bytes at most).

This keyword creation will need to be more heuristic and less statistical. It is certainly related to the n-gram analysis. Where's the stuff we did?
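As a rough illustration of a heuristic approach that suits a small corpus, the sketch below generates word n-grams from each phrase, drops any n-gram that begins or ends with a stopword, and keeps n-grams that occur in at least two distinct phrases. The stopword list, the maximum n-gram length, and the `min_phrases` threshold are all assumptions chosen for illustration. This is not the n-gram analysis the thread refers to.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real one would be language- and domain-specific.
STOPWORDS = {"a", "an", "the", "of", "for", "and", "or", "in", "on", "to", "with"}


def tokenize(phrase):
    """Lowercase a phrase and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", phrase.lower())


def candidate_ngrams(phrase, max_n=3):
    """Yield word n-grams (1..max_n) that do not start or end with a stopword."""
    tokens = tokenize(phrase)
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            yield " ".join(gram)


def extract_keywords(phrases, max_n=3, min_phrases=2):
    """Keep n-grams that appear in at least `min_phrases` distinct phrases."""
    counts = Counter()
    for phrase in phrases:
        # Count each n-gram once per phrase (document frequency, not raw frequency).
        counts.update(set(candidate_ngrams(phrase, max_n)))
    return sorted(k for k, c in counts.items() if c >= min_phrases)
```

With the phrases "Bank of America", "Bank of America online banking", and "online banking fees", this keeps "bank of america" and "online banking" (plus their single-word parts) and drops one-off n-grams such as "banking fees". Requiring stopword-free boundaries is a cheap rule that avoids fragments like "of america" without needing a large corpus for statistics.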

0xd34df00d commented 11 years ago

Uhm, I'm not sure where it is now. We did it around February or March, if not earlier. You're right, though: we hardly ever used it, so it's very likely unfinished.

Not sure which heuristics you're talking about.

barzerman commented 11 years ago

For example, from this kind of corpus: http://eu.barzer.net/~yanis/gloss_names.txt

barzerman commented 10 years ago
  1. This should be a blackbox batch task that takes a phrase file (a standard Zurch phrase file).
  2. At first, it can ignore the phrase source (that is, title, content, etc.) and assume that all phrases come from the title.
  3. The blackbox must output the entity upload format with the syntax ID|pattern.
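The three steps above can be sketched as a small batch function: read the phrase file, discard the source field, extract keywords, and emit `ID|pattern` lines. The Zurch phrase file format is not shown in this thread, so the parser assumes a hypothetical layout where each line is either `source<TAB>phrase` or a bare phrase. The `default_extract` placeholder (one keyword per distinct lowercased phrase) is also hypothetical and would be swapped for a real extractor.

```python
def read_phrases(lines):
    """Read phrases from phrase-file lines, ignoring the source field.

    Hypothetical format: 'source<TAB>phrase' or just 'phrase' per line.
    Per step 2, every phrase is treated as if it came from the title.
    """
    phrases = []
    for line in lines:
        text = line.rstrip("\n")
        if not text.strip():
            continue  # skip blank lines
        # Keep only the text after the last tab (the phrase); drop the source.
        phrases.append(text.rsplit("\t", 1)[-1].strip())
    return phrases


def default_extract(phrases):
    """Placeholder extractor (hypothetical): one keyword per distinct lowercased phrase."""
    return sorted({p.lower() for p in phrases})


def run_batch(lines, extract=default_extract):
    """Phrase-file lines in, entity-upload lines ('ID|pattern') out.

    IDs use the SEQNO.keyword form, reading 'keyword' as a literal suffix.
    """
    keywords = extract(read_phrases(lines))
    return [f"{seqno}.keyword|{kw}" for seqno, kw in enumerate(keywords, start=1)]
```

Keeping I/O out of `run_batch` makes the blackbox easy to test; a caller can feed it an open file and write the returned lines to the upload file.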

The ID should have the format SEQNO.keyword, where SEQNO is a sequential number within the same user.
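A minimal sketch of per-user numbering, assuming keywords arrive as `(user, keyword)` pairs and that `keyword` in SEQNO.keyword is a literal suffix. If the suffix is instead meant to be the keyword text itself, only the f-string changes. The function name `assign_ids` is hypothetical.

```python
from collections import defaultdict


def assign_ids(user_keywords):
    """Assign IDs of the form SEQNO.keyword, numbering separately per user.

    `user_keywords` is an iterable of (user, keyword) pairs. SEQNO starts at 1
    for each user and increments in the order that user's keywords are seen.
    Assumption: 'keyword' in SEQNO.keyword is a literal suffix.
    """
    next_seq = defaultdict(lambda: 1)  # next free SEQNO for each user
    rows = []
    for user, kw in user_keywords:
        seqno = next_seq[user]
        next_seq[user] += 1
        rows.append((user, f"{seqno}.keyword", kw))
    return rows
```

Interleaved input still yields gap-free numbering per user, because each user has an independent counter.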

The blackbox should be able to: 1) generate keywords from a phrase file; 2) generate additional keywords from existing keywords and new phrases.
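Mode 2 needs at least a merge step so that rerunning on new phrases does not renumber or duplicate existing entries. The sketch below assumes the existing keywords for one user are available as `ID|pattern` lines and that candidate keywords from the new phrases have already been extracted. It skips candidates already present (case-insensitively) and continues SEQNO after the highest existing number. How existing keywords should otherwise guide extraction (for example, as seeds) is not specified in the thread, so that part is left out. `extend_keywords` is a hypothetical name.

```python
def extend_keywords(existing_lines, new_keywords):
    """Append new keywords to one user's ID|pattern list without renumbering.

    `existing_lines` look like '3.keyword|bank of america'. New keywords that
    already exist (case-insensitive) are skipped; the rest continue the SEQNO
    sequence after the highest existing number.
    """
    known = set()
    max_seq = 0
    for line in existing_lines:
        line = line.strip()
        if not line:
            continue
        entity_id, pattern = line.split("|", 1)
        max_seq = max(max_seq, int(entity_id.split(".", 1)[0]))
        known.add(pattern.lower())

    added = []
    for kw in new_keywords:
        key = kw.lower()
        if key in known:
            continue  # already present, or repeated within this batch
        known.add(key)
        max_seq += 1
        added.append(f"{max_seq}.keyword|{kw}")
    return added
```

Only the new lines are returned, so they can be appended to the existing upload file as-is.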

barzerman commented 10 years ago

https://github.com/barzerman/barzer/blob/issue_638_subtractor/zurch_pipeline/keyword_generator.py