at15 / forum-search

crawl, store and index data for later search
MIT License

refactor tokenizer module #24

Closed at15 closed 8 years ago

at15 commented 8 years ago

HanLP already provides a tokenization module, but its keyword extractor uses the standard tokenizer.

at15 commented 8 years ago

Storing the index in a single file works now; time to split it and query against it.
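
One way the split could look, as a rough sketch only: it assumes the index is a plain mapping from term to a list of document ids, and that shards are picked by hashing the term. The names `NUM_SHARDS`, `shard_of`, `split_index`, `query` and the `shard-N.json` file layout are made up for illustration and are not taken from this repo.

```python
import json
import os
import zlib

NUM_SHARDS = 4  # hypothetical shard count, not from the project


def shard_of(term, num_shards=NUM_SHARDS):
    # crc32 is stable across runs, unlike the built-in hash() for str.
    return zlib.crc32(term.encode("utf-8")) % num_shards


def split_index(index, out_dir, num_shards=NUM_SHARDS):
    """Write a {term: [doc_id, ...]} index as one JSON file per shard."""
    shards = [{} for _ in range(num_shards)]
    for term, postings in index.items():
        shards[shard_of(term, num_shards)][term] = postings
    for i, shard in enumerate(shards):
        path = os.path.join(out_dir, f"shard-{i}.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(shard, f, ensure_ascii=False)


def query(term, out_dir, num_shards=NUM_SHARDS):
    """Load only the shard that can contain the term and look it up."""
    path = os.path.join(out_dir, f"shard-{shard_of(term, num_shards)}.json")
    with open(path, encoding="utf-8") as f:
        return json.load(f).get(term, [])
```

With this layout a query reads a single shard file instead of the whole index. The trade-off is that changing the shard count means rewriting every shard.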

at15 commented 8 years ago

Er... have to say, using JSON makes the index file really big: 3 MB -> 70 MB.
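
The thread does not say where the extra size comes from. If part of it is posting lists being written out as decimal text, a more compact binary encoding might help. Below is a small sketch of delta plus varint encoding for a list of non-negative integer document ids. The function names are illustrative and this is not what the project actually does.

```python
import json


def encode_varint(n):
    """Encode a non-negative int in 7-bit groups, low bits first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_postings(doc_ids):
    """Sort the ids, then store the gap to the previous id as a varint."""
    out = bytearray()
    prev = 0
    for doc_id in sorted(doc_ids):
        out += encode_varint(doc_id - prev)
        prev = doc_id
    return bytes(out)


def decode_postings(data):
    """Inverse of encode_postings: rebuild the sorted id list."""
    doc_ids, current, shift, prev = [], 0, 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += current
            doc_ids.append(prev)
            current, shift = 0, 0
    return doc_ids


ids = [1000, 1003, 1010, 2000]
binary = encode_postings(ids)
as_json = json.dumps(ids)
```

Small gaps between sorted ids fit in one byte each, so `binary` here is much shorter than `as_json`. Whether this helps with the real index depends on what is actually stored per term.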