go-ego / riot

Go open-source, distributed, simple and efficient search engine. Warning: this is V1 and a beta version because of its high memory consumption; V2 will be a complete rewrite.
Apache License 2.0

File search example please #120

Open suntong opened 3 years ago

suntong commented 3 years ago

Description

All the examples I have seen are string-based searches. However, can riot somehow be used as a file-based search tool?

Basically it would work just like grep, but use a persistent index to speed up searches while also supporting Chinese word segmentation (中文分词).

My scenario is that I have a huge collection of Chinese-language files, thousands of them, so I need something that can search through them quickly with the help of pre-built indexes. The content of existing files will not change (or only very rarely), but more files are added daily. I haven't yet found any tool that does a good job of searching Chinese content.

Is this possible? If so, sample code would be appreciated.

Thanks

suntong commented 3 years ago

Let's build a Full-Text Search engine
https://artem.krylysov.com/blog/2020/07/28/lets-build-a-full-text-search-engine/

This is the kind of tool I'm talking about. However, because it is English-based, the inverted index it builds cannot handle Chinese.
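
One way an inverted index could be made to cope with Chinese, sketched here under stated assumptions: instead of real word segmentation, runs of Han characters are split into overlapping two-character terms (bigrams), a common fallback when no dictionary is available. This is neither riot's segmenter nor the method used in the linked article; `tokenize`, `Index`, `Add` and `Search` are hypothetical names, and the file contents are passed in as strings to keep the example short.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// tokenize splits text into search terms. Runs of Han characters become
// overlapping bigrams (a lone Han character stays a single term); runs of
// other letters and digits become lowercased words.
func tokenize(text string) []string {
	var tokens []string
	var run []rune
	han := false
	flush := func() {
		switch {
		case len(run) == 0:
			return
		case !han:
			tokens = append(tokens, strings.ToLower(string(run)))
		case len(run) == 1:
			tokens = append(tokens, string(run))
		default:
			for i := 0; i+1 < len(run); i++ {
				tokens = append(tokens, string(run[i:i+2]))
			}
		}
		run = run[:0]
	}
	for _, r := range text {
		isHan := unicode.Is(unicode.Han, r)
		if !isHan && !unicode.IsLetter(r) && !unicode.IsDigit(r) {
			flush() // punctuation or whitespace ends the current run
			continue
		}
		if len(run) > 0 && isHan != han {
			flush() // switching between Han and non-Han text
		}
		han = isHan
		run = append(run, r)
	}
	flush()
	return tokens
}

// Index maps each term to the set of file names containing it.
type Index map[string]map[string]struct{}

// Add indexes the given text under the given file name.
func (idx Index) Add(name, text string) {
	for _, t := range tokenize(text) {
		if idx[t] == nil {
			idx[t] = make(map[string]struct{})
		}
		idx[t][name] = struct{}{}
	}
}

// Search returns the sorted names of files containing every term of the query.
func (idx Index) Search(query string) []string {
	terms := tokenize(query)
	if len(terms) == 0 {
		return nil
	}
	var hits []string
	for name := range idx[terms[0]] {
		all := true
		for _, t := range terms[1:] {
			if _, ok := idx[t][name]; !ok {
				all = false
				break
			}
		}
		if all {
			hits = append(hits, name)
		}
	}
	sort.Strings(hits)
	return hits
}

func main() {
	idx := Index{}
	idx.Add("a.txt", "全文搜索引擎的实现")
	idx.Add("b.txt", "Go 语言的搜索工具")
	fmt.Println(idx.Search("搜索"))     // [a.txt b.txt]
	fmt.Println(idx.Search("搜索引擎")) // [a.txt]
	fmt.Println(idx.Search("go"))       // [b.txt]
}
```

Bigram matching only requires that every bigram of the query occurs somewhere in the file, so it can return files where the characters appear in separate places; a dictionary-based segmenter would be more precise at the cost of an extra dependency.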

gedw99 commented 3 years ago

Maybe bleve is better for the use case

suntong commented 3 years ago

Indeed, this is what I'm currently working on

https://github.com/suntong/doc-search

and I'm almost finished (it still lacks Chinese search).