CMajeri opened 1 year ago
@CMajeri, here are a few details that should help answer your questions.
As with any other information retrieval library, you'd benefit from moving the bulk of the work from query time to index time. The obvious trade-off is that indexing will likely take longer, and the index's footprint may grow as well.
A wildcard query works in 2 steps ..
So a wildcard query can perform poorly, owing both to the number of candidate terms that become eligible and to the cost of the resulting disjunction query.
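The sentence above implies the two steps are (1) expanding the pattern into every matching term of the term dictionary and (2) running a disjunction over those terms. Under that assumption, here's a toy in-memory model in Go showing where the cost comes from. It is not bleve's implementation; the map-based postings and the `wildcardToRegexp` / `wildcardSearch` helpers are hypothetical names for this sketch.

```go
package main

import (
	"regexp"
	"sort"
	"strings"
)

// wildcardToRegexp translates a wildcard pattern ('*' = any run of
// characters, '?' = exactly one character) into an anchored regexp.
func wildcardToRegexp(pattern string) *regexp.Regexp {
	var b strings.Builder
	b.WriteString("^")
	for _, r := range pattern {
		switch r {
		case '*':
			b.WriteString(".*")
		case '?':
			b.WriteString(".")
		default:
			b.WriteString(regexp.QuoteMeta(string(r)))
		}
	}
	b.WriteString("$")
	return regexp.MustCompile(b.String())
}

// wildcardSearch models the two steps over a toy term dictionary
// (term -> doc IDs). It returns the matched terms and the sorted,
// de-duplicated doc IDs of their union.
func wildcardSearch(postings map[string][]int, pattern string) ([]string, []int) {
	re := wildcardToRegexp(pattern)

	// Step 1: candidate expansion. Every dictionary term is tested, so the
	// cost grows with the dictionary size and with the number of matches.
	var terms []string
	for term := range postings {
		if re.MatchString(term) {
			terms = append(terms, term)
		}
	}
	sort.Strings(terms)

	// Step 2: disjunction. Every matching term contributes its postings list.
	seen := map[int]bool{}
	var docs []int
	for _, t := range terms {
		for _, id := range postings[t] {
			if !seen[id] {
				seen[id] = true
				docs = append(docs, id)
			}
		}
	}
	sort.Ints(docs)
	return terms, docs
}
```

With postings `{"ale": [1 2], "pale": [2 3], "palace": [4], ...}`, the pattern `*ale*` expands to `ale` and `pale` and the disjunction yields docs 1, 2 and 3. A broad pattern on a large dictionary can expand to thousands of terms, each adding a clause to the disjunction.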
Here's a recommendation for your situation: configure a "custom" analyzer where you ..
"Cow Palace Scotch Ale" => {"cow", "pal", "ala", "lac", "ace", "sco", "cot", "otc", "tch", "ale"}
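To check what such an analyzer puts into the index, the tokenize, lowercase and trigram chain can be approximated in plain Go. This is a hypothetical sketch: `trigrams` is not a bleve function, and splitting on non-letter/non-digit runes only approximates bleve's `unicode` tokenizer, which segments on Unicode word boundaries.

```go
package main

import (
	"strings"
	"unicode"
)

// trigrams approximates a "unicode tokenizer -> to_lower -> ngram(3,3)"
// chain: split into tokens, lowercase each one, then emit every 3-rune
// gram. Tokens shorter than 3 runes produce no grams in this sketch.
func trigrams(text string) []string {
	tokens := strings.FieldsFunc(text, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	var grams []string
	for _, tok := range tokens {
		runes := []rune(strings.ToLower(tok))
		for i := 0; i+3 <= len(runes); i++ {
			grams = append(grams, string(runes[i:i+3]))
		}
	}
	return grams
}
```

`trigrams("Cow Palace Scotch Ale")` returns the ten grams listed above, in the same order. Working on runes rather than bytes keeps multi-byte characters intact.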
Now, your search request would just be a term query, which would perform a lot faster than the wildcard query.
```json
{"query": {"field": "name_tri", "term": "ale"}}
```
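To see why the term query wins, and how the idea extends to the `LIKE '%<some_word>%'` use case, here's a toy in-memory trigram index in Go. It is a hypothetical model, not bleve's API: `trigramIndex`, `term` and `substring` are illustrative names, tokenization is plain whitespace splitting, and queries shorter than three characters return nothing. A term query is a single postings lookup; a longer word intersects the postings of all its trigrams (roughly what a conjunction of term queries on `name_tri` would do, which is an assumption about how you'd map it onto bleve) and then re-checks each candidate.

```go
package main

import (
	"sort"
	"strings"
)

// grams returns the distinct lowercase 3-rune grams of each
// whitespace-separated token in s (a simplified stand-in for the analyzer).
func grams(s string) []string {
	seen := map[string]bool{}
	var out []string
	for _, tok := range strings.Fields(strings.ToLower(s)) {
		r := []rune(tok)
		for i := 0; i+3 <= len(r); i++ {
			g := string(r[i : i+3])
			if !seen[g] {
				seen[g] = true
				out = append(out, g)
			}
		}
	}
	return out
}

// trigramIndex is a hypothetical inverted index: trigram -> doc IDs.
type trigramIndex struct {
	docs     map[int]string
	postings map[string][]int
}

// newTrigramIndex does the "index time" work up front; postings lists are
// built in ascending doc-ID order so they can be merge-intersected.
func newTrigramIndex(docs map[int]string) *trigramIndex {
	idx := &trigramIndex{docs: docs, postings: map[string][]int{}}
	ids := make([]int, 0, len(docs))
	for id := range docs {
		ids = append(ids, id)
	}
	sort.Ints(ids)
	for _, id := range ids {
		for _, g := range grams(docs[id]) {
			idx.postings[g] = append(idx.postings[g], id)
		}
	}
	return idx
}

// term is the query-time payoff: one postings lookup, no term expansion.
func (idx *trigramIndex) term(g string) []int {
	return idx.postings[strings.ToLower(g)]
}

// intersect merges two ascending ID lists, keeping IDs present in both.
func intersect(a, b []int) []int {
	var out []int
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] == b[j]:
			out = append(out, a[i])
			i++
			j++
		case a[i] < b[j]:
			i++
		default:
			j++
		}
	}
	return out
}

// substring emulates LIKE '%q%' for a single word q of 3+ runes: AND the
// postings of all of q's trigrams to get candidates, then verify each one,
// because containing every trigram does not guarantee containing q.
func (idx *trigramIndex) substring(q string) []int {
	q = strings.ToLower(q)
	var cand []int
	for i, g := range grams(q) {
		if i == 0 {
			cand = append([]int(nil), idx.postings[g]...)
		} else {
			cand = intersect(cand, idx.postings[g])
		}
	}
	var out []int
	for _, id := range cand {
		if strings.Contains(strings.ToLower(idx.docs[id]), q) {
			out = append(out, id)
		}
	}
	return out
}
```

Indexing `1: "Cow Palace Scotch Ale"`, `2: "Pale Ale"`, `3: "Lac Ace Pal Ala"`, `4: "Lager"`, the term lookup for `ale` returns docs 1 and 2, while `substring("palace")` finds candidates 1 and 3 from the trigrams and keeps only doc 1 after verification. Doc 3 is a deliberately constructed false positive that shows why the second filter step is needed.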
Here's code that'll build this analyzer for you ..
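```go
import (
	"github.com/blevesearch/bleve/v2"
	"github.com/blevesearch/bleve/v2/mapping"

	// Blank imports make sure the analysis components referenced by name
	// below are registered, in case nothing else pulls them in.
	_ "github.com/blevesearch/bleve/v2/analysis/analyzer/custom"
	_ "github.com/blevesearch/bleve/v2/analysis/token/lowercase"
	_ "github.com/blevesearch/bleve/v2/analysis/token/ngram"
	_ "github.com/blevesearch/bleve/v2/analysis/tokenizer/unicode"
)

// buildIndexMapping registers a trigram analyzer named "custom_ngram".
// Older bleve releases returned *bleve.IndexMapping here; since the mapping
// package was split out, bleve.NewIndexMapping returns *mapping.IndexMappingImpl.
func buildIndexMapping() (*mapping.IndexMappingImpl, error) {
	indexMapping := bleve.NewIndexMapping()

	// Token filter that emits every 3-character gram of each token.
	err := indexMapping.AddCustomTokenFilter("ngram_min_3_max_3",
		map[string]interface{}{
			"min":  3,
			"max":  3,
			"type": `ngram`,
		})
	if err != nil {
		return nil, err
	}

	// Analyzer: split on Unicode word boundaries, lowercase, then trigram.
	err = indexMapping.AddCustomAnalyzer("custom_ngram",
		map[string]interface{}{
			"type":         `custom`,
			"char_filters": []interface{}{},
			"tokenizer":    `unicode`,
			"token_filters": []interface{}{
				`to_lower`,
				`ngram_min_3_max_3`,
			},
		})
	if err != nil {
		return nil, err
	}
	return indexMapping, nil
}
```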
Here's a blog post on how text analysis works in bleve: https://www.couchbase.com/blog/full-text_search_text_analysis/
Sorting and pagination are applied over the heap once all results have been collected. You can paginate with the `size` (limit) and `from` (offset) attributes in your search request; this returns predictable results only if the results are sorted.

Although bleve indexes are disk-bound, fetched indexed content is mmap-ed, which supports faster access on subsequent queries to the same content.
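To make the pagination point concrete, here's a simplified Go model of top-N collection. It assumes, as described above, that the best `from + size` hits are kept in a heap while collecting and sorted once collection finishes; it is not bleve's collector, and `hit`, `better` and `page` are hypothetical names. The explicit ID tie-breaker gives a total order, which is what keeps pages consistent across repeated queries with different `from` values.

```go
package main

import (
	"container/heap"
	"sort"
)

// hit is a scored document.
type hit struct {
	ID    string
	Score float64
}

// better reports whether a ranks ahead of b: higher score first, then
// lower ID, so equal scores still have a deterministic order.
func better(a, b hit) bool {
	if a.Score != b.Score {
		return a.Score > b.Score
	}
	return a.ID < b.ID
}

// worstFirst is a heap whose root is the worst-ranked hit kept so far.
type worstFirst []hit

func (h worstFirst) Len() int            { return len(h) }
func (h worstFirst) Less(i, j int) bool  { return better(h[j], h[i]) }
func (h worstFirst) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *worstFirst) Push(x interface{}) { *h = append(*h, x.(hit)) }
func (h *worstFirst) Pop() interface{} {
	old := *h
	x := old[len(old)-1]
	*h = old[:len(old)-1]
	return x
}

// page keeps only the best from+size hits in a bounded heap while
// collecting, sorts them once collection is done, then skips `from`.
func page(all []hit, from, size int) []hit {
	k := from + size
	h := &worstFirst{}
	for _, x := range all {
		if h.Len() < k {
			heap.Push(h, x)
		} else if k > 0 && better(x, (*h)[0]) {
			(*h)[0] = x // evict the current worst
			heap.Fix(h, 0)
		}
	}
	kept := []hit(*h)
	sort.Slice(kept, func(i, j int) bool { return better(kept[i], kept[j]) })
	if from >= len(kept) {
		return nil
	}
	return kept[from:]
}
```

In this model each page request re-runs collection and keeps `from + size` hits, so deeper pages cost progressively more, and without the tie-breaker two hits with equal scores could swap places between requests and appear on two pages or on none.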
Hello, I'm new to bleve and trying to use it to perform a simple substring search, i.e. the equivalent of the SQL query `LIKE '%<some_word>%'`. A typical way to achieve this is with trigrams: match all entries that contain all of the word's trigrams, then follow that up with a second filter operation. I tried to replicate this in the beer-search context and came up with this:
where "name_tri" is a text mapping that uses a trigram token filter. This works perfectly, and doesn't print any rows. For comparison, without the wildcard query, this prints:
which is expected.
However, I'm unfamiliar with how bleve performs its indexing, and was wondering how well it combines those filters. Will the presence of the wildcard completely negate the benefits of filtering through trigrams? In general, how does bleve combine filters, and is there any documentation on the subject besides the code? I'd also be particularly interested in knowing how sorting works, especially in the context of paginating results (i.e. running the same query many times with different limits and offsets).
Thanks.