blugelabs / bluge

indexing library for Go
Apache License 2.0

Replacement for FieldDictPrefix? #79

Closed warmans closed 2 years ago

warmans commented 2 years ago

With Bleve, it was possible to get the unique values of a field that match a prefix by doing something like this:

func (s *Search) ListTerms(fieldName string, prefix string) (models.FieldValues, error) {
    dct, err := s.index.FieldDictPrefix(fieldName, []byte(prefix))
    if err != nil {
        return nil, err
    }
    defer dct.Close()

    terms := models.FieldValues{}
    for {
        entry, err := dct.Next()
        if err != nil {
            return nil, err
        }
        if entry == nil {
            break
        }
        terms = append(terms, models.FieldValue{Value: entry.Term, Count: int32(entry.Count)})
    }
    return terms, nil
}

However, there doesn't seem to be a similar API for the bluge.Reader. Would this now require an aggregation instead, or is there a more efficient way to fetch unique field values?

If it does require an aggregation, are there any code examples of how the aggregations work exactly?

mschoch commented 2 years ago

The same functionality is available; we've just combined several methods into one and changed the name.

Previously in Bleve, we offered separate methods to access the data in the term dictionary (FieldDict, FieldDictRange, and FieldDictPrefix).

In Bluge, we combined all of these into one method, and made it even more powerful. The new signature is:

DictionaryIterator(field string, automaton segment.Automaton, start, end []byte) (segment.DictionaryIterator, error)

The idea is that in all cases you just want to see data in the dictionary for a field. You can optionally provide a start/end key, and you can also optionally filter using an automaton (internally this is how we implement regex/fuzzy searching; now it's available externally as well). The start, end, and automaton are all optional, so you can omit all of them when you want to visit the entire term dictionary for the field.

To answer your specific question: how does one use this API to see all terms in a field with a given prefix? We can look at how the prefix search does this. If you look at this snippet here:

https://github.com/blugelabs/bluge/blob/f89eff45771cfe7cb151b01bc6204028f3c06be9/search/searcher/search_term_prefix.go#L25-L30

You'll see we use the prefix itself as the start key, and we compute an end-key that would be the first key not containing the desired prefix. Then we use the new method signature directly.
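
To make the end-key idea concrete, here is a minimal sketch of the range computation, not the code from the linked snippet. The names prefixEnd and termsWithPrefix are hypothetical, and a sorted string slice stands in for a field's term dictionary so the example runs on its own. It also assumes the end key is treated as exclusive.

```go
package main

import "sort"

// prefixEnd returns the smallest key that sorts after every key starting
// with prefix. It returns nil when no such key exists (empty prefix, or a
// prefix made entirely of 0xFF bytes), meaning "no upper bound".
// Hypothetical helper, written for illustration only.
func prefixEnd(prefix []byte) []byte {
	end := make([]byte, len(prefix))
	copy(end, prefix)
	for i := len(end) - 1; i >= 0; i-- {
		if end[i] < 0xFF {
			// Increment the last byte that can be incremented and
			// drop any trailing 0xFF bytes after it.
			end[i]++
			return end[:i+1]
		}
	}
	return nil
}

// termsWithPrefix scans a sorted term list (a stand-in for a term
// dictionary, not the Bluge API) over the half-open range
// [prefix, prefixEnd(prefix)).
func termsWithPrefix(sorted []string, prefix string) []string {
	start := sort.SearchStrings(sorted, prefix)
	stop := len(sorted)
	if end := prefixEnd([]byte(prefix)); end != nil {
		stop = sort.SearchStrings(sorted, string(end))
	}
	return sorted[start:stop]
}
```

For the prefix "app", the computed end key is "apq", so a sorted dictionary containing apple, apply, apricot, and banana yields only apple and apply. With Bluge itself, the same start and end keys would be passed to DictionaryIterator along with a nil automaton, as described above.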

I would say this change is similar to many changes from Bleve to Bluge. We tried to simplify the library's API by consolidating and combining similar methods. The new API ends up more powerful, but typically requires a few extra lines in your application to use it correctly.

warmans commented 2 years ago

Great, thanks for the quick and detailed response. I'll implement it as you describe.

Btw, would it be worth adding a migration guide to the docs covering some of the less obvious API changes? Just a thought.

mschoch commented 2 years ago

@warmans I have added a page to the website which focuses on Bleve migration issues and linked to this issue there. It's a bit sparse at the moment, but now we have a place to collect these. https://blugelabs.com/bluge/migration/

NOTE: edited, original link was wrong