blugelabs / bluge

indexing library for Go
Apache License 2.0

Export fields in BaseSearch? #37

Open michaeljs1990 opened 4 years ago

michaeljs1990 commented 4 years ago

It would be nice if BaseSearch had its fields exported so that other packages adding searchers could build on it, since it implements a lot of the interface boilerplate. Is there any reason to hide the BaseSearch fields from other packages?

mschoch commented 4 years ago

I don't think there is any hard objection here. The BaseSearch is a new construct, as Bleve only supported TopN functionality, and Bluge had as a goal supporting AllMatches functionality which returned results accessible via the same API.

You mentioned "packages adding searchers", so I just want to clarify (or possibly improve) some terminology. I tend to use the word Searcher to describe something implementing the search.Searcher interface.

Making the change you propose would help with adding something other than TopN or AllMatches; I'm not sure what best to name that.

Can I ask what extension you have in mind here?

michaeljs1990 commented 4 years ago

> Can I ask what extension you have in mind here?

For sure! Some of what I am doing may be misguided, so I'll try to give the full background of what I have been up to 😆. I originally started out using bleve in my application and wanted to add a custom query language that felt more familiar to people who had used a Solr-derived query language in the past. While implementing that for bleve, I noticed that the search request let you limit which fields are returned, which was very helpful since many documents are quite large and almost always only 4 or 5 fields are actually wanted. The goal was to limit the size of the response we send to the user over the wire, since that really adds up for a large struct where it's normal to request a thousand or more objects (I was somewhat abusing this); it was not about any perceived performance gain from limiting what bleve returned.

I didn't see anything in the bluge API that lets you limit results to specific fields, which makes sense based on https://github.com/blugelabs/bluge/issues/36. So I went down the path of adding a Searcher that would do the filtering for you. That way end users can easily control the returned results without needing a Go client or an RPC call where you send a query and separately specify the fields you want. Looking at this now, the place I thought you might be able to do this was the iterator, but I think that only checks whether the fields exist. It looks like my best course is to have my new language return the query it produces along with a slice of the desired fields to the user of the library, and let them decide whether to implement filtering when fetching document values.
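A rough sketch of what that caller-side filtering could look like. The visitor type below is assumed to have the same callback shape as the one accepted by bluge's `DocumentMatch.VisitStoredFields` (field name, value bytes, return `false` to stop). `fieldFilter`, `storedField`, and `visitAll` are hypothetical names made up for illustration; the last two only stand in for a real document match so the snippet runs without an index.

```go
package main

// storedFieldVisitor is assumed to match the callback shape used when
// visiting a match's stored fields: returning false stops the visit early.
type storedFieldVisitor func(field string, value []byte) bool

// fieldFilter (hypothetical helper) returns a visitor that copies only the
// wanted stored fields into out. If a field occurs more than once, the last
// value wins.
func fieldFilter(wanted []string, out map[string][]byte) storedFieldVisitor {
	keep := make(map[string]struct{}, len(wanted))
	for _, name := range wanted {
		keep[name] = struct{}{}
	}
	return func(field string, value []byte) bool {
		if _, ok := keep[field]; ok {
			// Copy defensively in case the caller reuses the buffer.
			out[field] = append([]byte(nil), value...)
		}
		return true // keep visiting the remaining fields
	}
}

// storedField and visitAll are stand-ins for a real document match.
type storedField struct {
	name  string
	value []byte
}

// visitAll feeds every stored field to the visitor until it returns false.
func visitAll(fields []storedField, visit storedFieldVisitor) {
	for _, f := range fields {
		if !visit(f.name, f.value) {
			return
		}
	}
}
```

With a real index, the visitor returned by `fieldFilter(fields, out)` would presumably be passed to the match's stored-field visit call in place of `visitAll`, where `fields` is the slice returned alongside the parsed query. The filtering then stays optional for each user of the library.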