Make flavorsearch limit configurable

patdunlavey commented 8 months ago

The OCR Search Controller search function limits the number of results to 100 here. That can be confusing when searching in a Book with potentially hundreds of matches. For example, in this 345-page book on dogs, searching for the word "dog" only returns results up to page 198.

I can think of two challenges with this request:

There doesn't seem to be a logical configuration page for this. From a UX standpoint, I would think the strawberry_paged_formatter configuration form might make sense. But the OCR search controller is part of the strawberryfield module, not format_strawberryfield. This makes me wonder whether the better approach would be for the strawberryfield module to expose a hook or event that lets someone else alter the limit. But that seems like a lot of overhead for a pretty trivial feature.
If the limit is defined in the OCR search controller, then every flavorsearch will inherit it. What makes sense for an OCR search may not make any sense for some other hypothetical kind of flavorsearch. It may not even make sense for all OCR searches: for example, searching a single page vs. an entire book. I'm curious whether that prompts any thoughts, @DiegoPino?

DiegoPino commented 8 months ago

@patdunlavey my only fear is that making this a "per formatter" option would lead to people setting it on some formatters but forgetting others, and eventually getting annoyed. The controller is used primarily by IABookreader, but it is not limited to it. If a best-case scenario is desired, then:

One global setting
One per-Formatter setting (which would pass the number as a URL argument, always capped; we don't want anyone trying to break your server by playing with arguments, right?)
An event? Or a hook? In case someone needs to get creative?
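For the third option, here is a minimal sketch of the shape an alter-style extension point could take, written in Python purely for illustration. Everything in it (`register_limit_alter`, `alter_limit`, the `processor`/`scope` context keys) is hypothetical and not part of strawberryfield; in Drupal this would more likely be an alter hook or an event subscriber. It also shows how the per-search-type concern could be left to whoever actually needs it.

```python
from typing import Callable, Dict, List

# Hypothetical extension point: a callback receives the current limit plus some
# context about the search and returns a (possibly different) limit.
LimitAlter = Callable[[int, Dict[str, str]], int]
_limit_alters: List[LimitAlter] = []


def register_limit_alter(callback: LimitAlter) -> None:
    """Let another module register a callback that may change the limit."""
    _limit_alters.append(callback)


def alter_limit(limit: int, context: Dict[str, str]) -> int:
    """Run the limit through every registered callback, in registration order."""
    for callback in _limit_alters:
        limit = callback(limit, context)
    return limit


# Example: a site module raises the limit for whole-book OCR searches only.
def raise_limit_for_books(limit: int, context: Dict[str, str]) -> int:
    if context.get("processor") == "ocr" and context.get("scope") == "book":
        return max(limit, 500)
    return limit


register_limit_alter(raise_limit_for_books)
print(alter_limit(100, {"processor": "ocr", "scope": "book"}))  # 500
print(alter_limit(100, {"processor": "ocr", "scope": "page"}))  # 100
```

Any hard cap would still need to be applied after the callbacks run, so that an alter cannot push the limit past it.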

What are your thoughts, @patdunlavey?

patdunlavey commented 8 months ago

I was thinking about this last night. Presumably the limit of 100 was chosen to ensure we don't overload or crash something. I'm sure that's an arbitrary number, as any limit will be. I'm not sure I see a use case in which it makes compelling sense for one site to use a limit of 100 and another to use a limit of 500.

We can argue about what the value should be: 100 seems pretty limiting to me. Is there a reason not to just make it 500, for example?

But this got me thinking about what the actual problem is here. Assuming there must be some limit, and that a flavorsearch will sometimes exceed it, perhaps the question is: how do we mitigate the negative user experience of hitting that limit, regardless of what the actual limit is (only being able to find half the "dogs" in that dog book, for example)? It occurs to me that one thing we can do is simply inform the user when the limit, whatever it is, has been exceeded.

Any Solr query produces a count of the number of items found, as well as the actual results list. The results list is constrained by the offset and limit values, but the count is not. We could put that count into the JSON response, and then have our Bookreader JavaScript flash a message like "225 matches found, displaying the first 100", the same way it flashes a message for "no matches found". (I have no idea how difficult that would be!)
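A rough sketch of that idea, assuming the controller has Solr's standard JSON response in hand: Solr reports the unconstrained count in `response.numFound`, independently of `start` and `rows`. The payload keys `matches` and `total` and both helper functions are hypothetical. One caveat: `numFound` counts matching Solr documents, so if each OCR page is indexed as its own document (an assumption here), it is a count of matching pages rather than of individual word hits.

```python
from typing import Any, Dict, List


def build_search_payload(solr_response: Dict[str, Any]) -> Dict[str, Any]:
    """Copy the limited hit list and Solr's unconstrained match count into the payload."""
    body = solr_response["response"]
    docs: List[Dict[str, Any]] = body["docs"]
    return {
        "matches": docs,            # hypothetical key: results after rows/start
        "total": body["numFound"],  # hypothetical key: all matches, ignoring rows/start
    }


def status_message(payload: Dict[str, Any]) -> str:
    """Text a viewer could flash, next to its existing 'no matches' notice."""
    shown = len(payload["matches"])
    total = payload["total"]
    if total == 0:
        return "No matches found"
    if total > shown:
        return f"{total} matches found, displaying the first {shown}"
    return f"{total} matches found"


# Made-up Solr response: 225 matches, of which only 100 were returned.
fake = {"response": {"numFound": 225, "start": 0, "docs": [{"id": i} for i in range(100)]}}
print(status_message(build_search_payload(fake)))
# 225 matches found, displaying the first 100
```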

Back to you for your thoughts, @DiegoPino!

DiegoPino commented 8 months ago

@patdunlavey those are good reflections. Since Solr, as a server, has no "give me everything" option, basically every query has a limit, so I am willing to explore some way of showing the user that there are more hits than the server-side limit allows. It does require a refactor, and (as always with IA Bookreader) I am a bit scared of those. Also, a very high limit (even if it gives more complete results when searching for "dog") might simply time out in JS because the payload is too large.

Give me the day to explore a few options. I will come back with some examples/code ideas by 5PM. Thanks again

patdunlavey commented 8 months ago

Thank you! There's no rush on my end; this is pretty low priority.

DiegoPino commented 8 months ago

@patdunlavey OK. The solution that is working for me right now is the following:

A global limit setting (in this module) in the IIIF settings form (the same one we use for IIIF Content Search)
A per-Formatter limit setting (which, when not set, falls back to the global one and is reported as such in the Formatter settings)
A change in the route/controller arguments and internal functions to respect the limit
A default limit of 100 that stays in place (set initially via a hook, but also in code in case someone calls the route without a limit; routes cannot use defaults based on config, so we go with that). EXCEPT (question): what if we make the limit a query argument (?limit=) instead of a route argument (/{limit})? Any concerns?
An IABookreader JS hack to A) enforce the limit and B) report (this is tricky) at the bottom of the results that what is shown is only a small subset...
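A minimal sketch of the limit resolution in that list, assuming the query-argument variant (`?limit=`): the formatter uses its own value or falls back to the global one and puts it in the URL, while the controller re-validates whatever arrives, defaults to 100, and always applies a cap. The `/flavorsearch/ocr` path, the `MAX_LIMIT` value, and all function names are hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

DEFAULT_LIMIT = 100  # the fallback kept in code, per the thread
MAX_LIMIT = 1000     # hypothetical hard cap; the thread does not name one


def formatter_limit(own_setting: Optional[int], global_setting: Optional[int]) -> int:
    """Per-formatter value if set, else the global IIIF setting, else the default."""
    for value in (own_setting, global_setting):
        if value:
            return value
    return DEFAULT_LIMIT


def search_url(base: str, limit: int) -> str:
    """Send the limit as a query argument (?limit=) instead of a /{limit} segment."""
    return f"{base}?{urlencode({'limit': limit})}"


def controller_limit(url: str) -> int:
    """Re-validate ?limit= on the server: default when missing or invalid, always capped."""
    raw = parse_qs(urlparse(url).query).get("limit", [""])[0]
    try:
        requested = int(raw)
    except ValueError:
        requested = 0
    if requested <= 0:
        requested = DEFAULT_LIMIT
    return min(requested, MAX_LIMIT)


url = search_url("/flavorsearch/ocr", formatter_limit(None, 250))
print(url)                                                 # /flavorsearch/ocr?limit=250
print(controller_limit(url))                               # 250
print(controller_limit("/flavorsearch/ocr"))               # 100
print(controller_limit("/flavorsearch/ocr?limit=999999"))  # 1000
```

Keeping the cap in the controller, rather than trusting the value the formatter sends, is what stops someone from requesting an arbitrarily large result set by editing the URL.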

Now, on to actual handling and the future. If a book is about dogs and you search for "dog", chances are you will see something like 10,000 mentions. A future task could be, instead of returning ALL dogs sorted by page (yeah, I know people want that), to sort by score and then by page. That way the results would still be capped at a number, but the pages with the highest density/most frequent appearances of "dog" would get priority?
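One possible reading of "sort by score, then by page", sketched with made-up data: keep the `limit` best-scoring pages (the earlier page wins a tie), then put the survivors back in page order so a viewer can still step through them. `PageHit` and both helpers are hypothetical; in Solr itself the first step would correspond to a sort clause such as `score desc, <page field> asc` combined with `rows`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageHit:
    page: int     # page sequence number within the book
    score: float  # relevance score Solr reported for that page


def top_hits(hits: List[PageHit], limit: int) -> List[PageHit]:
    """Keep the `limit` best-scoring pages; on equal scores the earlier page wins."""
    return sorted(hits, key=lambda h: (-h.score, h.page))[:limit]


def for_display(hits: List[PageHit]) -> List[PageHit]:
    """Put the kept hits back in page order for forward/backward navigation."""
    return sorted(hits, key=lambda h: h.page)


hits = [PageHit(3, 0.1), PageHit(7, 0.5), PageHit(10, 1.5), PageHit(42, 2.0)]
kept = top_hits(hits, 2)
print([h.page for h in kept])               # [42, 10]
print([h.page for h in for_display(kept)])  # [10, 42]
```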

patdunlavey commented 8 months ago

Thanks for all this, @DiegoPino! I'm not sure the effort is justified, but I love it!

I don't feel I know enough to have an opinion on which approach is better for providing an optional limit value: a route parameter or a query argument. It occurs to me that an offset, a page number (in the case of a multi-page search), or both would complement this, if we anticipate needing AJAX calls to load search results incrementally.
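If results were loaded incrementally, the offset and page number could map directly onto Solr's `start` and `rows` parameters. A minimal sketch, reading "page number" as a page of results (not a book page), with a hypothetical `has_more` check against the `numFound` total:

```python
from typing import Dict


def page_params(page: int, limit: int) -> Dict[str, int]:
    """Translate a 1-based page of results into Solr start/rows parameters."""
    if page < 1 or limit < 1:
        raise ValueError("page and limit must be positive")
    return {"start": (page - 1) * limit, "rows": limit}


def has_more(page: int, limit: int, num_found: int) -> bool:
    """True when another AJAX request would return further hits."""
    return page * limit < num_found


print(page_params(3, 100))    # {'start': 200, 'rows': 100}
print(has_more(2, 100, 225))  # True
print(has_more(3, 100, 225))  # False
```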

The sort-by-relevance question is interesting. I'm not sure how intuitive it would be in a viewing context that is very much based on moving forward and backward through pages. As a user, I think I would want to be able to toggle between the two modes. Do we really want to get into that degree of complexity in the case of the IAB? In flavorsearch, I can see the sort option also being selectable in the URL, with a configurable default sort. I'm just not sure implementing them would be warranted if IAB is the only use case. Perhaps a better use case can be found in Mirador or another viewer? Given the time and effort needed to make use of these features, does it make sense to develop them?
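A sketch of a URL-selectable sort with a configured default, assuming a small whitelist of modes so that arbitrary sort strings never reach Solr. The `sort` query argument, the mode names, and the `sequence_id` field are all hypothetical, not taken from strawberryfield's index.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hypothetical whitelist: mode name -> Solr sort clause.
SORT_MODES = {
    "page": "sequence_id asc",
    "relevance": "score desc, sequence_id asc",
}


def solr_sort(url: str, configured_default: str = "page") -> str:
    """Pick the Solr sort clause from ?sort=, falling back to the configured default."""
    requested: Optional[str] = parse_qs(urlparse(url).query).get("sort", [None])[0]
    mode = requested if requested in SORT_MODES else configured_default
    # An unknown configured default also degrades to page order.
    return SORT_MODES.get(mode, SORT_MODES["page"])


print(solr_sort("/search?sort=relevance"))  # score desc, sequence_id asc
print(solr_sort("/search?sort=random"))     # sequence_id asc
```

A toggle in the viewer would then only need to switch the `sort` argument between the two whitelisted names.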

esmero / strawberryfield #317