esmero / strawberryfield

A Field of strawberries
GNU Lesser General Public License v3.0

Add more fetch options to StrawberryfieldFlavorDatasourceSearchController #234

Open DiegoPino opened 2 years ago

DiegoPino commented 2 years ago

What?

This feature request comes from Slack, but the need has been discussed a few times already, so it is time to implement it:

From @patdunlavey:

For reasons that I have no control over (legacy project being migrated from Islandora 7), we are doing "Books" as collection-type objects, with multiple "Page" objects (instead of as single digital objects with PDF or multiple image files). I have the book pages displaying in the IABookReader nicely, and now it's time to get in-book searching working. I can OCR the image files that are attached to the Page objects. However, the \Drupal\strawberryfield\Controller\StrawberryfieldFlavorDatasourceSearchController::search() method that is used to perform the IAB highlighted text search works at the level of the OCR'd file's parent node (i.e. the Page, in this case), not at the level of the Book, which is the collection object that the Page is_part_of. So my feature request, assuming that I'm not missing some alternative method of satisfying this requirement, is possibly to have this search method look to see if the object being searched has members or parts, and if so, to drill down into the children. Does that make any sense?
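To make the requested "drill down into the children" concrete, here is a minimal sketch in Python. It is not the module's PHP code: `Node`, `collect_search_targets` and `max_hops` are hypothetical names, and the only thing taken from the request is that a Page points to its Book through `is_part_of`. The idea is that the search would run against the returned IDs instead of only the node it was called with.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a digital object (a Book or a Page)."""
    node_id: int
    # IDs of the parents this node declares through is_part_of.
    is_part_of: List[int] = field(default_factory=list)


def collect_search_targets(root_id: int, nodes: List[Node], max_hops: int = 1) -> List[int]:
    """Return root_id plus every node that reaches it through is_part_of
    in at most max_hops steps (1 = direct children only)."""
    targets = [root_id]
    seen = {root_id}
    frontier = {root_id}
    for _ in range(max_hops):
        # Children are nodes not visited yet that name a frontier node as parent.
        children = [
            n.node_id
            for n in nodes
            if n.node_id not in seen and frontier.intersection(n.is_part_of)
        ]
        if not children:
            break
        targets.extend(children)
        seen.update(children)
        frontier = set(children)
    return targets


# A Book (1) with two Pages (2, 3); node 4 hangs off Page 2; node 5 is unrelated.
nodes = [Node(1), Node(2, [1]), Node(3, [1]), Node(4, [2]), Node(5, [99])]
book_targets = collect_search_targets(1, nodes)
```

Keeping the number of hops as a parameter would let a site decide how deep the search is allowed to go, which matters for nested collections.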

The solution is hidden in the code and needs to be exposed. We have full control of how SBF Documents are connected to a parent Object, so we also need to allow an extra hop to the parent's parent. The larger issue is always that a Manifest can request pages in any order, so I might also have to tap into the API we build here to offer, as an alternative, a "smart option" that knows which FILE is being seen as Sequence X. There are a few alternatives (discussed before, but I can't find where/when) in which we do a double rendering of the Manifest: the Search also renders the Manifest and thus knows exactly what the viewer is currently showing as Page 1, Page 2, Page 3. Since Manifests are cached, that is basically instant. It also means passing the URL of the Manifest from the Viewer to the controller, to make everything easier.
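A rough sketch of the "the Search also renders the Manifest" idea, in Python rather than the module's PHP. It assumes the Manifest follows the IIIF Presentation API 2.x layout (`sequences[0].canvases[*].images[0].resource["@id"]`); whether the endpoint would see 2.x or 3.0 Manifests is not settled here, and `page_order_from_manifest` is a hypothetical name. Matching each image ID back to an indexed file is left out.

```python
from typing import Any, Dict


def page_order_from_manifest(manifest: Dict[str, Any]) -> Dict[int, str]:
    """Map the 1-based position of each canvas in the first sequence to the
    @id of its first image resource (assumes IIIF Presentation 2.x)."""
    sequences = manifest.get("sequences") or []
    if not sequences:
        return {}
    order: Dict[int, str] = {}
    canvases = sequences[0].get("canvases") or []
    for index, canvas in enumerate(canvases, start=1):
        images = canvas.get("images") or []
        if not images:
            # Canvas without an image: nothing to search for at this position.
            continue
        image_id = images[0].get("resource", {}).get("@id")
        if image_id:
            order[index] = image_id
    return order


# Made-up Manifest fragment whose canvases are not in file-name order.
example_manifest = {
    "sequences": [
        {
            "canvases": [
                {"images": [{"resource": {"@id": "https://example.org/iiif/page-003.jp2"}}]},
                {"images": [{"resource": {"@id": "https://example.org/iiif/page-001.jp2"}}]},
            ]
        }
    ]
}
page_order = page_order_from_manifest(example_manifest)
```

With a mapping like this, a hit found in a given file could be reported under the page number the viewer is actually showing, whatever order the Manifest requested.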

We might need a configuration Form for this endpoint; it should probably go into the "Important IIIF settings".