esmero / format_strawberryfield

Set of Display formatters, extractors and utils to make Strawberry field data glow
GNU Lesser General Public License v3.0
6 stars 9 forks source link

Bookreader Navigation of search results within Creative Work Series objects has incorrect sequence #470

Open patdunlavey opened 4 days ago

patdunlavey commented 4 days ago

We have a object type that we call "CWSBook", which is basically a Creative Work Series (digital object collection) object with a series of "Page" objects as children that have the individual page image files. We are able to display this in IA Bookreader using the v2.1 IIIF Manifest. OCR is indexed for these pages with both sequence_id and parent_sequence_id - where the latter records the order of the Pages in the CWSBook.

The problem we are having is that when a search is performed in IAB, the search results are returned sorted by relevance, and not by the page sequence, and this results in the previous/next links in the search results area to jump the user seemingly wildly back and forth.

This is because the sequence of the pages being displayed in IAB corresponds to (comes from?) the parent_sequence_id value indexed in solr, not the sequence_id, whereas \Drupal\strawberryfield\Controller\StrawberryfieldFlavorDatasourceSearchController::flavorfromSolrIndex sorts only by sequence_id, when present.

I'm able to force the correct sequence in the results and correct IAB behavior in my testing with this simple change

      if (isset($allfields_translated_to_solr['parent_sequence_id'])) {
        $fields_to_retrieve['parent_sequence_id'] = $allfields_translated_to_solr['parent_sequence_id'];
        // Creative Work Series need to first be sorted by parent_sequence_id.
        $query->sort('parent_sequence_id', QueryInterface::SORT_ASC);
      }

However, I'm not sure if there are cases where this flavorfromSolrIndex method may be called where this sort is not the correct behavior. I also wonder if sorting by sequence_id - the current behavior - is always correct? The flavorfromSolrIndex method seems to be intended to be used pretty broadly. Are there cases where only sort by relevance is correct?

patdunlavey commented 4 days ago

From @DiegoPino on slack: Diego Pino 4:39 PM So there is one issue (why i avoided it...)

if (isset($allfields_translated_to_solr['parent_sequence_id']) &&
  isset($extradata_from_item['search_api_solr_document'][$allfields_translated_to_solr['parent_sequence_id']]) && !$result_is_top_media) {
  $sequence_number = (array) $extradata_from_item['search_api_solr_document'][$allfields_translated_to_solr['parent_sequence_id']];
  if (isset($sequence_number[0]) && !empty($sequence_number[0]) && ($sequence_number[0] != 0)) {
    // We do all this checks to avoid adding a strange offset e.g a collection instead of a CWS
    $page_number_by_id[$extradata_from_item['search_api_solr_document']['id']] = $sequence_number[0];
  }
}
patdunlavey commented 3 days ago

@DiegoPino - in https://github.com/esmero/strawberryfield/blob/1.5.0/src/Controller/StrawberryfieldFlavorDatasourceSearchController.php#L112-L122 we have...

    //search for IAB highlight
    // if format=json, page=all and q not null
    if (($input = $request->query->get('q')) && ($format == 'json') && ($page == 'all')) {

      $snippets = $this->flavorfromSolrIndex(
        $input,
        $node->id(),
        $processor,
        $fileuuid,
        $indexes
      );

Do I understand the "search for IAB highlight" comment to mean that the conditions there (non-empty 'q' param, result format = 'json', page = 'all') indicate that we are searching for and intend to return ocr highlights? If so (and assuming the if condition is valid for this purpose), then can we use this as the test to determine if the search results should be sorted by sequence ids - both sequence_id and parent_sequence_id?