Open patdunlavey opened 1 month ago
From @DiegoPino on slack: Diego Pino 4:39 PM So there is one issue (why i avoided it...)
if (isset($allfields_translated_to_solr['parent_sequence_id']) &&
isset($extradata_from_item['search_api_solr_document'][$allfields_translated_to_solr['parent_sequence_id']]) && !$result_is_top_media) {
$sequence_number = (array) $extradata_from_item['search_api_solr_document'][$allfields_translated_to_solr['parent_sequence_id']];
if (isset($sequence_number[0]) && !empty($sequence_number[0]) && ($sequence_number[0] != 0)) {
// We do all this checks to avoid adding a strange offset e.g a collection instead of a CWS
$page_number_by_id[$extradata_from_item['search_api_solr_document']['id']] = $sequence_number[0];
}
}
@DiegoPino - in https://github.com/esmero/strawberryfield/blob/1.5.0/src/Controller/StrawberryfieldFlavorDatasourceSearchController.php#L112-L122 we have...
//search for IAB highlight
// if format=json, page=all and q not null
if (($input = $request->query->get('q')) && ($format == 'json') && ($page == 'all')) {
$snippets = $this->flavorfromSolrIndex(
$input,
$node->id(),
$processor,
$fileuuid,
$indexes
);
Do I understand the "search for IAB highlight" comment to mean that the conditions there (non-empty 'q' param, result format = 'json', page = 'all') indicate that we are searching for and intend to return ocr highlights? If so (and assuming the if
condition is valid for this purpose), then can we use this as the test to determine if the search results should be sorted by sequence ids - both sequence_id
and parent_sequence_id
?
Hi @DiegoPino, I've ascertained that, for my project, the solution I proposed here is safe/valid: I have no cases where adding sorting by parent_sequence_id
would throw off the desired sorting.
For my purposes, for now, I think I'll just do a composer patch so that I'm not blocked on this.
But I know we need to think more expansively and identify cases where this assumption is not valid. What are the cases where parent_sequence_id
would have a value in a strawberryflavor record where it would invalidate the sorting? How can we programmatically identify those cases, either directly inside ::flavorFromSolrIndex
, or inside ::search
(passing a flag into ::flavorFromSolrIndex
), so that it knows not to sort on the parent sequence id)?
Closely related question: if sorting on parent_sequence_id
is sometimes not correct, is always sorting on sequence_id
correct? Are there times when ::flavorfromSolrIndex
is called when the search result relevance actually is the correct sort?
Finally, do we handle missing values sort correctly for this case (missing values should sort last in an ASC sort, I think - we would not want to sort by parent_sequence_id
and have all the records with no parent_sequence_id
come out first)?
We have a object type that we call "CWSBook", which is basically a Creative Work Series (digital object collection) object with a series of "Page" objects as children that have the individual page image files. We are able to display this in IA Bookreader using the v2.1 IIIF Manifest. OCR is indexed for these pages with both
sequence_id
andparent_sequence_id
- where the latter records the order of the Pages in the CWSBook.The problem we are having is that when a search is performed in IAB, the search results are returned sorted by relevance, and not by the page sequence, and this results in the previous/next links in the search results area to jump the user seemingly wildly back and forth.
This is because the sequence of the pages being displayed in IAB corresponds to (comes from?) the
parent_sequence_id
value indexed in solr, not thesequence_id
, whereas \Drupal\strawberryfield\Controller\StrawberryfieldFlavorDatasourceSearchController::flavorfromSolrIndex sorts only bysequence_id
, when present.I'm able to force the correct sequence in the results and correct IAB behavior in my testing with this simple change
However, I'm not sure if there are cases where this
flavorfromSolrIndex
method may be called where this sort is not the correct behavior. I also wonder if sorting bysequence_id
- the current behavior - is always correct? TheflavorfromSolrIndex
method seems to be intended to be used pretty broadly. Are there cases where only sort by relevance is correct?