Closed DiegoPino closed 1 year ago
@patdunlavey this is the code that solves 85% of the use cases. I will add a guide with screenshots tomorrow, but the code as it is should be ready. The other part goes into format_strawberryfield, since metadata displays are defined there and I cannot introduce a bidirectional dependency between modules.
@patdunlavey, @alliomeria and @karomabiles (if you want to see this working), here are instructions for testing OCR for compounds:
/admin/config/search/search-api/index/default_solr_index/fields
And make sure each one also has "sequence_id" set to 1, 2 and 3 respectively. If your webforms don't have that element/key (we need it), please add it or edit the raw JSON. Save.
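A quick way to double-check the children before processing the queue is sketched below. This is only an illustration, not Archipelago code: the "label" values and the helper name missing_sequence_ids are made up, and it assumes "sequence_id" sits at the top level of each child's raw JSON and holds a positive whole number (as a number or a numeric string).

```python
import json

# Hypothetical raw JSON for three child ADOs. Only the "sequence_id" key comes
# from the instructions above; the "label" values are invented for this sketch.
children_raw = [
    '{"label": "Page A", "sequence_id": 1}',
    '{"label": "Page B", "sequence_id": 2}',
    '{"label": "Page C", "sequence_id": 3}',
]


def missing_sequence_ids(raw_docs):
    """Return the labels of children whose JSON lacks a usable sequence_id."""
    missing = []
    for raw in raw_docs:
        doc = json.loads(raw)
        value = doc.get("sequence_id")
        try:
            # Accept 1, 2, 3 ... given either as a number or a numeric string.
            usable = int(value) >= 1
        except (TypeError, ValueError):
            usable = False
        if not usable:
            missing.append(doc.get("label", "<unlabeled>"))
    return missing


print(missing_sequence_ids(children_raw))
```

If the list that comes back is not empty, those are the children whose JSON still needs the key.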
Make sure the Queue is processed (All Background ones that will generate OCR).
Mine is named:
and has these settings:
Basically you want the IABookreader, but using the IIIF V3 CWS template as its source. Now apply that View Mode to the top Object by editing it and forcing that Display Mode.
You should not need to reindex at this stage (if you followed these steps for this demo object).
Search for "Queen", "Pumpkin" and "King". Each should be highlighted correctly on its own page. Now search for "OCR", which should match on multiple pages.
This covers the basic use case where all children have a sequence_id and all are shown in the Manifest.
Still working on the complex use case (a setting) where the structure shown is different, maybe only odd pages, etc.
Please let me know if you have issues/questions/needs
@DiegoPino starting to look at this now (sorry for the delay!!!)
@DiegoPino I was able to reproduce your steps, and your result! The only problem I noticed is that I don't get the pins in the result bar. I suspect that's due to me not being fully caught up to changes in the IIIF Presentation API 3 Creative Works Series Manifest.
I tested what happens when I add a second image file to one of the child objects. It seems to OCR correctly, but it is saved in the key_value table with a sequence number of "1", rather than the one found at "as:image".*.sequence. As a result, when I display it in the bookviewer, I get the additional page, but the highlighting is off. In this case, I added your sample image file to a page in this object, and though the search finds it (the word "queen" in this example), it is highlighted on the wrong page:
Not sure if this is a simple problem to solve (and whether it's in the 15% you referred to!).
Looking here, it seems like the sequence number should be correct. Not sure why it isn't!
@patdunlavey adding a new page and having key_value = 1 is OK. I wonder if you added the "sequence_id" JSON key to your new page/ADO?
The actual page matching here depends on having a sequence_id at the Child ADO level. Without it, the Manifest is going to show pages in any order and won't match the order of the response (and the relative re-ordering of the search results) that happens here now. The re-paging of the results happens here: https://github.com/esmero/strawberryfield/blob/3d022aeb07a85bd39c477790669ee1254f275fc2/src/Controller/StrawberryfieldFlavorDatasourceSearchController.php#L290 So if your ADO (the one that produced the HOCR) has no sequence_id, it will return 1 and thus offset everything. Your new page should now have sequence_id = 4 (in the JSON).
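To illustrate why a missing key puts the highlight on the wrong page, here is a simplified sketch. It is not the controller's actual code: the function page_for_hit and the child names are hypothetical, and the only behavior taken from the explanation above is the fallback to 1 when sequence_id is absent.

```python
def page_for_hit(child_json):
    """Pick the page a search hit is reported on.

    Falls back to 1 when the child has no sequence_id, mirroring the
    fallback described above (simplified, not the real controller logic).
    """
    return int(child_json.get("sequence_id", 1))


# Hypothetical children of the demo object; "new-page" was added later
# without a sequence_id in its JSON.
children = {
    "page-1": {"sequence_id": 1},
    "page-2": {"sequence_id": 2},
    "page-3": {"sequence_id": 3},
    "new-page": {},
}

pages = {name: page_for_hit(doc) for name, doc in children.items()}
print(pages)  # "new-page" collides with "page-1", so its hits land there

# Giving the new page its own position removes the collision.
children["new-page"]["sequence_id"] = 4
pages_fixed = {name: page_for_hit(doc) for name, doc in children.items()}
print(pages_fixed)
```

In this toy version the hit for the new page is reported on page 1 until the key is added, which is the kind of offset described above.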
Also, the lack of pins in the result bar is strange. Are you using this on top of custom code? E.g. have you already started modifying any other part of Archipelago? Weird, because on a fresh 1.0.0 I do see the pins... maybe we need to have a call!
@patdunlavey I will merge, and we open a new Pull/Issue for troubleshooting? There is more work to be done on SBFlavors for sure, and I can add any corrections to a new pull.
Sorry, I meant to get the results of my investigation in earlier! I'll make a new ticket for the multi-file sequencing issue.
Still WIP don't even test