OregonDigital / oregondigital

OregonDigital Hydra Application
https://oregondigital.org/catalog/
Other

Document Search Slow - Text #804

Open kestlund opened 9 years ago

kestlund commented 9 years ago

Search within the document viewer is very slow, and it is hard to tell whether there is even any text behind it to search. Problem record: http://oregondigital.org/catalog/oregondigital:df661w60r

If there is text, can we look into the slowness? If there is no text, is there a way to notify the user that search is hopeless?

tpendragon commented 9 years ago

Yup, documents with lots of text are going to be slow because it's searching via Ruby. I've talked to @atz about how we could move it into Solr, but so far I've been unable to figure something out. The problem is that we need phrase searching on a tagged document, where the search should ignore the tags but return a result WITH the tags, so that we can get the x/y coordinates on the page from it.
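To make the requirement above concrete, here is a minimal sketch of "match on the text, return the tagged data". It is not the application's Ruby code or a Solr configuration. It assumes the tagged document has already been parsed into an array of word objects, and the field names (`text`, `page`, `x`, `y`, `w`, `h`) and the helpers `normalize` and `findPhrase` are all hypothetical.

```javascript
// Hypothetical input shape: one object per OCR word, carrying the
// coordinates that came from the tags in the source document.
const words = [
  { text: "Oregon", page: 1, x: 10, y: 20, w: 60, h: 12 },
  { text: "State", page: 1, x: 75, y: 20, w: 45, h: 12 },
  { text: "University,", page: 1, x: 125, y: 20, w: 90, h: 12 },
  { text: "Oregon", page: 2, x: 10, y: 40, w: 60, h: 12 },
];

// Lowercase and strip punctuation so "University," matches "university".
function normalize(word) {
  return word.toLowerCase().replace(/[^\p{L}\p{N}]+/gu, "");
}

// Compare only the normalized text, but return the original word objects
// (with their coordinates) for every place the whole phrase occurs.
function findPhrase(wordList, phrase) {
  const needle = phrase.split(/\s+/).map(normalize).filter(Boolean);
  if (needle.length === 0) return [];
  const haystack = wordList.map((w) => normalize(w.text));
  const hits = [];
  for (let i = 0; i + needle.length <= haystack.length; i++) {
    let matched = true;
    for (let j = 0; j < needle.length; j++) {
      if (haystack[i + j] !== needle[j]) {
        matched = false;
        break;
      }
    }
    if (matched) hits.push(wordList.slice(i, i + needle.length));
  }
  return hits;
}

const hits = findPhrase(words, "state university");
console.log(hits);
```

Each hit is the run of word objects that matched, so the caller can draw highlight boxes from their coordinates. This linear scan has the same cost profile as the current in-process search; the sketch only illustrates the shape of the result a Solr-backed search would also need to return.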

srabun commented 9 years ago

We may get more info from the IIIF work. Maybe PDF.js, as long as it meets our needs? @kestlund recommends tabling this until F4/Hydra x.

jechols commented 9 years ago

I think pdf.js can satisfy some of our requirements, maybe all of them if we're willing to get our hands slightly dirty. For instance, here's a solution to do a search and auto-highlight-all:

PDFFindBar.open(); // optional: show the search bar
PDFFindBar.findField.value = 'your search term';
PDFFindBar.highlightAll.checked = true; // highlight every match, not just the current one
PDFFindBar.findNextButton.click(); // trigger the search

(More details on search/page from URL: https://github.com/mozilla/pdf.js/issues/1875)
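As a rough illustration of the search/page-from-URL approach, the sketch below builds a link to the stock pdf.js viewer with a pre-filled search. The `page`, `search` and `phrase` hash parameters are the ones listed in the pdf.js viewer options; whether the viewer version deployed here honors them is an assumption, and the helper name `viewerSearchUrl` and the example paths are hypothetical.

```javascript
// Build a viewer URL that opens a PDF on a given page with a phrase search
// already applied. Values are percent-encoded so spaces survive as %20.
function viewerSearchUrl(viewerUrl, pdfUrl, term, page) {
  const params = [];
  if (page) params.push(`page=${encodeURIComponent(String(page))}`);
  params.push(`search=${encodeURIComponent(term)}`);
  params.push("phrase=true"); // treat the term as one phrase, not separate words
  return `${viewerUrl}?file=${encodeURIComponent(pdfUrl)}#${params.join("&")}`;
}

// Hypothetical viewer and document paths.
const url = viewerSearchUrl(
  "/pdfjs/web/viewer.html",
  "/docs/a b.pdf",
  "your search term",
  3
);
console.log(url);
```

This avoids reaching into `PDFFindBar` from our own scripts, at the cost of depending on the viewer's hash handling.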