Closed: MaditaVerena closed this issue 3 years ago
on the server? or locally? (if so, which branch?)
Server! Sorry for being vague. :)
no problem! How pressingly do you need it fixed? Just in terms of deciding my priorities...
Not super urgent. Maybe I can help fix it in a few weeks, because I sense it might be my code that is not working properly.
Feel free! :)
On the server we have the stable branch running. If you can track down the issue, all the better.
As far as I can tell from the logs, it is getting all 25 shelfmarks per page and going over all 56 pages of results. I am assuming that there are a lot of multiple hits (hits for each language are listed individually). At first glance I see double hits for almost every signature. These are being "flattened" already, before you clean the data manually. So basically... it's... a feature? :D Should we include a small infobox with the result that makes this transparent? It could contain the no. of total hits, the no. of hits for each language, and the no. of hits dropped for that reason.
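A rough sketch of what that infobox could report, assuming the raw hits can be represented as (shelfmark, language) pairs (the names `dedup_stats` and the tuple shape are hypothetical, just to illustrate the idea):

```python
from collections import Counter

def dedup_stats(hits):
    """Summarise per-language duplicate hits before flattening.

    `hits` is assumed to be a list of (shelfmark, language) tuples;
    the same shelfmark appearing once per language counts as a duplicate.
    """
    total = len(hits)
    per_language = Counter(lang for _, lang in hits)
    unique_shelfmarks = {shelfmark for shelfmark, _ in hits}
    dropped = total - len(unique_shelfmarks)
    return {
        "total_hits": total,
        "hits_per_language": dict(per_language),
        "dropped_duplicates": dropped,
    }

stats = dedup_stats([
    ("AM 132 fol", "is"),
    ("AM 132 fol", "en"),  # same shelfmark, second language -> dropped
    ("AM 133 fol", "is"),
])
# stats -> {'total_hits': 3, 'hits_per_language': {'is': 2, 'en': 1}, 'dropped_duplicates': 1}
```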
Hmmmm, this kept bugging me, so I investigated further. Turns out my previous hunch was completely wrong and I misread the logs. Long story short: the issue is with handrit. As you can see in the image below, our function only "reads" the result sub-pages visible in the pager list at the top of the result pages. All result pages in between are omitted. To get those, we would have to call `get_serach_result_pages()` again every x result pages.
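A minimal sketch of that fix, assuming the pager only ever shows a sliding window of page links around the current page. Here `get_result_pages(page)` is a stand-in for the project's `get_serach_result_pages()`; `fake_pager` is a toy simulation of handrit's pager, not the real site:

```python
def crawl_all_pages(get_result_pages, first_page=1):
    """Collect every result page number by re-reading the pager.

    `get_result_pages(page)` returns only the page numbers visible in
    the pager widget when standing on `page`. Re-calling it from the
    furthest page seen so far picks up the pages that were omitted
    from the first window, until no new pages appear.
    """
    seen = {first_page}
    frontier = set(get_result_pages(first_page))
    while not frontier <= seen:
        seen |= frontier
        # re-read the pager from the furthest page discovered so far
        frontier = set(get_result_pages(max(seen)))
    return sorted(seen)

# toy pager: 56 result pages, window shows the current page +/- 5
def fake_pager(page, last=56, window=5):
    return list(range(max(1, page - window), min(last, page + window) + 1))

pages = crawl_all_pages(fake_pager)
# pages -> [1, 2, ..., 56]: all 56 pages, not just the first window
```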
I can try and make a fix for this and see if we can backport it to stable.
Will roll out fix to stable and server in a day or two.
I tried to process a search result of 1380 manuscripts (manuscripts dated 1400-1600: https://handrit.is/is/search/results/FNw6MP ) for their contents, but I got only 540 manuscripts (before cleaning!). The same happened while processing the search result for its metadata; there I got only 304 manuscripts. Mysterious! No error message was displayed.