Open tombrainbox opened 2 years ago
That seems like a bug, thank you for catching it, @tombrainbox!
This bug happens due to a change in the email HTML template for specific cases, which now includes "Showing less relevant results because there are no great results". I was able to find such emails (only 7 out of ~2k 'all results' emails in my case) and reproduce the failure.
Such a template seems to include an extra "hidden" paper 🤯, a duplicate of the first one, that for some obscure reason our XPath library is not able to match against //h3/a/@href :/ This leads to an error at https://github.com/bzz/scholar-alert-digest/blob/7d2e4de957edf2864360a95b579fc919e9fd561f/papers/papers.go#L137 which results in skipping the whole email's content in the aggregation.
This is weird, since both an XPath browser extension and the default search in Chromium return the right number of titles and URLs for the same expressions! So, most probably, this has to do with the logic in https://github.com/antchfx/htmlquery 😕 and a fix would require us to introduce some unit tests that first reproduce it precisely without touching the Gmail API (example).
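To make the failure mode above concrete, here is a minimal sketch that needs neither the Gmail API nor an HTML parser. It assumes the check at the linked line compares the number of matched titles with the number of matched URLs and rejects the email when they differ. That is my reading of the description, not a copy of the actual code. `Paper`, `Email`, `pairPapers` and `aggregate` are hypothetical names used only for illustration.

```go
package main

import "fmt"

// Paper is a hypothetical stand-in for the tool's paper record.
type Paper struct {
	Title string
	URL   string
}

// Email holds the XPath match results for one alert email (hypothetical type).
type Email struct {
	Titles []string
	URLs   []string
}

// pairPapers zips titles with URLs and fails when the counts differ.
// Assumption: this mirrors the kind of check that rejects the affected emails.
func pairPapers(titles, urls []string) ([]Paper, error) {
	if len(titles) != len(urls) {
		return nil, fmt.Errorf("titles/urls mismatch: %d != %d", len(titles), len(urls))
	}
	papers := make([]Paper, 0, len(titles))
	for i, t := range titles {
		papers = append(papers, Paper{Title: t, URL: urls[i]})
	}
	return papers, nil
}

// aggregate collects papers from all emails. An email whose counts do not
// match is skipped entirely, so none of its papers reach the result.
// It returns the collected papers and the number of skipped emails.
func aggregate(emails []Email) ([]Paper, int) {
	var all []Paper
	skipped := 0
	for _, e := range emails {
		papers, err := pairPapers(e.Titles, e.URLs)
		if err != nil {
			skipped++
			continue
		}
		all = append(all, papers...)
	}
	return all, skipped
}
```

If an email yields three titles (the first one duplicated by the "hidden" entry) but only two URLs, `pairPapers` returns an error and `aggregate` drops that email while still keeping the papers from well-formed ones. A real regression test would instead feed a saved HTML fixture of such an email through the XPath queries.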
I've noticed that any scholar alert emails that have been configured with 'all results' rather than 'most relevant' result in an error when processed by this tool. This might be because each email starts with:
"Showing less relevant results because there are no great results
Update alert to receive fewer, more relevant results"
Am I correct in this, and if so would this be an easy fix to implement? Here is my code (note this happens in json/html or with just minimal flags):