Closed GalenReich closed 3 months ago
Will try to tackle that later this week. Just to clarify, we have two cases where this can occur:
Both failures are currently detected when we are not able to find the table of results on the page. This seemed to work well for cases where a page load fails, e.g. the occasional bugs from the SEC website we mentioned in the refactoring PR.
However, I believe the first case could likely be handled in a more reliable way by detecting when the "No results found for this search" message is displayed.
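A rough sketch of how the two cases could be told apart, assuming the page text is available as a plain string. `PageState`, `classify_page` and the `has_results_table` flag are hypothetical names for illustration only and are not taken from the project's code; only the message text comes from the comment above.

```python
from enum import Enum

# Message quoted in the discussion above; the exact wording on the live
# page is an assumption and may differ.
NO_RESULTS_TEXT = "No results found for this search"


class PageState(Enum):
    RESULTS = "results"          # results table was found
    NO_RESULTS = "no_results"    # search ran, but matched nothing
    LOAD_FAILED = "load_failed"  # neither table nor message: likely a failed load


def classify_page(page_text: str, has_results_table: bool) -> PageState:
    """Classify a search page from its visible text (hypothetical helper)."""
    if has_results_table:
        return PageState.RESULTS
    # Case-insensitive match so small changes in capitalisation do not matter.
    if NO_RESULTS_TEXT.lower() in page_text.lower():
        return PageState.NO_RESULTS
    return PageState.LOAD_FAILED
```

With something like this, the "no results" state could be reported as a normal outcome, and only `LOAD_FAILED` would need to raise.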
Regarding passing a more meaningful message, I can update the browser file to enable passing a custom error message when the page checks are failing, instead of this generic one.
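As an illustration of the custom-message idea, here is a minimal sketch. The exception name and the generic message are taken from the traceback in this issue; `run_page_check` and its `error_message` parameter are hypothetical and do not reflect the actual signature in the browser file.

```python
from typing import Callable, Optional


class PageCheckFailedError(Exception):
    """Raised when a page check does not pass (name taken from the traceback)."""


# Generic message currently shown, as reported in this issue.
DEFAULT_MESSAGE = "Page check failed, page load seems to have failed"


def run_page_check(
    check: Callable[[], bool],
    error_message: Optional[str] = None,
) -> None:
    """Run `check` and raise with a caller-supplied message if it fails.

    Hypothetical helper: falls back to the generic message when no
    custom one is given.
    """
    if not check():
        raise PageCheckFailedError(error_message or DEFAULT_MESSAGE)
```

A caller could then pass something like `error_message="No results found for this search"` so that the user sees a short, specific explanation instead of the generic one.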
Would these changes seem good to you for this issue? The first one's a bit out of scope but is likely low-hanging fruit.
Definitely, that sounds great - I don't want you to feel obliged to pick all of this up though!
I think the most common scenario will be no search results, so showing a short message to the user that doesn't 'look scary' would be appropriate.
As for the other page checks, these will happen less frequently, but would still be good to communicate.
Updated the logging in this PR https://github.com/bellingcat/EDGAR/pull/15/files (didn't touch the results detection mechanic there though), also made some improvements in logging here and there to make it clearer. 🚀
Currently, if a query returns no results, an error is thrown:
src.browser.PageCheckFailedError: Page check failed, page load seems to have failed
It would be better if this was avoided or handled, and an easy-to-interpret message was given to the user.
To reproduce, run:
python main.py text_search Bellingcat --start_date "2023-01-01" --exact_search