bellingcat / EDGAR

Tool for the retrieval of corporate and financial data from the SEC
https://colab.research.google.com/github/bellingcat/EDGAR/blob/main/notebook/Bellingcat_EDGAR_Tool.ipynb
GNU General Public License v3.0

An error is thrown when no results are found #14

Closed GalenReich closed 3 months ago

GalenReich commented 4 months ago

Currently, if a query returns no results, an error is thrown:

src.browser.PageCheckFailedError: Page check failed, page load seems to have failed

It would be better if this was avoided or handled and an easy to interpret message was given to the user.

To reproduce run:

python main.py text_search Bellingcat --start_date "2023-01-01" --exact_search

wenlambdar commented 4 months ago

Will try to tackle that later this week. Just to clarify, we have two cases where this can occur:

Both failures are currently detected when we are unable to find the table of results on the page. This seemed to work well for cases where a page load fails, e.g. the occasional bugs from the SEC website we mentioned in the refactoring PR.

However, I believe the first case could likely be handled more reliably by detecting when the "No results found for this search" message is displayed.
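
A minimal sketch of that idea, assuming the page text and the presence of the results table have already been extracted by the browser code. `PageState` and `classify_page` are hypothetical names for illustration and are not taken from the repository; only the message text comes from the comment above.

```python
from enum import Enum

# Message quoted in the discussion; the markup around it on the SEC page is not assumed.
NO_RESULTS_TEXT = "No results found for this search"


class PageState(Enum):
    """Possible outcomes of a search page load (hypothetical, for illustration)."""

    RESULTS = "results"
    NO_RESULTS = "no_results"
    LOAD_FAILED = "load_failed"


def classify_page(page_text: str, has_results_table: bool) -> PageState:
    """Tell an empty search apart from a failed page load.

    `has_results_table` stands in for the existing table check;
    `page_text` is the visible text of the loaded page.
    """
    if has_results_table:
        return PageState.RESULTS
    # No table: look for the explicit "no results" message before assuming a failure.
    if NO_RESULTS_TEXT.lower() in page_text.lower():
        return PageState.NO_RESULTS
    return PageState.LOAD_FAILED
```

Checking for the explicit message first means a missing table is only treated as a load failure when nothing on the page explains it.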

Regarding passing a more meaningful message, I can update the browser file to enable passing a custom error message when the page checks are failing, instead of this generic one.
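
One way this could look, as a sketch only: the exception name and the default message are taken from the traceback above, while `run_page_check` and its `error_message` parameter are hypothetical and may not match how the browser file is actually structured.

```python
from typing import Callable, Optional

# Generic message currently shown in the traceback.
DEFAULT_MESSAGE = "Page check failed, page load seems to have failed"


class PageCheckFailedError(Exception):
    """Stand-in for the exception named in the traceback (src.browser.PageCheckFailedError)."""


def run_page_check(
    check: Callable[[], bool],
    error_message: Optional[str] = None,
) -> None:
    """Run a page check and raise with a caller-supplied message if it fails.

    `check` is any zero-argument callable returning True when the page looks fine.
    Falls back to the generic message when no custom one is given.
    """
    if not check():
        raise PageCheckFailedError(error_message or DEFAULT_MESSAGE)
```

A caller that knows what it was checking for could then pass something like `error_message="No results found for this search"` instead of relying on the generic text.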

Would these changes seem good to you for this issue? The first one is a bit out of scope but likely low-hanging fruit.

GalenReich commented 4 months ago

Definitely, that sounds great - I don't want you to feel obliged to pick all of this up though!

I think the most common scenario will be no search results, so showing a short message to the user that doesn't 'look scary' would be appropriate.

As for the other page checks, these will happen less frequently, but still would be good to communicate.
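
A rough sketch of how both situations could be surfaced at the command-line level without a traceback. Everything here is illustrative: `NoResultsFoundError`, `run_command`, and the logger name are hypothetical, and `PageCheckFailedError` is a local stand-in for the exception in the traceback above.

```python
import logging
from typing import Callable

logger = logging.getLogger("edgar_sketch")  # hypothetical logger name


class NoResultsFoundError(Exception):
    """Hypothetical: raised when a search completes but matches nothing."""


class PageCheckFailedError(Exception):
    """Stand-in for the exception named in the traceback."""


def run_command(command: Callable[[], None]) -> int:
    """Run a CLI command and turn known failures into short log messages.

    Returns a process exit code: 0 on success or an empty search,
    1 when a page check failed.
    """
    try:
        command()
    except NoResultsFoundError:
        # The most common case: not an error, so keep the tone calm.
        logger.warning("No results found for this search. Try a broader query or date range.")
        return 0
    except PageCheckFailedError as exc:
        # Rarer case: still reported, but as a single line rather than a traceback.
        logger.error("The page did not load as expected: %s", exc)
        return 1
    return 0
```

Treating an empty search as exit code 0 is a design choice in this sketch, on the view that a query with no matches has still run successfully.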

wenlambdar commented 4 months ago

Updated the logging in this PR https://github.com/bellingcat/EDGAR/pull/15/files (didn't touch the results detection mechanism there though), and also made some logging improvements here and there to make it clearer. 🚀