bellingcat / EDGAR

Tool for the retrieval of corporate and financial data from the SEC
https://colab.research.google.com/github/bellingcat/EDGAR/blob/main/notebook/Bellingcat_EDGAR_Tool.ipynb
GNU General Public License v3.0
128 stars, 15 forks

Support exact search with quotes in the notebook #33

Closed by GalenReich 3 months ago

GalenReich commented 3 months ago

Previously, a search for "Volcano risk" in the notebook would return results matching Volcano and risk separately ❌

Now, a search for "Volcano risk" in the notebook returns results matching Volcano risk exactly ✅

Previously, the notebook invoked the CLI with !edgar-tool text_search {search arguments}. This broke exact searches because of how escape characters behave (see #24) and how Colab interpolates strings into the CLI arguments.

This PR reworks the notebook to import and use the SecEdgarScraperCli object in Python directly, which avoids the string interpolation problems when passing quotes for exact searches.
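To illustrate the quoting problem, here is a minimal sketch. It assumes POSIX-style shell tokenization (approximated with shlex) and uses a hypothetical text_search stand-in rather than the real SecEdgarScraperCli method, whose signature is not shown here.

```python
import shlex


def text_search(*keywords):
    """Hypothetical stand-in for a search entry point (not the real API).

    It simply returns the keywords exactly as it received them.
    """
    return list(keywords)


# The user wraps the phrase in quotes to ask for an exact match.
phrase = '"Volcano risk"'

# CLI route: the phrase is interpolated into a command line, which the
# shell then tokenizes. shlex.split approximates that tokenization.
command = f"edgar-tool text_search {phrase}"
cli_args = shlex.split(command)[2:]  # drop the program and subcommand names

# Direct route: the Python string is passed as-is, quotes included.
direct_args = text_search(phrase)

print(cli_args)     # the shell has consumed the quotes
print(direct_args)  # the quotes survive
```

In the CLI route the quotes are consumed by the shell before the tool sees the argument, so the exact-match request is lost; calling the Python object directly hands the string over unchanged.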

As part of the notebook refactor, this PR also removes a remaining call to sys.exit in the CLI, which caused problems for the Colab interface. The call was missed in #27 (which closed #17). With it removed, the SecEdgarScraperCli object raises the exception directly, leaving it to the caller to handle.
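The difference between exiting and raising can be sketched as follows. The function names and the choice of ValueError are illustrative assumptions, not the tool's actual code.

```python
import sys


def run_with_exit(keywords):
    """Hypothetical CLI-style helper: asks the interpreter to exit on bad input."""
    if not keywords:
        print("No keywords given", file=sys.stderr)
        sys.exit(1)
    return list(keywords)


def run_with_raise(keywords):
    """Hypothetical library-style helper: raises and lets the caller decide."""
    if not keywords:
        raise ValueError("No keywords given")
    return list(keywords)


def describe(func, keywords):
    """Call func the way notebook code might, reporting what happened."""
    try:
        return f"ok: {func(keywords)}"
    except Exception as err:
        # Ordinary exceptions land here and can be handled normally.
        return f"handled: {err}"
    except SystemExit as err:
        # SystemExit derives from BaseException, so `except Exception`
        # does not catch it; it needs its own handler.
        return f"interpreter exit requested (code {err.code})"


raise_result = describe(run_with_raise, [])
exit_result = describe(run_with_exit, [])
print(raise_result)
print(exit_result)
```

Because sys.exit raises SystemExit, which ordinary `except Exception` handlers skip, code calling the object from a notebook cannot recover from it in the usual way. Raising a regular exception keeps that decision with the caller.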