bellingcat / EDGAR

Tool for the retrieval of corporate and financial data from the SEC
https://colab.research.google.com/github/bellingcat/EDGAR/blob/main/notebook/Bellingcat_EDGAR_Tool.ipynb
GNU General Public License v3.0

Location-based Search #22

Closed: GalenReich closed this 5 months ago

GalenReich commented 8 months ago

Currently, the tool does not support searching by "Principal executive offices in" or "Incorporated in", although both filters are available in the SEC's own full-text search interface.

They are handled by passing the following fields to the API:

locationType=incorporated
locationCode=AL
locationCodes=AL

If locationType is omitted, the biz_states field is searched. If locationType=incorporated, the inc_states field is searched.

locationCode doesn't appear to do anything; instead, locationCodes (note the plural) seems to be the parameter that takes effect. Multiple values can be given, and the endpoint appears to return matches for any of them.
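A minimal Python sketch of how a client might assemble these parameters, based only on the behaviour described above. The helper name `location_params` is hypothetical, and joining multiple codes with commas is an assumption: the comment above doesn't say how multiple values are passed.

```python
from urllib.parse import urlencode


def location_params(codes, incorporated=False):
    """Build the location-filter query parameters (hypothetical helper).

    Assumption: multiple codes are sent as one comma-separated value.
    """
    params = {}
    if incorporated:
        # Searches the inc_states field; without locationType, biz_states is searched.
        params["locationType"] = "incorporated"
    # The plural locationCodes is the parameter observed to take effect;
    # the singular locationCode appeared to do nothing.
    params["locationCodes"] = ",".join(codes)
    return params


print(urlencode(location_params(["AL"], incorporated=True)))
# -> locationType=incorporated&locationCodes=AL
```

Note that `urlencode` percent-encodes the comma (`NY,OH` becomes `NY%2COH`); whether the endpoint accepts the encoded form has not been checked here.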

The TEXT_SEARCH_LOCATIONS_MAPPING object should be used to support this.
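One possible reading of this suggestion, sketched in Python: let users pass either a location code or a human-readable name, and resolve it through the mapping. The issue doesn't show the shape of TEXT_SEARCH_LOCATIONS_MAPPING, so this sketch assumes a code-to-name dict and uses a three-entry stand-in. `resolve_location` and `resolve_locations` are hypothetical names.

```python
# Hypothetical stand-in: the real TEXT_SEARCH_LOCATIONS_MAPPING in the tool
# may have a different shape and many more entries.
TEXT_SEARCH_LOCATIONS_MAPPING = {
    "AL": "Alabama",
    "NY": "New York",
    "OH": "Ohio",
}

# Reverse index: lower-cased display name -> location code.
_NAME_TO_CODE = {name.lower(): code for code, name in TEXT_SEARCH_LOCATIONS_MAPPING.items()}


def resolve_location(value: str) -> str:
    """Return the location code for either a code ("oh") or a name ("Ohio")."""
    key = value.strip()
    if key.upper() in TEXT_SEARCH_LOCATIONS_MAPPING:
        return key.upper()
    try:
        return _NAME_TO_CODE[key.lower()]
    except KeyError:
        raise ValueError(f"Unknown location: {value!r}") from None


def resolve_locations(value: str) -> list:
    """Resolve a comma-separated CLI value such as "New York, OH"."""
    return [resolve_location(part) for part in value.split(",") if part.strip()]
```

Accepting both codes and names keeps short commands like `--peo_in "NY"` working while also allowing the more readable `--inc_in "Ohio"`.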

JackCollins91 commented 6 months ago

I'd like to take this on if it's still a live issue.

GalenReich commented 6 months ago

Fantastic @JackCollins91 - thank you! I have assigned you and look forward to the PR 🙌

JackCollins91 commented 6 months ago

Hi @GalenReich, I've looked over everything and I think I understand how this PR should be done. However, I'm new to EDGAR, so I want to check that I have the expected behaviors right (I'll prep the PR in the meantime).

The examples below show the new kinds of CLI commands this enhancement would allow, each followed by the URL of the API request the CLI would generate. Could you let me know whether this is the expected behavior?

Also, a few questions:

1) Is there any reason to ever use locationCode instead of locationCodes, even if only one location is asked for?

2) Are these parameter names OK: peo_in for "Principal Executive Offices in" and inc_in for "Incorporated in"?

3) You mentioned "The TEXT_SEARCH_LOCATIONS_MAPPING object should be used to support this", by this do you mean that, for example, the following command should work like this:

//Incorporated in Mexico

$ edgar-tool text_search Tsunami Hazards --start_date "2019-06-01" --end_date "2024-01-01" --output "results.csv" --inc_in "Mexico"

GET https://www.sec.gov/edgar/search/#/q=Tsunami%2520Hazards&dateRange=custom&locationCode=MX&locationType=incorporated&startdt=2019-06-01&enddt=2024-01-01

Examples of new functionality

//Principal Executive Offices in multiple locations

$ edgar-tool text_search Tsunami Hazards --start_date "2019-06-01" --end_date "2024-01-01" --output "results.csv" --peo_in "NY, OH"

GET https://www.sec.gov/edgar/search/#/q=Tsunami%2520Hazards&dateRange=custom&locationCode=NY,OH&startdt=2019-06-01&enddt=2024-01-01

//Principal Executive Offices in a single location

$ edgar-tool text_search Tsunami Hazards --start_date "2019-06-01" --end_date "2024-01-01" --output "results.csv" --peo_in "NY"

GET https://www.sec.gov/edgar/search/#/q=Tsunami%2520Hazards&dateRange=custom&locationCode=NY&startdt=2019-06-01&enddt=2024-01-01

//Incorporated in multiple locations

$ edgar-tool text_search Tsunami Hazards --start_date "2019-06-01" --end_date "2024-01-01" --output "results.csv" --inc_in "NY, OH"

GET https://www.sec.gov/edgar/search/#/q=Tsunami%2520Hazards&dateRange=custom&locationCode=NY,OH&locationType=incorporated&startdt=2019-06-01&enddt=2024-01-01
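The URLs in these examples follow a regular pattern. A minimal Python sketch that reproduces them, assuming a hypothetical helper `build_search_url` that receives the raw comma-separated CLI string. It copies the singular locationCode name from the examples, although the issue reports that the plural locationCodes is the one that takes effect. Everything after `#` is a URL fragment, which browsers do not send to the server, so these are best read as links to the SEC's search page.

```python
from urllib.parse import quote

# Base of the EDGAR full-text search web page, as it appears in the examples.
SEARCH_UI = "https://www.sec.gov/edgar/search/#/"


def build_search_url(query, start_date, end_date, locations=None, incorporated=False):
    """Rebuild the example URLs (hypothetical helper, not the tool's code).

    locations: comma-separated codes such as "NY, OH"; surrounding spaces are dropped.
    incorporated: True for --inc_in, False for --peo_in.
    """
    # The examples show the query encoded twice: " " -> "%20" -> "%2520".
    parts = [f"q={quote(quote(query))}", "dateRange=custom"]
    if locations:
        codes = [code.strip() for code in locations.split(",") if code.strip()]
        # Singular name copied from the examples; the issue reports that the
        # plural locationCodes is the parameter that actually takes effect.
        parts.append("locationCode=" + ",".join(codes))
        if incorporated:
            parts.append("locationType=incorporated")
    parts += [f"startdt={start_date}", f"enddt={end_date}"]
    return SEARCH_UI + "&".join(parts)
```

For example, `build_search_url("Tsunami Hazards", "2019-06-01", "2024-01-01", "NY, OH", incorporated=True)` yields the "Incorporated in multiple locations" URL above.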

Unsupported behaviour

// The user cannot combine peo_in and inc_in, because the SEC API doesn't allow this.

$ edgar-tool text_search Tsunami Hazards --start_date "2019-06-01" --end_date "2024-01-01" --output "results.csv" --inc_in "NY, OH" --peo_in "NY,OH"

returns: EXCEPTION "use only one of peo_in or inc_in, not both"
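This rule could be enforced before any request is built. Below is a minimal Python sketch using a hypothetical `check_location_args` function. It raises the error message from the example above. Otherwise it returns the chosen locations together with a flag saying whether they are incorporation locations.

```python
def check_location_args(peo_in=None, inc_in=None):
    """Validate the two location options (hypothetical helper).

    Returns (locations, incorporated), where incorporated is True for inc_in.
    """
    if peo_in and inc_in:
        # Only one locationType can be sent, so the two filters can't be combined.
        raise ValueError("use only one of peo_in or inc_in, not both")
    if inc_in:
        return inc_in, True
    return peo_in, False
```

If the CLI is built on argparse, `add_mutually_exclusive_group()` gives the same guarantee at parse time. This issue doesn't say which argument parser the tool uses, though.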

GalenReich commented 6 months ago

Hi Jack, thank you for picking this up, and apologies for not seeing this over the weekend. I'll review your PR in a moment 🚀 Thank you for writing such a detailed reflection on the issue; all of your comments look right to me.