farhadab / sec-edgar-financials

Extract financial data from the SEC's EDGAR database
MIT License

RequestException: 403 #12

Closed (IonutQo2 closed this issue 3 weeks ago)

IonutQo2 commented 3 weeks ago

Hello,

I tried the standard script provided in the example and I received the following error:

python .\sec_edgar_scrapper.py
cik for AAPL is 320193
getting ['10-Q', '10-Q/A'] filing info from https://www.sec.gov/Archives/edgar/full-index/2016/QTR1/master.idx
Traceback (most recent call last):
  File ".\sec_edgar_scrapper.py", line 9, in <module>
    filing = stock.get_filing(period, year, quarter)
  File "D:\Programming\Stonks\sec-edgar-financials-master\edgar\stock.py", line 34, in get_filing
    filing_info_list = get_financial_filing_info(period=period, cik=self.cik, year=year, quarter=quarter)
  File "D:\Programming\Stonks\sec-edgar-financials-master\edgar\edgar.py", line 277, in get_financial_filing_info
    return get_filing_info(cik=cik, forms=forms, year=year, quarter=quarter)
  File "D:\Programming\Stonks\sec-edgar-financials-master\edgar\edgar.py", line 165, in get_filing_info
    return _get_filing_info(cik=cik, forms=forms, year=year_str, quarter=quarter_str)
  File "D:\Programming\Stonks\sec-edgar-financials-master\edgar\edgar.py", line 215, in _get_filing_info
    response = GetRequest(url).response
  File "D:\Programming\Stonks\sec-edgar-financials-master\edgar\requests_wrapper.py", line 8, in __init__
    raise RequestException('{}: {}'.format(response.status_code, response.text))
edgar.requests_wrapper.RequestException: 403: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>SEC.gov | Request Rate Threshold Exceeded</title>

<h1>Your Request Originates from an Undeclared Automated Tool</h1>
<p>To allow for equitable access to all users, SEC reserves the right to limit requests originating from undeclared automated tools. Your request has been identified as part of a network of automated tools outside of the acceptable policy and will be managed until action is taken to declare your traffic.</p>

<p>Please declare your traffic by updating your user agent to include company specific information.</p>

My code is the following:

from edgar.stock import Stock

stock = Stock('AAPL')

period = 'quarterly' # or 'annual', which is the default
year = 2016 # can use default of 0 to get the latest
quarter = 1 # 1, 2, 3, 4, or default value of 0 to get the latest
# using defaults to get the latest annual, can simplify to stock.get_filing()
filing = stock.get_filing(period, year, quarter)

# financial reports (contain data for multiple years)
# income_statements = filing.get_income_statements()
# balance_sheets = filing.get_balance_sheets()
cash_flows = filing.get_cash_flows()
print(cash_flows)

Has anyone else encountered this? The SEC says the limit is 10 requests per second. Is the provided script making more requests than that?

Nwosu-Ihueze commented 3 weeks ago

from edgar.stock import Stock
import requests

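# Keep a reference to the original function so the wrapper can delegate to it.
original_get = requests.get

def get_with_user_agent(*args, **kwargs):
    # Copy any caller-supplied headers (this also handles headers=None).
    headers = dict(kwargs.get('headers') or {})
    # Declare the traffic as the 403 page asks; replace the placeholder with your own details.
    headers['User-Agent'] = 'Your-name app-name your-email'
    kwargs['headers'] = headers
    return original_get(*args, **kwargs)

# Patch requests.get so that calls made through it carry the User-Agent header.
requests.get = get_with_user_agent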

stock = Stock('AAPL')

period = 'quarterly' # or 'annual', which is the default
year = 2016 # can use default of 0 to get the latest
quarter = 1 # 1, 2, 3, 4, or default value of 0 to get the latest
# using defaults to get the latest annual, can simplify to stock.get_filing()
filing = stock.get_filing(period, year, quarter)

# financial reports (contain data for multiple years)
# income_statements = filing.get_income_statements()
# balance_sheets = filing.get_balance_sheets()
cash_flows = filing.get_cash_flows()
print(cash_flows)
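
The patch above replaces requests.get for the rest of the process. If you would rather limit it to one block of code, here is a minimal sketch of the same idea as a context manager that restores the original function afterwards. The names `declared_user_agent` and `USER_AGENT` are made up for this example and are not part of the edgar package. It assumes the library makes its calls through `requests.get`, as the patch above does.

```python
from contextlib import contextmanager

# Placeholder value: replace with your own name, app name and contact email.
USER_AGENT = "Your-name app-name your-email"


@contextmanager
def declared_user_agent(module, user_agent=USER_AGENT):
    """Temporarily wrap module.get so every call sends a User-Agent header.

    `module` is any object with a `get(*args, **kwargs)` callable that
    accepts a `headers` keyword, for example the `requests` module.
    """
    original_get = module.get

    def get_with_user_agent(*args, **kwargs):
        # Copy caller-supplied headers so the caller's dict is not mutated.
        headers = dict(kwargs.get("headers") or {})
        headers["User-Agent"] = user_agent
        kwargs["headers"] = headers
        return original_get(*args, **kwargs)

    module.get = get_with_user_agent
    try:
        yield
    finally:
        # Restore the original function even if the body raises.
        module.get = original_get


# Usage with the script from this thread (needs requests and edgar installed):
#
#     import requests
#     from edgar.stock import Stock
#
#     with declared_user_agent(requests):
#         filing = Stock('AAPL').get_filing('quarterly', 2016, 1)
#         print(filing.get_cash_flows())
```

Outside the `with` block, requests.get behaves as it did before, so other code in the same process is unaffected.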

Hope this helps.
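
On the rate-limit part of the question: the 403 page in the traceback asks for a declared user agent, so the header is the fix here. If you also want to stay under the 10 requests per second mentioned above, a minimal sketch is to enforce a minimum gap between calls. `Throttle` is a hypothetical helper written for this reply, not something the edgar package provides.

```python
import time


class Throttle:
    """Enforce a minimum interval between calls (hypothetical helper)."""

    def __init__(self, max_per_second=10, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        # clock and sleep are injectable so the class can be tested without waiting.
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        """Block until at least min_interval has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

You would create one `Throttle()` and call its `wait()` method at the top of get_with_user_agent, just before delegating to original_get.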

IonutQo2 commented 3 weeks ago

Thanks a lot, it worked.