VKorelsky closed this 1 year ago.
@VKorelsky Thank you for submitting this PR and bringing this up. I'll take a look and provide feedback by the middle of the week.
@VKorelsky I reverted this PR because of errors showing up in the tests. I'm going to try to tweak it and re-merge:
```
Error
Traceback (most recent call last):
  File "/Users/cs/yahoofinancials/test/test_yahoofinancials.py", line 58, in test_yf_fundamentals
    multi_balance_sheet_data_qt = self.test_yf_stock_multi.get_financial_stmts('quarterly', 'balance')
  File "/Users/cs/yahoofinancials/yahoofinancials/yf.py", line 88, in get_financial_stmts
    data = self._run_financial_stmt(statement_type, report_num, frequency, reformat)
  File "/Users/cs/yahoofinancials/yahoofinancials/yf.py", line 79, in _run_financial_stmt
    data = self.get_reformatted_stmt_data(raw_data, statement_type)
  File "/Users/cs/yahoofinancials/yahoofinancials/etl.py", line 615, in get_reformatted_stmt_data
    sub_dict_ent = self._get_sub_dict_ent(tick, raw_data)
  File "/Users/cs/yahoofinancials/yahoofinancials/etl.py", line 548, in _get_sub_dict_ent
    form_data_list = self._reformat_stmt_data_process(raw_data[ticker])
KeyError: 'IL&FSTRANS.NS'
```
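One plausible reading of the `KeyError` above (an assumption, not confirmed in this thread) is that a ticker whose request failed, here `IL&FSTRANS.NS`, never gets an entry in `raw_data`, while `_get_sub_dict_ent` still indexes `raw_data[ticker]` directly. A minimal sketch of a tolerant lookup, using a hypothetical `reformat_all` helper where `reformat_one` stands in for the per-ticker reformatting step:

```python
import logging

logger = logging.getLogger(__name__)


def reformat_all(tickers, raw_data, reformat_one):
    """Reformat statement data only for tickers that actually returned data.

    Hypothetical helper: `reformat_one` stands in for the library's
    per-ticker reformatting step; it is not the library's real API.
    """
    result = {}
    for ticker in tickers:
        ticker_data = raw_data.get(ticker)
        if ticker_data is None:
            # The fetch for this ticker failed (e.g. a 404 that was logged
            # and skipped), so there is no entry; record None, don't raise.
            logger.warning("No data for %s, skipping", ticker)
            result[ticker] = None
            continue
        result[ticker] = reformat_one(ticker_data)
    return result


if __name__ == "__main__":
    raw = {"AAPL": {"balance": [1, 2]}}
    out = reformat_all(["AAPL", "IL&FSTRANS.NS"], raw, lambda d: len(d["balance"]))
    print(out)  # {'AAPL': 2, 'IL&FSTRANS.NS': None}
```

Keeping the failed ticker as `None` (rather than dropping it) means callers still see every requested ticker in the result and can tell which ones had no data.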
While using the library to process a batch of tickers, I noticed that fetching data synchronously caught 404 exceptions and kept processing the remaining tickers, while running in concurrent mode didn't.
Digging into the code a bit, I also noticed that retries were set to happen for any status code that isn't 200. Since a 404 response always leads to the same outcome, the library ends up trying the request 10 times, sleeping 20 to 40 seconds between each try, which slows everything right down.
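A minimal sketch of the retry policy argued for here: retry non-200 responses, except `4xx` ones, which will fail the same way on every attempt. `should_retry`, `fetch_with_retries` and the `do_request` callable (assumed to return a `(status_code, body)` pair) are hypothetical names, not the library's `_request_handler`; the randomized 20 to 40 second wait is an assumption modeled on the behavior described above.

```python
import random
import time


def should_retry(status_code):
    """Return True only for statuses that might succeed on a later attempt.

    Assumption: any 4xx (e.g. 404 Not Found) is a client error that will
    produce the same result every time, so it is never retried.
    """
    if status_code == 200:
        return False
    return not (400 <= status_code < 500)


def fetch_with_retries(do_request, max_tries=10, min_wait=20, max_wait=40, sleep=time.sleep):
    """Call the hypothetical `do_request()` until it succeeds or fails for good."""
    status, body = None, None
    for attempt in range(max_tries):
        status, body = do_request()
        if not should_retry(status):
            break  # success, or a 4xx that retrying cannot fix
        if attempt < max_tries - 1:
            # Randomized wait between attempts (no sleep after the last one).
            sleep(random.uniform(min_wait, max_wait))
    return status, body


if __name__ == "__main__":
    calls = []

    def fake_404():
        calls.append(1)
        return 404, None

    # A 404 is returned after a single call, with no sleeping at all.
    print(fetch_with_retries(fake_404, sleep=lambda s: None), len(calls))  # (404, None) 1
```

Treating every `4xx` as final is the simplest rule; `429 Too Many Requests` is arguably the one client error worth retrying if rate limiting turns out to matter.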
So I ended up changing:

- `YahooFinanceETL#get_stock_data` to align behavior between concurrent and non-concurrent modes: concurrent mode now logs a warning and keeps going instead of breaking.
- `YahooFinanceETL#_request_handler` to avoid retrying for any `4xx` status code.

This led to a big speed-up when processing my data set (which produces a fair number of 404s).
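The `get_stock_data` change amounts to handling failures per ticker inside the concurrent loop, the way the sequential path already does. A minimal sketch with `concurrent.futures.ThreadPoolExecutor`, assuming a hypothetical `fetch_one(ticker)` callable and a hypothetical `NotFoundError` raised on a 404 (the library's actual concurrency mechanism and exception types may differ):

```python
import logging
from concurrent.futures import ThreadPoolExecutor

logger = logging.getLogger(__name__)


class NotFoundError(Exception):
    """Hypothetical stand-in for whatever exception a 404 produces."""


def fetch_all(tickers, fetch_one, max_workers=4):
    """Fetch tickers concurrently; a ticker that 404s is logged and skipped
    instead of aborting the whole batch."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its ticker (dicts keep submission order).
        futures = {pool.submit(fetch_one, t): t for t in tickers}
        for future, ticker in futures.items():
            try:
                results[ticker] = future.result()
            except NotFoundError as exc:
                logger.warning("Skipping %s: %s", ticker, exc)
    return results


if __name__ == "__main__":
    def fake_fetch(ticker):
        if ticker == "IL&FSTRANS.NS":
            raise NotFoundError("404")
        return {"ticker": ticker}

    print(fetch_all(["AAPL", "IL&FSTRANS.NS", "MSFT"], fake_fetch))
    # {'AAPL': {'ticker': 'AAPL'}, 'MSFT': {'ticker': 'MSFT'}}
```

Note that the failed ticker is simply absent from the returned dict, so any downstream code that indexes the results by ticker has to tolerate missing keys.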