Finnhub-Stock-API / finnhub-python

Finnhub Python API Client. Finnhub API provides institutional-grade financial data to investors, fintech startups and investment firms. We support real-time stock price, global fundamentals, global ETFs holdings and alternative data. https://finnhub.io/docs/api
https://finnhub.io/
Apache License 2.0

Incremental aggregation of intraday data. #49

Closed by preritdas 1 year ago

preritdas commented 2 years ago

Unfortunately, the official Finnhub API can only return one month of intraday data at a time, regardless of the user's plan. There is no way around this. Finnhub support suggests gathering intraday data from each month in separate requests, then combining them.

The updates I propose are non-breaking and will have no impact on existing user code depending on this package. They simply provide two new methods in the Client class: Client.stock_candles_intraday and its internal dependency, Client.stock_candles_df.

Intraday Stock Candles

When the _from and to parameters span more than one month, the intraday stock candles method increments through the window and gathers the data in separate requests. The requests are separated by a 0.4-second delay to stay within the rate limit; this delay could easily be made configurable to account for different plans (for example, a user with a higher rate limit could use a smaller delay and speed up the process; I expand on this at the end).

The data is complete: there is no missing data between incremented windows. The function effectively behaves as if the one-month intraday API limitation didn't exist, at the expense of a slower response time (it sleeps between window aggregations to stay within the API rate limit; I expand on this at the end).
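In sketch form, the windowing idea looks roughly like this (a minimal illustration, not the package's actual implementation; `split_windows` and the 29-day window size are assumptions for the sketch):

```python
import datetime as dt

def split_windows(start: int, end: int, window_days: int = 29):
    """Split a [start, end] UNIX-timestamp range into consecutive
    sub-windows of at most `window_days` days, with no gaps:
    each window starts exactly where the previous one ended."""
    step = window_days * 24 * 60 * 60  # window size in seconds
    windows = []
    cursor = start
    while cursor < end:
        windows.append((cursor, min(cursor + step, end)))
        cursor += step  # next window begins where this one stopped
    return windows

# Example: a 90-day range (2022-01-01 to 2022-04-01 UTC) splits
# into four windows of at most 29 days each.
start = int(dt.datetime(2022, 1, 1, tzinfo=dt.timezone.utc).timestamp())
end = int(dt.datetime(2022, 4, 1, tzinfo=dt.timezone.utc).timestamp())
windows = split_windows(start, end)
```

Each `(start, end)` pair would then be passed to the candles endpoint as its own request, and the responses concatenated.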

The data is returned in pd.DataFrame format, processed in the following ways.

  1. A Date column is created in datetime format and set as the index. This allows for windowed lookups, etc.
  2. Single character keys from the original JSON response (c, l, o, h, etc.) are turned into proper column names ("Open", "High", "Low", "Close", "Volume"), recognizable by most financial data libraries including TA-Lib and Pandas TA.
  3. A new optional filter_eod parameter will filter the DataFrame for data that came from within market hours. This is made possible by the fact that we parsed and indexed the new Date column in datetime format.
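The three processing steps above can be sketched as follows (a simplified illustration, not the package's exact code; the column mapping is from the steps above, while the 09:30–16:00 market-hours bounds and naive-timestamp filtering are assumptions for the sketch):

```python
import pandas as pd

def candles_to_df(res: dict, filter_eod: bool = False) -> pd.DataFrame:
    """Turn a raw stock-candles JSON response into a labeled,
    datetime-indexed DataFrame."""
    df = pd.DataFrame(res)
    # 1. Parse the UNIX timestamps into a Date column and index on it.
    df["Date"] = pd.to_datetime(df["t"], unit="s")
    df = df.set_index("Date")
    # 2. Replace single-character keys with proper column names.
    df = df.rename(columns={"o": "Open", "h": "High", "l": "Low",
                            "c": "Close", "v": "Volume"})
    df = df[["Open", "High", "Low", "Close", "Volume"]]
    # 3. Optionally keep only rows inside regular market hours.
    #    (Filtering on the index's naive clock time here; real code
    #    would need to account for the exchange's timezone.)
    if filter_eod:
        df = df.between_time("09:30", "16:00")
    return df

# Example with a fabricated two-candle response (status key omitted).
sample = {"t": [1640995800, 1641070800], "o": [1.0, 1.1], "h": [1.2, 1.3],
          "l": [0.9, 1.0], "c": [1.1, 1.2], "v": [100, 200]}
df = candles_to_df(sample)
```

The datetime index is what makes step 3 (and windowed lookups generally) a one-liner via `between_time`.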

https://github.com/preritdas/finnhub-python/blob/b3b72157b07fa4de4593e006623b599b85b362c6/finnhub/client.py#L229-L257

Thoughts and Ideas

https://github.com/preritdas/finnhub-python/blob/b3b72157b07fa4de4593e006623b599b85b362c6/finnhub/client.py#L287

preritdas commented 2 years ago

Rate Limit Handled

I created a decorator, handle_rate_limit, which wraps a function in an exception handler. No changes need to be made to any existing implementations; this is completely internal and only affects the two new functions described in my first comment.

https://github.com/preritdas/finnhub-python/blob/3cc74716db9b37d55497c2c7fee81fe659d2b8ec/finnhub/client.py#L11-L25

I then wrap my stock_candles_df function, which is the backend for incremental aggregation, with this decorator.

https://github.com/preritdas/finnhub-python/blob/3cc74716db9b37d55497c2c7fee81fe659d2b8ec/finnhub/client.py#L246-L250

The effect is that the function is called as normal, but if the user's rate limit is exceeded, we sleep for a second and retry the endpoint, recursively, until the rate limit is no longer exceeded.
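A minimal sketch of that behavior (using a loop rather than recursion for simplicity; `RateLimitError` is a stand-in for whatever exception the client raises on an HTTP 429 response, and `fetch` is a made-up example function):

```python
import functools
import time

class RateLimitError(Exception):
    """Stand-in for the client's rate-limit (HTTP 429) exception."""

def handle_rate_limit(func):
    """Retry the wrapped call, sleeping one second each time the
    rate limit is hit, until the call finally succeeds."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        while True:
            try:
                return func(*args, **kwargs)
            except RateLimitError:
                time.sleep(1)  # wait out the rate-limit window
    return wrapper

# Example: a function that hits the "rate limit" twice, then succeeds.
calls = {"n": 0}

@handle_rate_limit
def fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError
    return "data"
```

The caller sees a normal (if occasionally slower) function call; the retries are invisible.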

Result

This rate-limit handling will only occur when aggregating incremental data with the new Client.stock_candles_intraday method. As a result of this implementation, I was able to completely remove the time.sleep(0.4) line I expressed concern over in my original comment. The incremental aggregation function now runs several times faster than before, and only slows in the rare case that a user exceeds their rate limit by requesting too many data windows (very unlikely, see below).

The reason this is unlikely is as follows.

The basic plan's rate limit is 150 requests per minute.

$$ 365.25 \ \text{days} * 10 \ \text{years} = 3652.5 \ \text{days} $$

$$ 3652.5 \ \text{days} \div 29 \ \text{days/window} \approx 126 \ \text{windows} $$

Since 126 windows mean 126 requests, which fits under the 150/min limit, even the basic plan can increment through 10 years of intraday data (its maximum lookback period anyway) without ever hitting the rate limit. But this way, we have a consistently performant failsafe regardless.
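The arithmetic above can be double-checked directly (29 days per window, as in the calculation):

```python
import math

days = 365.25 * 10               # ten years of lookback, in days
windows = math.ceil(days / 29)   # one request per 29-day window
rate_limit = 150                 # basic-plan requests per minute

# 126 windows of requests fit under the 150/min limit, so even a
# full 10-year pull never needs the rate-limit failsafe.
fits_in_one_minute = windows <= rate_limit
```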