OpenBB-finance / OpenBB

Investment Research for Everyone, Everywhere.
https://openbb.co

[FR] - add provider for yfinance-cache #6159

Open ValueRaider opened 7 months ago

ValueRaider commented 7 months ago

I hope the benefit of persistently caching data is obvious - speed for user, less load on provider.

I've been working on a persistent caching wrapper for yfinance: yfinance-cache. The basic idea is to be smart about what and when to fetch. It supports a subset of yfinance: price history, calendar, shares outstanding, info. I'm currently finishing off financials caching.

It still needs some polish, but now might be a good time to start thinking about integrating it into OpenBB. The API is intended as a drop-in replacement for yfinance, so it should be easy. @deeleeramone I see you mostly handle the yfinance provider, so this feature might interest you. I did experiment with creating a provider, but then discovered that providers are hardcoded anyway.

deeleeramone commented 7 months ago

@ValueRaider, thanks for reaching out! We basically run a wrapper around the download function and Ticker class that connects it to our common parameters and parsing requirements. There are always things left to add, but a caching solution sounds fantastic - especially helpful for things like reference data and symbol mapping.

It looks like a "use_cache" parameter could be added wherever caching is available, with the Python call then redirected to the caching wrapper accordingly.
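A minimal sketch of how such a "use_cache" switch might redirect calls. Everything here is an assumption for illustration: `pick_backend` and its `importer` argument are hypothetical names, not part of OpenBB or yfinance-cache, and the module name `yfinance_cache` is assumed to be the wrapper's import name.

```python
import importlib


def pick_backend(use_cache=False, importer=importlib.import_module):
    """Return the module to call for Yahoo data.

    Hypothetical helper: prefers the caching wrapper when use_cache is True
    and falls back to plain yfinance if the wrapper is not installed.
    """
    if use_cache:
        try:
            # Assumed import name of the caching wrapper.
            return importer("yfinance_cache")
        except ImportError:
            pass  # Wrapper missing: fall through to the uncached library.
    return importer("yfinance")
```

Because the wrapper is described as a drop-in replacement, the caller could in principle use the returned module the same way in both cases, though only for the subset of yfinance the wrapper supports.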

A couple of questions for you:

ValueRaider commented 6 months ago

Does this also redirect the Cookie and Crumb caching that is performed by the yFinance library?

No.

Is there a function for setting the cache location?

Unofficially yes, it was implemented for unit tests. I'd have to add safety checks in case the new location has old/incompatible data.

What is the format the cache is stored in?

Almost entirely pickled Python objects, each with a small metadata dict. Prices are pickled pandas DataFrames.

ls ~/.cache/py-yfinance-cache/AAPL 
annuals.pkl   dividends.pkl       events.log     full-release-dates.pkl  history-1h.pkl   info.pkl                   listing_date.json
calendar.pkl  earnings_dates.pkl  fast_info.pkl  history-1d.pkl          history-1wk.pkl  interim-release-dates.pkl  quarterlys.pkl
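To illustrate the general idea of a pickled object stored alongside a small metadata dict, here is a rough sketch using only the standard library. It is not yfinance-cache's actual on-disk layout: the `data`/`metadata` keys, the `fetched_at` field and the helper names are all hypothetical, and a plain dict stands in for a pandas DataFrame.

```python
import pickle
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def save_entry(path, payload, **extra_metadata):
    """Pickle a payload together with a small metadata dict (hypothetical layout)."""
    entry = {
        "data": payload,
        "metadata": {"fetched_at": datetime.now(timezone.utc), **extra_metadata},
    }
    with open(path, "wb") as f:
        pickle.dump(entry, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_entry(path):
    """Load an entry written by save_entry. Only unpickle files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Round trip in a throwaway directory rather than the real cache location.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "info.pkl"
    save_entry(path, {"symbol": "AAPL", "currency": "USD"}, source="example")
    restored = load_entry(path)
```

Keeping a fetch timestamp next to the data is one way a cache could decide when an entry is stale enough to refetch.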

Is the cache itself able to be used asynchronously?

Not a clue, I work synchronously. If this is fast then do you really need async?
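If async callers are needed anyway, one generic option is to run a synchronous lookup in a worker thread. The sketch below assumes nothing about yfinance-cache itself: `cached_lookup` is a hypothetical stand-in for a blocking cache call, and because the cache's thread safety is unknown, a lock serializes access.

```python
import asyncio
import threading

# Thread safety of the real cache is unknown, so serialize access defensively.
_cache_lock = threading.Lock()


def cached_lookup(symbol):
    """Hypothetical stand-in for a blocking, synchronous cache call."""
    with _cache_lock:
        return {"symbol": symbol, "source": "cache"}


async def lookup_async(symbol):
    # asyncio.to_thread (Python 3.9+) runs the blocking call off the event loop.
    return await asyncio.to_thread(cached_lookup, symbol)


async def main():
    # gather returns results in the order the awaitables were passed.
    return await asyncio.gather(lookup_async("AAPL"), lookup_async("MSFT"))


results = asyncio.run(main())
```

With the lock in place the lookups do not run in parallel; the gain is only that the event loop stays responsive while they run.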

Btw, since I created this FR, a new release has added financials caching.