dpguthrie / yahooquery

Python wrapper for an unofficial Yahoo Finance API
https://yahooquery.dpguthrie.com
MIT License

Data sources aside from Yahoo #258

Open cmjordan42 opened 10 months ago

cmjordan42 commented 10 months ago

@dpguthrie and I briefly discussed this prospect with @ValueRaider a while back, and I find myself considering it again. The quality of Y!Finance data has really gone off a cliff in the past 6 months. Quarterly earnings data is now very often missing, and even tickers with billions in market cap are missing swaths of data. It seems that Yahoo's dedication to their finance product is waning.

Have you guys or anyone else considered expanding yahooquery (at the time I think we were talking about yfinance) to other sources such as Marketwatch? Their data seems to be far more consistent, which makes sense since it's owned by Dow Jones, which actually does this as a business.

I suppose the name would have to change from yahooquery ;)

RudyNL commented 10 months ago

One of the advantages of Yahoo Finance is that it covers a major part of the European stock exchanges. I am also interested in Helsinki, Oslo, Stockholm, Copenhagen, Brussels, Amsterdam, London, Paris, Milan, Madrid, Zürich, Vienna, Frankfurt and DAX. An all-American solution isn't a solution for me. The main advantage of Yahoo Finance is its worldwide coverage of the financial markets.

cmjordan42 commented 10 months ago

Good to know. I'm a bit confused though - are you saying that MarketWatch doesn't cover those domains? I'm not very active outside of the US exchanges so maybe I'm misunderstanding, but MW seems to cover a lot of the world.

RudyNL commented 10 months ago

At my first trial I landed here [Marketwatch](https://store.marketwatch.com) and didn't find a way out without payment. Today I found an alternative link [Marketwatch](https://www.marketwatch.com/) which is giving sufficient access. I checked a number of stocks at different exchanges and MarketWatch is looking fine.

ms82494 commented 10 months ago

I have a Yahoo Finance Plus subscription, but don't currently plan on renewing it. My number one dissatisfaction is the fragility of the community-supplied APIs, thanks to the seemingly random changes introduced by Yahoo's product management.

I have tried FinancialModelingPrep.com. They have a lot of data, at a very compelling price, but there's no quality assurance whatsoever. The poor data quality made it unusable.

I am currently testing tikr.com, which doesn't have a proprietary API but a couple of Github projects that offer community-supplied APIs. So far, so good. It's paid but they do offer a nice amount of data for the price: financials, estimates, earnings surprises, company guidance (if available), segment information, and earnings call transcripts.

There's also QuickFS (quickfs.net), which seems really promising as it offers an API with both a free and a paid service level and reasonable rate limits. But the founder, who used to promote the service in various subreddits, seems to have gone MIA. I've also reached out to their support and never received any answers. Since I don't want to deal with abandonware projects I decided against subscribing, but if anyone else has had a good experience with QuickFS I'd be open to give that one a try.

All three of these services offer data from various international exchanges.

cmjordan42 commented 10 months ago

I have very minimal issues with yahooquery. I previously used yfinance which was a lot more susceptible to Yahoo's instability, but the standard distributed systems defensive measures I wrapped around yahooquery make it pretty rock solid. For example, I have not had to tweak it for over 6 months.
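The specific measures aren't spelled out at this point in the thread. As a rough illustration of the kind of wrapper this might mean, here is a minimal sketch of one standard measure: retrying with exponential backoff and jitter. `call_with_retries` and its parameters are hypothetical names, not part of yahooquery, and this is an assumption about the general technique rather than the actual setup described.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fetch: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), retrying with exponential backoff plus jitter on any exception.

    Hypothetical helper for illustration; not part of yahooquery.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            # Wait 1x, 2x, 4x ... the base delay, plus up to 10% random jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

A request could then be wrapped as `call_with_retries(lambda: yq.Ticker('AAPL').get_financial_data(types=['NetIncome'], frequency='q', trailing=False))`. Whether a given failure actually surfaces as an exception depends on the library, so treat this as a starting point only.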

The issue that has me at my wit's end is agnostic to the API. If I pull data for a symbol from the API it shows me what's on Yahoo's website UI... which is just missing loads of data, particularly on quarterly/annual earnings/financial reports. I'm curious @ms82494 - with your YF Plus subscription, do you NOT see missing data for recent periods of loads of common symbols? Perhaps they're deliberately withholding data from the non-Plus folks.

https://finance.yahoo.com/quote/AAPL/financials?p=AAPL

ms82494 commented 10 months ago

@cmjordan42: I'd really be curious what "defensive measures" have insulated you from the mayhem during last October/November. Even now there are a lot of users in Europe complaining about inability to access yahooquery. I was certainly affected at times, both with yahooquery and yfinance. I don't mean this as a criticism of @dpguthrie or @ValueRaider, to whom I am immensely grateful. I just think that the adversarial attitude that Yahoo management takes to programmatic users (whether they are paid or free users) is really impacting me.

That said, Yahoo Finance data quality is second to none, imo. The financials are sourced from Morningstar, and those guys do a good job. The financials arrive timely, are highly detailed, and they provide separate charts of accounts for banks and insurance companies. And they go back to the dawn of time (in Apple's case 1985). I have seen more issues with (expensive!) financial statements from Mergent (part of FTSE/Russell group) than Morningstar financials provided by Yahoo Finance. The tikr.com financials, while also good, don't have the level of detail that Yahoo provides.

To answer your question about missing data with Yahoo financials with the YF+ service: I haven't noticed any issues, unless you go VERY far back in time. So, in Apple's case, the Net Income line is missing for quarterly reports prior to 1989-12-31. But that's probably due to different line item labels being used in early reports that predate the creation of SEC Edgar. There's no missing data for more recent reports. See below:

Type 'copyright', 'credits' or 'license' for more information
IPython 8.18.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import yahooquery as yq

In [2]: import os, operator

In [3]: yq.__version__
Out[3]: '2.3.7'

In [4]: YUSER, YPASS = operator.itemgetter('YUSER','YPASS')(os.environ)

In [5]: yqclient = yq.Ticker('AAPL', username=YUSER, password=YPASS)

In [6]: yqclient.p_get_financial_data(types=['TotalRevenue', 'NetIncome'], frequency='q', trailing=False)
Out[6]: 
         asOfDate periodType currencyCode     NetIncome  TotalRevenue
symbol                                                               
AAPL   1985-09-30         3M          USD           NaN  4.097000e+08
AAPL   1985-12-31         3M          USD           NaN  5.339000e+08
AAPL   1986-03-31         3M          USD           NaN  4.089000e+08
AAPL   1986-06-30         3M          USD           NaN  4.483000e+08
AAPL   1986-09-30         3M          USD           NaN  5.108000e+08
...           ...        ...          ...           ...           ...
AAPL   2022-09-30         3M          USD  2.072100e+10  9.014600e+10
AAPL   2022-12-31         3M          USD  2.999800e+10  1.171540e+11
AAPL   2023-03-31         3M          USD  2.416000e+10  9.483600e+10
AAPL   2023-06-30         3M          USD  1.988100e+10  8.179700e+10
AAPL   2023-09-30         3M          USD  2.295600e+10  8.949800e+10

[153 rows x 5 columns]

cmjordan42 commented 10 months ago

Thanks. You made me dig into other API calls to see if they match. It turns out that Yahoo just has bugs in how it forms and interprets the JSON which their web UI pulls from - in some cases, that's my data source. For example,

yq.Ticker('AAPL').get_financial_data(frequency='q', types=['NetIncome','TotalRevenue'], trailing=False)
         asOfDate periodType currencyCode     NetIncome  TotalRevenue
symbol
AAPL   2022-12-31         3M          USD  2.999800e+10  1.171540e+11
AAPL   2023-03-31         3M          USD  2.416000e+10  9.483600e+10
AAPL   2023-06-30         3M          USD  1.988100e+10  8.179700e+10    <-
AAPL   2023-09-30         3M          USD  2.295600e+10  8.949800e+10

yq.Ticker('AAPL').earnings

{'AAPL': ... 'quarterly': [
{'date': '4Q2022', 'revenue': 117154000000, 'earnings': 29998000000}, 
{'date': '2Q2023', 'revenue': 94836000000, 'earnings': 24160000000}, 
{'date': '3Q2023', 'revenue': 81797000000, 'earnings': 19881000000},   <- 3Q2023???????
{'date': '3Q2023', 'revenue': 89498000000, 'earnings': 22956000000}]},  
'financialCurrency': 'USD'}}

In forming the JSON, they're listing 3Q2023 twice. Their UI then ignores the second (correct) entry and displays the first (incorrect) one, which is actually the prior quarter, and the error cascades, making all of the data incorrect. When I reported the data quality issues to Yahoo, they ignored me. What a joke that this bug exists and has presumably existed for months, since I've seen these missing quarterlies emerging for quite a while. I'll just switch off of all of their JSON format APIs in favor of the tabular DataFrame formats, which seem to be correct.
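The duplicated label described above can be caught mechanically before the data is used. A minimal sketch, assuming the `quarterly` list has the shape quoted from `yq.Ticker('AAPL').earnings`; `duplicate_quarter_labels` is a hypothetical helper, not a yahooquery function.

```python
from collections import Counter
from typing import Any


def duplicate_quarter_labels(quarterly: list[dict[str, Any]]) -> list[str]:
    """Return the 'date' labels that occur more than once, in first-seen order."""
    counts = Counter(entry["date"] for entry in quarterly)
    return [label for label, n in counts.items() if n > 1]


# The 'quarterly' entries as quoted in this comment.
quarterly = [
    {"date": "4Q2022", "revenue": 117154000000, "earnings": 29998000000},
    {"date": "2Q2023", "revenue": 94836000000, "earnings": 24160000000},
    {"date": "3Q2023", "revenue": 81797000000, "earnings": 19881000000},
    {"date": "3Q2023", "revenue": 89498000000, "earnings": 22956000000},
]

print(duplicate_quarter_labels(quarterly))  # ['3Q2023']
```

A non-empty result is a signal to distrust the labels in that payload and fall back to the tabular data, which is keyed by `asOfDate` instead.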

I'll again note that not only does MarketWatch have all of this data, but if you want to look at MarketWatch's UI it also is correct. YF seems to have a competent backend team and an incompetent frontend team; MW seems to be competent across the board.

cmjordan42 commented 10 months ago

@ms82494 Regarding defensive measures... first, I wish there was GH messaging, since we're hijacking this thread talking about API stability.

There are several things that I do to improve stability and shield my side from the Yahoo side:

ValueRaider commented 10 months ago

> considered expanding yahooquery to other sources such as Marketwatch?

IMO these YF wrappers should stay focused on YF, just good software design. Modularity. Better to spin up another package for another source, or a "meta" package to combine multiple fetchers - something like OpenBB but without the GUI bloat would be neat.
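A rough sketch of what such a "meta" package could look like, assuming each source lives in its own wrapper behind a small shared interface. Every name here (`Fetcher`, `StaticFetcher`, `fetch_first_available`, `quarterly_revenue`) is hypothetical, and the canned data stands in for real network calls.

```python
from typing import Protocol, Sequence


class Fetcher(Protocol):
    """Interface each single-source package would expose (hypothetical)."""

    name: str

    def quarterly_revenue(self, symbol: str) -> dict[str, float]: ...


class StaticFetcher:
    """Stand-in for a real source wrapper; serves canned data for the demo."""

    def __init__(self, name: str, data: dict[str, dict[str, float]]) -> None:
        self.name = name
        self._data = data

    def quarterly_revenue(self, symbol: str) -> dict[str, float]:
        if symbol not in self._data:
            raise LookupError(f"{symbol} not available")
        return self._data[symbol]


def fetch_first_available(
    fetchers: Sequence[Fetcher], symbol: str
) -> tuple[str, dict[str, float]]:
    """Try each source in order; return (source name, data) from the first that succeeds."""
    errors = []
    for fetcher in fetchers:
        try:
            data = fetcher.quarterly_revenue(symbol)
        except Exception as exc:
            errors.append(f"{fetcher.name}: {exc}")
            continue
        if data:
            return fetcher.name, data
        errors.append(f"{fetcher.name}: empty result")
    raise LookupError(f"no source returned data for {symbol}: " + "; ".join(errors))


# Demo: the first source has nothing for AAPL, so the second one is used.
source_a = StaticFetcher("source_a", {})
source_b = StaticFetcher("source_b", {"AAPL": {"2023-09-30": 8.9498e10}})
print(fetch_first_available([source_a, source_b], "AAPL")[0])  # source_b
```

Keeping the fallback logic in the meta layer means each wrapper can stay focused on a single source.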

ms82494 commented 10 months ago

@cmjordan42: Firstly, thank you for the detail on how you ensure robustness for your data gathering from YF. That's definitely much more elaborate than what I do and I appreciate the ideas.

Secondly, on the MW data: For financials, I only see them deliver the most recent five quarterly and annual statements. IMO that's not enough to really figure out seasonality (need more Qs), or cyclicality (need more years). Maybe they have a paid plan that offers more data, but otherwise it wouldn't replace YF+ for me. Detail seems good, though. And kudos for not trying to shoehorn financials into a C&I chart of accounts.
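To make the "need more Qs" point concrete: with only five quarters, there is exactly one year-over-year comparison available. A minimal sketch using the AAPL `TotalRevenue` figures from the table earlier in the thread; `yoy_growth` is a hypothetical helper.

```python
def yoy_growth(values: list[float], periods_per_year: int = 4) -> list[float]:
    """Growth of each value versus the same period one year earlier."""
    return [
        values[i] / values[i - periods_per_year] - 1.0
        for i in range(periods_per_year, len(values))
    ]


# Quarterly TotalRevenue for AAPL, 2022-09-30 through 2023-09-30.
revenue = [9.0146e10, 1.17154e11, 9.4836e10, 8.1797e10, 8.9498e10]

growth = yoy_growth(revenue)
print(len(growth))  # 1
```

One data point per quarter-of-year is not enough to separate a seasonal pattern from a one-off, which is why a longer history matters here.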