dpguthrie / yahooquery

Python wrapper for an unofficial Yahoo Finance API
https://yahooquery.dpguthrie.com
MIT License

Inconsistent output from yahooquery get_financial_data for single symbol and multiple symbols #112

Open andrwat opened 2 years ago

andrwat commented 2 years ago

Describe the bug

When the input has multiple symbols and some financial data is not available for one of them, the corresponding cell of the output DataFrame contains NaN. When the input is a single symbol with missing financial data, however, the output is very different from the multiple-symbol case, which makes generic data handling difficult.

To Reproduce

DataFrame cells contain NaN for missing data in a column

```python
from yahooquery import Ticker

securities = ['0700.HK', '80737.HK']
ticker = Ticker(securities)
df = ticker.get_financial_data(['CurrentAssets', 'Inventory'], trailing=False)
df
```

```
symbol    asOfDate    periodType  currencyCode  CurrentAssets  Inventory
0700.HK   2018-12-31  12M         CNY           2.170800e+11   3.240000e+08
0700.HK   2019-12-31  12M         CNY           2.539680e+11   7.180000e+08
0700.HK   2020-12-31  12M         CNY           3.176470e+11   8.140000e+08
0700.HK   2021-12-31  12M         CNY           4.848120e+11   1.063000e+09
80737.HK  2019-12-31  12M         CNY           5.067300e+07   NaN
80737.HK  2020-12-31  12M         CNY           1.589339e+09   NaN
80737.HK  2021-12-31  12M         CNY           2.173692e+09   NaN
```

The DataFrame does NOT have ROWS for the symbol with missing data

```python
securities = ['0700.HK', '80737.HK']
ticker = Ticker(securities)
df = ticker.get_financial_data(['Inventory'], trailing=False)
df
```

```
symbol    asOfDate    periodType  currencyCode  Inventory
0700.HK   2018-12-31  12M         CNY           3.240000e+08
0700.HK   2019-12-31  12M         CNY           7.180000e+08
0700.HK   2020-12-31  12M         CNY           8.140000e+08
0700.HK   2021-12-31  12M         CNY           1.063000e+09
```
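Until the output is made consistent, a caller can at least detect which requested symbols were silently dropped. The sketch below uses a hypothetical helper, `missing_symbols`, which is not part of yahooquery. It assumes the symbol is either the DataFrame index (as the output above suggests) or a `symbol` column, and it handles both cases.

```python
from typing import List

import pandas as pd


def missing_symbols(df: pd.DataFrame, requested: List[str]) -> List[str]:
    """Return the requested symbols that have no rows in df.

    Hypothetical helper: works whether 'symbol' is a column or the index.
    """
    present = set(df["symbol"] if "symbol" in df.columns else df.index)
    return [s for s in requested if s not in present]


# Small illustrative frame built from two of the 0700.HK rows shown above.
df = pd.DataFrame(
    {"periodType": ["12M", "12M"], "Inventory": [3.24e8, 7.18e8]},
    index=pd.Index(["0700.HK", "0700.HK"], name="symbol"),
)
print(missing_symbols(df, ["0700.HK", "80737.HK"]))  # ['80737.HK']
```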

The DataFrame does NOT have a COLUMN for the missing data

```python
securities = ['80737.HK']
ticker = Ticker(securities)
df = ticker.get_financial_data(['CurrentAssets', 'Inventory'], trailing=False)
df
```

```
symbol    asOfDate    periodType  currencyCode  CurrentAssets
80737.HK  2019-12-31  12M         CNY           5.067300e+07
80737.HK  2020-12-31  12M         CNY           1.589339e+09
80737.HK  2021-12-31  12M         CNY           2.173692e+09
```
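A caller-side workaround is to reindex the result so that every requested data type is present as a column, with absent ones filled with NaN. `ensure_columns` is a hypothetical helper, not a yahooquery function. The example frame reuses two of the 80737.HK values shown above.

```python
from typing import List

import pandas as pd


def ensure_columns(df: pd.DataFrame, types: List[str]) -> pd.DataFrame:
    """Append any requested data type missing from df as an all-NaN column."""
    missing = [t for t in types if t not in df.columns]
    # reindex keeps the existing columns and values and fills new columns with NaN
    return df.reindex(columns=list(df.columns) + missing)


df = pd.DataFrame(
    {"periodType": ["12M", "12M"], "CurrentAssets": [5.0673e7, 1.589339e9]},
    index=pd.Index(["80737.HK", "80737.HK"], name="symbol"),
)
fixed = ensure_columns(df, ["CurrentAssets", "Inventory"])
print(fixed.columns.tolist())  # ['periodType', 'CurrentAssets', 'Inventory']
```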

A STRING is returned instead of a DataFrame

```python
securities = ['80737.HK']
ticker = Ticker(securities)
df = ticker.get_financial_data(['Inventory'], trailing=False)
print(df)
df['Inventory']
```

```
Cash Flow data unavailable for 80737.HK

TypeError                                 Traceback (most recent call last)
Input In [129], in <cell line: 5>()
      3 df=ticker.get_financial_data(['Inventory'],trailing=False)
      4 print(df)
----> 5 df['Inventory']

TypeError: string indices must be integers
```
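Because the single-symbol call can return a message string instead of a DataFrame, calling code currently needs an `isinstance` guard. The hypothetical `as_dataframe` below turns any non-DataFrame result into an empty DataFrame with the requested columns, which is one of the options the maintainer raises later in this thread. It does not inspect the message text.

```python
from typing import List

import pandas as pd


def as_dataframe(result: object, types: List[str]) -> pd.DataFrame:
    """Return result if it is a DataFrame, else an empty frame with the requested columns."""
    if isinstance(result, pd.DataFrame):
        return result
    # e.g. result == "Cash Flow data unavailable for 80737.HK"
    return pd.DataFrame(columns=list(types))


df = as_dataframe("Cash Flow data unavailable for 80737.HK", ["Inventory"])
print(df.empty, df.columns.tolist())  # True ['Inventory']
```

With this guard, `df['Inventory']` returns an empty Series instead of raising `TypeError`.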

Jupyter notebook with the above examples:

Untitled1.ipynb.gz

dpguthrie commented 1 year ago

What is the desired output in these scenarios:

andrwat commented 1 year ago

Dear Doug,

Thanks for your reply.

I am a casual Python programmer who works with pandas DataFrames.

It would be nice to fill all missing cells, and entire missing columns, with NaN.

I think it would be best to always return a DataFrame, even if it is empty, and to return a one-column DataFrame instead of a Series. That way, people don't need to check whether the result is a DataFrame, a Series or a string and then process each case differently.

I could then simply call fillna(0) to fill all NaN cells and make my DataFrame-processing code much cleaner.

I hope this helps. I share views similar to those of the others.

Thanks,
Andrew
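The behavior Andrew describes can be approximated on the caller's side with a thin wrapper. This is a sketch under stated assumptions: `normalize` and `normalized_financial_data` are hypothetical names, any non-DataFrame return value is treated as "no data", and dropped symbols are not re-added as rows (that would also require the list of requested symbols). Only `Ticker.get_financial_data(types, trailing=...)` comes from the thread.

```python
from typing import Any, List

import pandas as pd


def normalize(result: Any, types: List[str]) -> pd.DataFrame:
    """Always return a DataFrame that has every requested type as a column."""
    df = result if isinstance(result, pd.DataFrame) else pd.DataFrame()
    missing = [t for t in types if t not in df.columns]
    # New columns are filled with NaN, so a later fillna(0) covers them too
    return df.reindex(columns=list(df.columns) + missing)


def normalized_financial_data(ticker: Any, types: List[str], **kwargs: Any) -> pd.DataFrame:
    """Call ticker.get_financial_data(types, **kwargs) and normalize the result."""
    return normalize(ticker.get_financial_data(types, **kwargs), types)


# Network-free demo: a frame lacking Inventory, then the string case.
df = normalize(pd.DataFrame({"CurrentAssets": [5.0673e7]}), ["CurrentAssets", "Inventory"])
print(df.fillna(0)["Inventory"].tolist())  # [0.0]
empty = normalize("Cash Flow data unavailable for 80737.HK", ["Inventory"])
print(empty.columns.tolist(), len(empty))  # ['Inventory'] 0
```

With a live `Ticker`, the call would look like `normalized_financial_data(Ticker(['80737.HK']), ['CurrentAssets', 'Inventory'], trailing=False).fillna(0)`.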

On Sun, Oct 16, 2022 at 1:07 AM Doug Guthrie @.***> wrote:

What is the desired output in these scenarios:

  • NaN for missing data in a column
  • Missing column
  • String returned: this one seems to need addressing the most. What do you think would be best here: raising an error, returning an empty DataFrame, or returning an empty DataFrame with the columns you specified?
