dpguthrie / yahooquery

Python wrapper for an unofficial Yahoo Finance API
https://yahooquery.dpguthrie.com
MIT License
790 stars, 139 forks

Is the OHLC data corrected? #30

Closed slapslash closed 2 years ago

slapslash commented 4 years ago

Just found this package as a possible alternative to yfinance and was wondering whether the OHLC data is corrected, like yfinance does (partially).
What I mean is that OHLC is adjusted to the adjusted close, or that ticks are corrected where high is below close/open, and all the other fun stuff that one can find in the data Yahoo provides.

dpguthrie commented 4 years ago

I don't have this in the history method yet but will add it in the next version release. So, stay tuned.

dpguthrie commented 4 years ago

Take a look at the newest version, 2.2.6. There's an additional argument in the history method, adj_ohlc. Set that to True to adjust the OHLC data.
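
For readers unfamiliar with the idea, here is a minimal sketch of what adjusting OHLC to the adjusted close usually means: open, high and low are scaled by the same adjclose/close ratio, and the adjusted close takes the place of the close. This illustrates the general technique and is not confirmed to be what yahooquery does internally; the helper name `adjust_ohlc`, the lowercase column names and the sample prices are all assumptions made up for the example.

```python
import pandas as pd


def adjust_ohlc(df: pd.DataFrame) -> pd.DataFrame:
    """Scale open/high/low by adjclose/close and use adjclose as close.

    Hypothetical helper; the column names are assumed for illustration.
    """
    out = df.copy()
    ratio = out["adjclose"] / out["close"]
    for col in ("open", "high", "low"):
        out[col] = out[col] * ratio
    out["close"] = out["adjclose"]
    return out.drop(columns="adjclose")


# Made-up prices: the first row's adjusted close is half of its raw close,
# as it would be after a later 2-for-1 split.
prices = pd.DataFrame(
    {
        "open": [100.0, 51.0],
        "high": [104.0, 53.0],
        "low": [98.0, 50.0],
        "close": [102.0, 52.0],
        "adjclose": [51.0, 52.0],
    }
)
adjusted = adjust_ohlc(prices)
print(adjusted)
```

Scaling all four prices by one ratio keeps each bar's shape intact while removing the jump between the two rows, which is why it matters for technical indicators.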

slapslash commented 4 years ago

Yes, that's the primary correction to "adjusted close", which I found extremely important when working with technical indicators. That makes sense, as there are big price jumps when not correcting to the adjusted close. But what about the other problems (partially mentioned above) that occur in Yahoo's historical data?

dpguthrie commented 4 years ago

What are some of the other problems? Could you provide any examples?

slapslash commented 4 years ago

I'd love to!

dpguthrie commented 4 years ago

Thanks for expanding on your first post. I understand what you’re saying, but I haven’t seen any data come back like that yet. Do you know of any tickers that show data coming back like that?

slapslash commented 4 years ago

Sure,

Historical data (max time period, daily frequency) downloaded as CSV directly from Yahoo.

For data having NaN:

import pandas as pd

# Show every row that has at least one missing value
d = pd.read_csv('RAW.DE.csv')
print(d[d.isna().any(axis=1)])

And for data having an invalid high/low:

# Show rows where High or Low is inconsistent with Open or Close
d = pd.read_csv('HEN3.DE.csv')
print(d[d.eval('High < Open or High < Close or Low > Open or Low > Close')])

dpguthrie commented 4 years ago

Thanks again for providing the examples. That definitely seems to be a problem. I'm not sure how I'd go about fixing that; do you have any recommendations?

slapslash commented 4 years ago

Well, this is another part of the story, as it depends on what the user of the API is going to do with the data. Personally, I drop rows having NaN or 0.0 prices and correct high to the highest price of the row and low to the lowest. I guess the best thing to do is to provide those corrections as optional parameters and let the user decide.
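
The cleanup described above could be sketched roughly like this with pandas. The function name `clean_ohlc` is hypothetical, the capitalized column names are assumed to match Yahoo's CSV downloads, and the sample rows are made up for illustration.

```python
import pandas as pd

PRICE_COLS = ["Open", "High", "Low", "Close"]


def clean_ohlc(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with NaN or 0.0 prices, then make High/Low bound each row.

    Hypothetical helper; not part of yahooquery.
    """
    prices = df[PRICE_COLS]
    # Keep only rows where every price is present and non-zero
    valid = prices.notna().all(axis=1) & (prices != 0.0).all(axis=1)
    out = df[valid].copy()
    # Compute both extremes from the original values before overwriting
    row_max = out[PRICE_COLS].max(axis=1)
    row_min = out[PRICE_COLS].min(axis=1)
    out["High"] = row_max
    out["Low"] = row_min
    return out


# Made-up rows: row 1 has High below Close, row 2 has a NaN, row 3 a 0.0 price
raw = pd.DataFrame(
    {
        "Open": [10.0, 11.0, float("nan"), 0.0],
        "High": [10.5, 10.9, 12.0, 12.5],
        "Low": [9.8, 10.7, 11.5, 12.0],
        "Close": [10.2, 11.1, 11.8, 12.2],
    }
)
cleaned = clean_ohlc(raw)
print(cleaned)
```

Dropping rows changes the length of the series, which is one reason a caller might want each of these steps to be switchable rather than always applied.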

impredicative commented 3 years ago

The high and low should always be corrected by the package. There is no use case in which the high shouldn't be high and the low shouldn't be low. This should not be a user option. Right now I'm having to do these corrections manually after retrieving the data.

This has little to nothing to do with optional adjustments for dividends and splits.

dpguthrie commented 2 years ago

Happy to accept a PR to fix this, but it's not something I'm going to fix.

maread99 commented 2 years ago

For info, market_prices gets prices via yahooquery and does correct ohlc.