dpguthrie / yahooquery

Python wrapper for an unofficial Yahoo Finance API
https://yahooquery.dpguthrie.com
MIT License

BUG: Ticker.formatted=False formats some dates as strings #90

Open othalan opened 3 years ago

othalan commented 3 years ago

Describe the bug

Documentation indicates that using Ticker(formatted=False) or Ticker.formatted = False provides raw data. However, some timestamps are formatted into strings instead of being provided as raw data.

To Reproduce

Steps to reproduce the behavior:

import yahooquery
stock = 'ZM'
yqt = yahooquery.Ticker([stock], formatted=False)
price_data = yqt.price
for key in ['regularMarketTime', 'postMarketTime', 'preMarketTime']:
    if key in price_data[stock]:
        print(f"{key: <17s} is type {type(price_data[stock][key])} with value: {price_data[stock][key]}")

Output of the above code:

regularMarketTime is type <class 'str'> value: 2021-07-26 14:00:02
postMarketTime    is type <class 'int'> value: 1627331433
preMarketTime     is type <class 'str'> value: 2021-07-26 07:29:58

(Note that sometimes one of the above timestamps will be missing, which appears to be normal behavior of the Yahoo API and not related to this library.)

Expected behavior

I would expect that with formatted=False, all timestamps are int data types (raw data) not a mixture of int (raw) and str (formatted) data.

A string timestamp is useful for display (formatted data), but is useless for Python code examining date-time values. An integer timestamp is expected in this case, as it can be used directly or easily converted into a Python datetime or pandas Timestamp object. The fact that some dates remain integers while other dates are formatted as strings makes this bug particularly irritating to work with.

The formatted strings are particularly irritating because they have been converted into the local timezone, not a timezone representative of the market in question or UTC.
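To illustrate why the raw integer is easier to work with, here is a minimal sketch using only the standard library. It takes the postMarketTime value from the output above and renders it in three timezones. The choice of America/New_York and America/Los_Angeles is mine, purely for illustration.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# postMarketTime from the output above: seconds since the Unix epoch
raw = 1627331433

# An integer timestamp identifies one instant, independent of any timezone
utc = datetime.fromtimestamp(raw, tz=timezone.utc)
eastern = utc.astimezone(ZoneInfo("America/New_York"))
pacific = utc.astimezone(ZoneInfo("America/Los_Angeles"))

# Rendered without an offset, the same instant yields three different strings
fmt = "%Y-%m-%d %H:%M:%S"
print(utc.strftime(fmt))      # 2021-07-26 20:30:33
print(eastern.strftime(fmt))  # 2021-07-26 16:30:33
print(pacific.strftime(fmt))  # 2021-07-26 13:30:33
```

A string such as "2021-07-26 14:00:02" carries no offset, so code receiving it cannot tell which instant it refers to without knowing the timezone of the machine that produced it.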

Environment

Additional Context

URL generated for the above example of this bug:

https://query2.finance.yahoo.com/v10/finance/quoteSummary/ZM?modules=price&formatted=false&lang=en-US&region=US&corsDomain=finance.yahoo.com

Note that all time values in the above URL are integer values.
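For reference, the URL above can be rebuilt from its query parameters with the standard library. This is only a sketch: the function name build_quote_summary_url is hypothetical and not part of yahooquery, and the code only constructs the URL without sending a request.

```python
from urllib.parse import urlencode

# Base endpoint copied from the URL in this report
BASE = "https://query2.finance.yahoo.com/v10/finance/quoteSummary"


def build_quote_summary_url(symbol: str, formatted: bool = False) -> str:
    """Hypothetical helper: rebuild the quoteSummary URL shown in this report."""
    params = {
        "modules": "price",
        # The API expects lowercase true/false rather than Python's True/False
        "formatted": str(formatted).lower(),
        "lang": "en-US",
        "region": "US",
        "corsDomain": "finance.yahoo.com",
    }
    return f"{BASE}/{symbol}?{urlencode(params)}"


print(build_quote_summary_url("ZM"))
```

With formatted=false in the query string, the response itself contains integer time values, so the string conversion happens on the library side after the response is received.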

othalan commented 3 years ago

For anyone who also needs a workaround for this bug, here is the fix I have used in my code, using pandas, tzlocal and pytz:

from __future__ import annotations

import pandas as pd
import pytz
import tzlocal

# Timezone of the market (hard coded: only US markets are handled)
MARKET_TZ = pytz.timezone("US/Eastern")


def convert_datetime(value: str | int) -> pd.Timestamp:
    """Convert string/integer timestamps to pandas Timestamp objects in US/Eastern."""
    if isinstance(value, str):
        # String timestamps were formatted in the local timezone, so attach
        # the local timezone first, then convert to the market's timezone
        local = tzlocal.get_localzone()
        return pd.Timestamp(value, tz=local).tz_convert(MARKET_TZ)

    if isinstance(value, int):
        # Integer timestamps are seconds since the Unix epoch (UTC), so the
        # local timezone is not involved; convert straight to the market's timezone
        return pd.Timestamp(value, unit="s", tz="UTC").tz_convert(MARKET_TZ)

    # Just in case another possibility appears, return the provided value unchanged
    return value

It is not ideal that the target timezone is hard coded into the function, but my project only deals with US markets.
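If the hard-coded timezone is a problem, one possible variation is to pass the market timezone as a parameter. The sketch below uses only the standard library instead of pandas. The name to_market_time and the market_tz parameter are hypothetical, and it assumes the formatted strings always look like "2021-07-26 14:00:02" and were produced in the local timezone of the machine running the code.

```python
from __future__ import annotations

from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def to_market_time(value: str | int, market_tz: str = "America/New_York") -> datetime:
    """Hypothetical helper: return a timezone-aware datetime in market_tz."""
    target = ZoneInfo(market_tz)

    if isinstance(value, int):
        # Raw value: seconds since the Unix epoch, unambiguous
        return datetime.fromtimestamp(value, tz=timezone.utc).astimezone(target)

    # Formatted value: assumed to be local wall-clock time without an offset.
    # astimezone() on a naive datetime treats it as system local time.
    naive = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    return naive.astimezone(target)


print(to_market_time(1627331433))                   # 2021-07-26 16:30:33-04:00
print(to_market_time(1627331433, "Europe/London"))  # 2021-07-26 21:30:33+01:00
```

The string branch is still only correct when the code runs on the same machine, with the same timezone settings, as the one that produced the string. That is the underlying reason the raw integers are preferable.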

dpguthrie commented 2 years ago

Only solved part of this issue with #117 as I need to think more about how to deal with timestamps here (probably shouldn't be using local)