ValueRaider / yfinance-cache

Caching wrapper for yfinance module. Intelligent caching, not dumb caching of web requests.
MIT License

Potential bug in yfc: Typeerror #35

Closed kschmid closed 10 months ago

kschmid commented 11 months ago

Tried to use yfc as a drop-in replacement for yf. However, on one call I got a TypeError, apparently coming from yfc_prices_manager:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)

** Stuff deleted due to personal information **
** Last call in my code:  parthist = ticker.history(period="7d", interval="1m") **

File /usr/local/lib/python3.11/site-packages/yfinance_cache/yfc_ticker.py:232, in Ticker.history(self, interval, max_age, period, start, end, prepost, actions, adjust_splits, adjust_divs, keepna, proxy, rounding, debug, quiet, trigger_at_market_close)
    230 hist = self._histories_manager.GetHistory(interval)
    231 if period is not None:
--> 232     h = hist.get(start=None, end=None, period=period, max_age=max_age, trigger_at_market_close=trigger_at_market_close, quiet=quiet)
    233 elif interday:
    234     h = hist.get(start_d, end_d, period=None, max_age=max_age, trigger_at_market_close=trigger_at_market_close, quiet=quiet)

File /usr/local/lib/python3.11/site-packages/yfinance_cache/yfc_prices_manager.py:609, in PriceHistory.get(self, start, end, period, max_age, trigger_at_market_close, repair, prepost, adjust_splits, adjust_divs, quiet)
    605 f_nfinal = ~f_final
    606 # - also treat repaired data as non-final, if fetched near to interval timepoint
    607 #   because Yahoo might now have correct data
    608 #   TODO: test!
--> 609 f_repair = (self.h["Repaired?"].to_numpy() & (self.h["FetchDate"] < (self.h.index + self.itd + td_7d)).to_numpy())
    610 f_nfinal = f_nfinal | f_repair
    611 for idx in np.where(f_nfinal)[0]:
    612     # repaired = False

TypeError: unsupported operand type(s) for &: 'float' and 'bool'

The same code with the same arguments runs fine with yfinance. I cannot rule out incorrect usage on my side, but I would expect invalid operands to an "&" to be handled inside the library.
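The message "unsupported operand type(s) for &: 'float' and 'bool'" is what NumPy produces when one operand of "&" is an object-dtype array holding a float such as NaN. Below is a minimal sketch of that failure mode, assuming (as the "NaNs detected in column 'Repaired?'" message later in this thread suggests) that the cached flag column picked up a NaN. The small DataFrame is a hypothetical stand-in, not the real yfinance-cache table.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the cached price table: one flag is missing (NaN).
h = pd.DataFrame({"Repaired?": [True, np.nan, False]})
other = np.array([True, True, False])

# The NaN forces the column to object dtype, so "&" is applied element by
# element in Python and fails on the float NaN.
flags = h["Repaired?"].to_numpy()
try:
    flags & other
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Defensive variant: treat anything that is not exactly True as False,
# which yields a proper boolean array.
clean = h["Repaired?"].eq(True).to_numpy()
result = clean & other

print(flags.dtype, error_message)
print(result)
```

Whether missing flags should count as False is a design decision; the sketch only shows that the error disappears once the column has a real boolean dtype.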

ValueRaider commented 11 months ago

This is certainly a bug and I can reproduce.

ValueRaider commented 11 months ago

It seems I could only reproduce when using old yfinance version. What version is yours?

kschmid commented 11 months ago

Currently using yfinance 0.2.27 yfinance-cache 0.4.4

Did not update today, so this was also the version in my tests. Note that when trying it out, I simply replaced the yfinance import with yfinance-cache, i.e., import yfinance_cache as yf. The program only used the Ticker and history calls.

I can rerun this, if desired and rebuild a cache accordingly or post my experimental program or full error trace (but not public). Just let me know, what helps.

ValueRaider commented 11 months ago

Update your packages, I sent out big updates today. You might still see error because YF may have corrupted your cache, but quickest fix is to delete your YFC cache folder.

kschmid commented 10 months ago

Ok, just tried: Updated to yfinance 0.2.28 yfinance-cache 0.4.5

crashes again, although differently. See below.

I started with an empty directory and removed all previous downloads. It seems to crash in the second call to history (below). There is a previous one, which aims to download everything in a daily resolution.

again sanitized as it contains personal stuff last part of my call-stack is: parthist = ticker.history(period="60d", interval="2m")

File /usr/local/lib/python3.11/site-packages/yfinance_cache/yfc_ticker.py:236, in Ticker.history(self, interval, max_age, period, start, end, prepost, actions, adjust_splits, adjust_divs, keepna, proxy, rounding, debug, quiet, trigger_at_market_close)
    234 hist = self._histories_manager.GetHistory(interval)
    235 if period is not None:
--> 236     h = hist.get(start=None, end=None, period=period, max_age=max_age, trigger_at_market_close=trigger_at_market_close, quiet=quiet)
    237 elif interday:
    238     h = hist.get(start_d, end_d, period=None, max_age=max_age, trigger_at_market_close=trigger_at_market_close, quiet=quiet)

File /usr/local/lib/python3.11/site-packages/yfinance_cache/yfc_prices_manager.py:827, in PriceHistory.get(self, start, end, period, max_age, trigger_at_market_close, repair, prepost, adjust_splits, adjust_divs, quiet)
    825 if isinstance(y, (datetime, pd.Timestamp)):
    826     if y > dt_now:
--> 827         ranges_to_fetch[i][1] = min(dt_now.ceil(), y)
    828     elif y > d_now_exchange:
    829         sched = yfct.GetExchangeSchedule(self.exchange, d_now_exchange, y + td_1d)

File /usr/local/lib/python3.11/site-packages/pandas/_libs/tslibs/timestamps.pyx:1890, in pandas._libs.tslibs.timestamps.Timestamp.ceil()

TypeError: ceil() takes at least 2 positional arguments (1 given)
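This error arises because pandas' Timestamp.ceil() requires a frequency argument. A minimal sketch follows; the timestamp value and the "1D" frequency are illustrative choices (the latter matches the ceil('1D') call that appears in a later traceback in this thread).

```python
import pandas as pd

# Illustrative timestamp; any Timestamp behaves the same way.
dt_now = pd.Timestamp("2023-08-11 15:59:00", tz="America/New_York")

# Calling ceil() without a frequency raises TypeError.
try:
    dt_now.ceil()
    ceil_error = None
except TypeError as exc:
    ceil_error = str(exc)

# With an explicit frequency the call rounds up to the next day boundary.
rounded = dt_now.ceil("1D")

print(ceil_error)
print(rounded)
```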

ValueRaider commented 10 months ago

I've pushed a fix to GitHub branch bug-fixes - can you install and test? Instructions: https://github.com/ranaroussi/yfinance/discussions/1080

Reason for not publishing = could be more bugs. Reason for potentially more bugs = intraday price caching not thoroughly tested. Thorough testing is hard with just unit tests - dynamic interplay of cache state & time-of-day. Case-in-point - I can't reproduce your error, but Pandas docs confirm it's genuine.

Thorough testing needs real-world use, and I only use 1d & 1wk data.

kschmid commented 10 months ago

I tried it out. Seems to install correctly, but errors do not change:

File /usr/local/lib/python3.11/site-packages/yfinance_cache/yfc_ticker.py:236, in Ticker.history(self, interval, max_age, period, start, end, prepost, actions, adjust_splits, adjust_divs, keepna, proxy, rounding, debug, quiet, trigger_at_market_close)
    234 hist = self._histories_manager.GetHistory(interval)
    235 if period is not None:
--> 236     h = hist.get(start=None, end=None, period=period, max_age=max_age, trigger_at_market_close=trigger_at_market_close, quiet=quiet)
    237 elif interday:
    238     h = hist.get(start_d, end_d, period=None, max_age=max_age, trigger_at_market_close=trigger_at_market_close, quiet=quiet)

File /usr/local/lib/python3.11/site-packages/yfinance_cache/yfc_prices_manager.py:628, in PriceHistory.get(self, start, end, period, max_age, trigger_at_market_close, repair, prepost, adjust_splits, adjust_divs, quiet)
    626 f_repair = self.h["Repaired?"].to_numpy()
    627 f_na = self.h['Close'].isna().to_numpy()
--> 628 f_repair = f_repair | f_na
    629 cutoff_dts = self.h.index + self.itd + timedelta(days=7)
    630 # Ignore repaired data if fetched/repaired 7+ days after interval end

TypeError: unsupported operand type(s) for |: 'float' and 'bool'

I also enclose the downloader program I use. The only modification needed is to set the path variable to a different directory; then it should also work for others. Optionally, modify the tickers list at the end.

It basically downloads the data and refines it with higher resolution data. (2m, 1m) to the degree the data is available and upon rerun it tries to minimize downloads and adds high-resolution data.

#%% 
# create a downloader of finance data from yahoo finance
# this should include caching and rate limiting.
# all data should be downloaded at the maximum resolution available, taking into account data that has already been downloaded.
# the data should be stored in a file per stock, with the name of the stock as the filename.

# perhaps this can be sped up according to https://github.com/ranaroussi/yfinance/issues/1647
# however, we need to make sure that we do not get banned from yahoo finance
# another option would be to use multi download:
# according to https://aroussi.com/post/python-yahoo-finance
# but this would lead to a different interface

# import yfinance as yf
import yfinance_cache as yf
import pandas as pd
import numpy as np
import datetime
import time
import os

from requests import Session
from requests_cache import CacheMixin, SQLiteCache
from requests_ratelimiter import LimiterMixin, MemoryQueueBucket
from pyrate_limiter import Duration, RequestRate, Limiter

# Initializations:
# yf.pdr_override() # pandas_datareader override

path=""
suffix=".csv" # we will save the data as csv files

# lets store the cache in the path directory as well
cache_file = "yfinance.cache"
full_cache_path = os.path.abspath(os.path.join(path, cache_file))

def set_path(new_path):
    global path
    path=new_path

def set_suffix(new_suffix):
    global suffix
    suffix=new_suffix

def get_full_path(ticker_name):
    return os.path.abspath(os.path.join(path, ticker_name + suffix))

# based on https://github.com/ranaroussi/yfinance#smarter-scraping
class CachedLimiterSession(CacheMixin, LimiterMixin, Session): pass
session = CachedLimiterSession(
    limiter=Limiter(RequestRate(2, Duration.SECOND*5)),  # max 2 requests per 5 seconds
    bucket_class=MemoryQueueBucket,
    backend=SQLiteCache(full_cache_path), # alternative SQLiteCache(use_memory=True)
    cache_control=True, # Use Cache-Control response headers for expiration, if available
    expire_after=Duration.SECOND*60*60*24*7, # cache expires after 7 days
)

# non_cached_session = LimiterMixin(
#     limiter=Limiter(RequestRate(2, Duration.SECOND*5)),
#     bucket_class=MemoryQueueBucket
# )

# def get_history_no_cache(ticker, *args, **kwargs):
#     original_session = ticker._data._session
#     try:
#         # Replace the session with the non-cached version
#         ticker.session = non_cached_session
#         ticker._data._session = non_cached_session
#         # Call the method you want without caching
#         return ticker.history(*args, **kwargs)
#     finally:
#         # Revert back to the original session
#         ticker._data._session = original_session
#         ticker.session = original_session

# perform the downloading for a specific stock:
def get_ticker(stock="msft", start_date="2023-01-01", end_date="2023-01-01", force_download=False):
    session.headers['User-agent'] = 'my-program/1.0'
    ticker=yf.Ticker(stock, session=session)
    # The scraped response will be stored in the cache
    return ticker

# download or update data for a specific stock
def download_data(ticker_name):
    # we know the ticker name, so we can get the ticker and the filename
    # we assume they are all cached, so we can just get the ticker again
    # only the relevant data will be downloaded again. 
    # In case there is no data, we just redownload everything
    ticker = get_ticker(ticker_name)
    fullname=get_full_path(ticker_name)
    if os.path.exists(fullname):  # a previous download exists, so only the gap is needed
        # we need to read the existing data from disk
        hist = pd.read_csv(fullname, index_col=0, parse_dates=True)
        # determine the last date
        last_date = hist.index[-1].tz_localize(None) # remove timezone from last date as datetime.now() does not have a timezone
        gap_days = (datetime.datetime.now() - last_date).days
    else: 
        gap_days=-1 # negative value to indicate that we need to download everything

    # at this point we have the ticker and the gap_days and the latter tells us how much data we need to download
    # >60 days: download with daily resolution; <=60 days: download with 2 minute resolution; <=7 days: download with 1 minute resolution

    print("ticker: " + ticker_name + " gap_days: " + str(gap_days))
    if (gap_days > 60) or (gap_days<0): # 
        # download everything
        # print("downloading everything for ticker " + ticker_name)
        hist = ticker.history(period="max")
        # the more recent part is not downloaded in higher resolution yet, so we do set gap_days=60
        gap_days = 60
    if gap_days > 7:
        # print("downloading 2m for ticker " + ticker_name)
        parthist = ticker.history(period="60d", interval="2m")
        hist = pd.concat([hist, parthist])
        gap_days = 7
    if gap_days > 0: # note that this will always be true if we make any new download (but not if the data on disk is from today)
        # get highest resolution data for the last 7 days
        # print("downloading 1m for ticker " + ticker_name)
        parthist = ticker.history(period="7d", interval="1m")
        # parthist = get_history_no_cache(ticker, period="7d", interval="1m")
        # combine the two
        hist = pd.concat([hist, parthist])
        # remove duplicates
        hist = hist[~hist.index.duplicated(keep='first')]
        # sort by date
        hist = hist.sort_index()
        # return the data
    else:
        print("no data to download for " + ticker_name)

    return hist

# save history data to a file
def save_data(ticker_name, data):
    data.to_csv(get_full_path(ticker_name))

# download all data for a list of tickers
def download_all_data(tickers):
    tst = time.time() 
    for ticker_name in tickers:
        # we time the download in msec
        start_time = time.time() 

        # download the data
        data = download_data(ticker_name)
        end_time = time.time()
        print("Download time for ticker " + ticker_name + " is " + str(round((end_time-start_time))) + " sec")
        # save the data
        save_data(ticker_name, data)
        end_time = time.time()
        print("Total time for ticker " + ticker_name + " is " + str(round((end_time-start_time))) + " sec")
    tet = time.time()
    print("Total time for all tickers is " + str(round((tet-tst))) + " sec")

# tickers=["msft", "aapl", "goog", "amzn", "meta", "tsla", "nvda", "pypl", "adbe", 
#        "crm", "intc", "nflx", "cmcsa", "cost", "pep", "amd", "avgo", "csco", "qcom",
#        "amgn", "txn", "chtr", "sbux", "intu", "isrg", "bkng", "atvi", "csx", "mu"]
tickers=["msft"]

download_all_data(tickers) 

not a beauty, but hacked together over the weekend.
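The tiering rule inside download_data can be isolated as a small pure function, which makes it easier to see which history() calls a given gap triggers. This is only a sketch of the logic in the program above: plan_downloads is a hypothetical helper name, and the "1d" interval for the full download is an assumption taken from the program's "daily resolution" comment.

```python
def plan_downloads(gap_days):
    """Return the (period, interval) requests for a given gap in days.

    A negative gap means no data is stored on disk yet.
    """
    plan = []
    if gap_days > 60 or gap_days < 0:
        # Nothing usable on disk: fetch the full daily history first.
        plan.append(("max", "1d"))
        gap_days = 60
    if gap_days > 7:
        # Refine the last 60 days with 2-minute bars.
        plan.append(("60d", "2m"))
        gap_days = 7
    if gap_days > 0:
        # Refine the last 7 days with 1-minute bars.
        plan.append(("7d", "1m"))
    return plan


for gap in (-1, 30, 3, 0):
    print(gap, plan_downloads(gap))
```

Keeping the decision separate from the network calls means the rule can be checked without touching Yahoo at all.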

ValueRaider commented 10 months ago

I can't reproduce that error, but it looks like the yfinance problem still. I've pushed a commit that will, next time error occurs, hopefully print out variables I need to reproduce. Unless cause is different.

kschmid commented 10 months ago

Hm, interesting. I would have assumed this to be independent of running machine. Tried:

pip3 install git+https://github.com/ValueRaider/yfinance-cache.git@bug-fixes

Worked out ok. But no difference in output. No additional outputs.

ValueRaider commented 10 months ago

Did you delete your local YFC cache before re-running?

kschmid commented 10 months ago

yes, every time. Perhaps I will find some time over the weekend to do some debugging.

ValueRaider commented 10 months ago

I've added more debugging code.

kschmid commented 10 months ago

parthist = ticker.history(period="60d", interval="2m")

File /usr/local/lib/python3.11/site-packages/yfinance_cache/yfc_ticker.py:236, in Ticker.history(self, interval, max_age, period, start, end, prepost, actions, adjust_splits, adjust_divs, keepna, proxy, rounding, debug, quiet, trigger_at_market_close)
    234 hist = self._histories_manager.GetHistory(interval)
    235 if period is not None:
--> 236     h = hist.get(start=None, end=None, period=period, max_age=max_age, trigger_at_market_close=trigger_at_market_close, quiet=quiet)
    237 elif interday:
    238     h = hist.get(start_d, end_d, period=None, max_age=max_age, trigger_at_market_close=trigger_at_market_close, quiet=quiet)

File /usr/local/lib/python3.11/site-packages/yfinance_cache/yfc_prices_manager.py:833, in PriceHistory.get(self, start, end, period, max_age, trigger_at_market_close, repair, prepost, adjust_splits, adjust_divs, quiet)
    831 if isinstance(y, (datetime, pd.Timestamp)):
    832     if y > dt_now:
--> 833         ranges_to_fetch[i][1] = min(dt_now.ceil('1D'), y)
    834     elif y > d_now_exchange:
    835         sched = yfct.GetExchangeSchedule(self.exchange, d_now_exchange, y + td_1d)

TypeError: 'tuple' object does not support item assignment
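This error means the entries of ranges_to_fetch are tuples, which cannot be modified in place. The sketch below reproduces the message with hypothetical dates and shows one generic way around it (building a replacement tuple). It is not necessarily the fix that was pushed to the bug-fixes branch.

```python
from datetime import datetime

# Hypothetical stand-in: each range is an immutable (start, end) tuple.
ranges_to_fetch = [(datetime(2023, 8, 7, 9, 30), datetime(2023, 8, 14, 0, 0))]
new_end = datetime(2023, 8, 11, 16, 0)

# In-place assignment on a tuple element raises TypeError.
try:
    ranges_to_fetch[0][1] = new_end
    assign_error = None
except TypeError as exc:
    assign_error = str(exc)

# One way around it: replace the whole tuple instead of mutating it.
start, end = ranges_to_fetch[0]
ranges_to_fetch[0] = (start, min(end, new_end))

print(assign_error)
print(ranges_to_fetch)
```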

kschmid commented 10 months ago

Spent a bit more time on this; the problem is now somewhat clearer. It seems to install the right version:

pip3 install git+https://github.com/ValueRaider/yfinance-cache.git@bug-fixes
Collecting git+https://github.com/ValueRaider/yfinance-cache.git@bug-fixes
  Cloning https://github.com/ValueRaider/yfinance-cache.git (to revision bug-fixes) to /private/var/folders/5p/rmdv_qj931j4vmt4ldbjqz6c0000gn/T/pip-req-build-0eo4xk00
  Running command git clone --filter=blob:none --quiet https://github.com/ValueRaider/yfinance-cache.git /private/var/folders/5p/rmdv_qj931j4vmt4ldbjqz6c0000gn/T/pip-req-build-0eo4xk00
  Running command git checkout -b bug-fixes --track origin/bug-fixes
  Switched to a new branch 'bug-fixes'
  branch 'bug-fixes' set up to track 'origin/bug-fixes'.
  Resolved https://github.com/ValueRaider/yfinance-cache.git to commit 07b8a883fcf6da69e8f5d7f1a149ef1554ecc170
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done

But when I compare the error trace with the actual code in the repo, it seems it is not accessing the bug-fixes version, but an old version.
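One way to check which copy of a package Python actually imports is to ask the standard library for the installed distribution's version and the module's file location. The sketch below uses only importlib; describe_install is a hypothetical helper, and the names "yfinance-cache" / "yfinance_cache" are the distribution and module names used in this thread.

```python
from importlib import metadata, util


def describe_install(dist_name, module_name):
    """Report the installed version and import location, or None if absent."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    spec = util.find_spec(module_name)
    location = spec.origin if spec is not None else None
    return {"version": version, "location": location}


# Prints None values if the package is not installed in this interpreter.
print(describe_install("yfinance-cache", "yfinance_cache"))
```

If the reported location is not the site-packages directory that pip just wrote to, a second Python installation or a stale copy is being imported.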

ValueRaider commented 10 months ago

> TypeError: 'tuple' object does not support item assignment

Good news: I reproduced this, and pushed a fix.

> But, when I compare the error trace with the actual code in the repo. It seems, it is not accessing the bug-fixes version

? The stack trace matches latest bug-fixes (excluding this new fix).

kschmid commented 10 months ago

hm, ok. and another variant:

Traceback (most recent call last):
  File "/Users/schmid/Documents/Private-Programming/Private-Trades/Data-downloader/data-download.py", line 161, in <module>
    download_all_data(tickers)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/schmid/Documents/Private-Programming/Private-Trades/Data-downloader/data-download.py", line 146, in download_all_data
    data = download_data(ticker_name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/schmid/Documents/Private-Programming/Private-Trades/Data-downloader/data-download.py", line 120, in download_data
    parthist = ticker.history(period="7d", interval="1m")
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/yfinance_cache/yfc_ticker.py", line 236, in history
    h = hist.get(start=None, end=None, period=period, max_age=max_age, trigger_at_market_close=trigger_at_market_close, quiet=quiet)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/yfinance_cache/yfc_prices_manager.py", line 501, in get
    self._applyNewEvents()
  File "/usr/local/lib/python3.11/site-packages/yfinance_cache/yfc_prices_manager.py", line 3915, in _applyNewEvents
    self._updatedCachedPrices(self.h)
  File "/usr/local/lib/python3.11/site-packages/yfinance_cache/yfc_prices_manager.py", line 414, in _updatedCachedPrices
    raise Exception(msg)
Exception: _updatedCachedPrices() writing DF with corrupt 'Repaired?' column, investigate
ticker=MSFT
interval=1m DF range 2023-08-07 09:30:00-04:00 >- 2023-08-11 15:59:00-04:00

But at least the diagnostic messages are shown now.

I think this process is not that effective; we need something that is 100% repeatable for you as well.


Here is a minimal program that seems to be reliably crashing for me with the current version:

import yfinance_cache as yf

ticker = yf.Ticker("msft")  
parthist = ticker.history(period="7d", interval="1m") 

(The error message is basically the same)

ValueRaider commented 10 months ago

> I think, this process is not that effective, we need something that is 100% repeatable for you as well.

What might work is this:

As long as we both run code in same market session, then the cached data should be used.

EDIT: Also, add this to the top of code and upload STDOUT: import yfinance as y ; y.enable_debug_mode() Then I can confirm/deny my execution of YFC is requesting the same data from Yahoo.

kschmid commented 10 months ago

I just eliminated the session and everything to ensure that specific details of its configuration do not lead to this behavior. ;-)

So, here is another version and the uploaded cache.

Program:

import yfinance_cache as yf
from requests import Session
from requests_cache import CacheMixin, SQLiteCache

class CachedSession(CacheMixin, Session): pass
session = CachedSession(
    backend=SQLiteCache("test.cache"), # alternative SQLiteCache(use_memory=True)
)

ticker = yf.Ticker("msft", session=session)  
parthist = ticker.history(period="7d", interval="1m") 

test.cache.zip

ValueRaider commented 10 months ago

Didn't reproduce. Post/upload your STDOUT with yfinance.enable_debug_mode(). And confirm your yfinance version: yfinance.__version__

EDIT: And are you sure that cache file is from after execution? I'm not hitting it at all, and after I run code the cache file is 13x bigger. Something smells fishy here ...

kschmid commented 10 months ago

For a moment, I thought, I have it working. - Then, I realized I am accessing yfinance instead of yfinance-cache.

I modified the program to detail it more:

import yfinance as yf
import yfinance_cache as yft
from requests import Session
import requests_cache

print(yf.__version__)
yf.enable_debug_mode()

session = requests_cache.CachedSession('test.cache')
session.headers['User-agent'] = 'my-program/1.0'
ticker = yft.Ticker('msft', session=session)
parthist = ticker.history(period="7d", interval="1m")

prints version and enables debug. Output below:

0.2.28
Traceback (most recent call last):
  File "/Users/schmid/Documents/Private-Programming/Private-Trades/Data-downloader/download-test.py", line 12, in <module>
    parthist = ticker.history(period="7d", interval="1m")
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/yfinance_cache/yfc_ticker.py", line 236, in history
    h = hist.get(start=None, end=None, period=period, max_age=max_age, trigger_at_market_close=trigger_at_market_close, quiet=quiet)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/yfinance_cache/yfc_prices_manager.py", line 501, in get
    self._applyNewEvents()
  File "/usr/local/lib/python3.11/site-packages/yfinance_cache/yfc_prices_manager.py", line 3915, in _applyNewEvents
    self._updatedCachedPrices(self.h)
  File "/usr/local/lib/python3.11/site-packages/yfinance_cache/yfc_prices_manager.py", line 414, in _updatedCachedPrices
    raise Exception(msg)
Exception: _updatedCachedPrices() writing DF with corrupt 'Repaired?' column, investigate
ticker=MSFT
interval=1m DF range 2023-08-07 09:30:00-04:00 >- 2023-08-11 15:59:00-04:00

Prior to running it, I deleted the existing cache, uninstalled yfinance and yfinance-cache and reinstalled them (the latter from github - bug-fixes)

ValueRaider commented 10 months ago

I can't help if you ignore my requests. Twice I have asked you to put yfinance.enable_debug_mode() at top of your code (3x now).

I'll play with LLMs and see if I can get it to litter code with debug checks, print exception as soon as data is corrupted ..

kschmid commented 10 months ago

As you can see in the code I posted, I did add "yf.enable_debug_mode()". It didn't do anything I could observe. I am also not exactly sure how yfinance and yfinance-cache are intertwined, as the calls go to yfinance-cache. Hence, I do not know what should be going on. But I agree. yfinance works sufficiently well for me. yfinance-cache works for you. We are both wasting our time.

I cannot understand why my code doesn't cause problems when you run it. Things are not working as they should when I try it. Probably some peculiarities of our respective setups.

ValueRaider commented 10 months ago

With yf.enable_debug_mode() you should see verbose output like this:

DEBUG    Entering history()
DEBUG     MSFT: Yahoo GET parameters: {'period1': '2023-08-15 09:30:00-04:00', 'period2': '2023-08-17 11:30:00-04:00', 'interval': '1m', 'includePrePost': False, 'events': 'div,splits,capitalGains'}
DEBUG     MSFT: yfinance received OHLC data: 2023-08-15 13:30:00 -> 2023-08-16 20:00:00
...
DEBUG     MSFT: yfinance returning OHLC: 2023-08-15 09:30:00-04:00 -> 2023-08-16 15:59:00-04:00
DEBUG    Exiting history()

If you don't see any output, that itself is sign of another problem at your end.

You are not wasting my time, I want to resolve this bug. Try one more thing for me - latest commit will check every single DataFrame modification for corruption, should tell me exactly where it originates.
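The idea of checking a DataFrame for corruption at every step can be illustrated with a small validation helper. This is a hypothetical sketch, not yfinance-cache's actual implementation: check_flag_column is an invented name, and only the "Repaired?" column name comes from the tracebacks in this thread.

```python
import numpy as np
import pandas as pd


def check_flag_column(df, column="Repaired?"):
    """Raise as soon as the flag column contains a missing value."""
    if column in df.columns:
        missing = df[column].isna().to_numpy()
        if missing.any():
            bad_rows = df.index[missing].tolist()
            raise ValueError(f"NaNs detected in column {column!r} at rows {bad_rows}")
    return df


good = pd.DataFrame({"Close": [1.0, 2.0], "Repaired?": [False, True]})
bad = pd.DataFrame({"Close": [1.0, 2.0], "Repaired?": [False, np.nan]})

check_flag_column(good)  # passes silently
try:
    check_flag_column(bad)
    detected = None
except ValueError as exc:
    detected = str(exc)

print(detected)
```

Calling such a check after each step that modifies the table narrows down where the first NaN is introduced, rather than only noticing it when the cache is written.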

kschmid commented 10 months ago

Yes, this is what it looks like when I do everything through yfinance.

Double checked: yft has no version attribute and also no enable_debug_mode. Thus, effectively, I am switching on debug mode for an imported yf, but then I am using yfc. Apart from enable_debug_mode, yf is not referenced. It is probably referenced internally by yfc, but in some sense I am not too surprised that debug mode is not turned on indirectly. Perhaps this is part of the reason why there is no output for me? But I am not deep enough into Python's module handling to judge. Another possibility: I ran everything on a Mac, so something may be different.

For reference, the test I ran:

import yfinance as yf
import yfinance_cache as yfc
from requests import Session
import requests_cache

print(yf.__version__)
yf.enable_debug_mode()

session = requests_cache.CachedSession('test.cache')
session.headers['User-agent'] = 'my-program/1.0'
ticker = yfc.Ticker('msft', session=session)
parthist = ticker.history(period="7d", interval="1m")

yfinance-cache uninstalled and reinstalled from GitHub, cache erased. Errors and output redirected via python3 download-test.py >&out

out-file below

0.2.28
Traceback (most recent call last):
  File "** path removed ** / download-test.py", line 13, in <module>
    parthist = ticker.history(period="7d", interval="1m")
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/yfinance_cache/yfc_ticker.py", line 234, in history
    hist = self._histories_manager.GetHistory(interval)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/yfinance_cache/yfc_prices_manager.py", line 60, in GetHistory
    self.histories[key] = PriceHistory(self, self.ticker, self.exchange, self.tzName, key, self.session, self.proxy, repair=True, contiguous=False)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/yfinance_cache/yfc_prices_manager.py", line 354, in __init__
    self.h = self._getCachedPrices()
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/yfinance_cache/yfc_prices_manager.py", line 379, in _getCachedPrices
    h = yfcu.CustomNanCheckingDataFrame(h)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/yfinance_cache/yfc_utils.py", line 16, in __init__
    self.check_nans()
  File "/usr/local/lib/python3.11/site-packages/yfinance_cache/yfc_utils.py", line 38, in check_nans
    raise Exception(f"NaNs detected in column 'Repaired?'!")
Exception: NaNs detected in column 'Repaired?'! 

Attached the generated cache (was erased before).

With this, I have to stop here. If it is not repeatable, debugging is just not realistic. Perhaps I will check in again in a few months.

kschmid commented 10 months ago

forgot to attach the cache.

test.cache.zip

ValueRaider commented 10 months ago

When I get time I'll dig through that cache, should be able to deduce what requests sent to Yahoo.

The way yf.enable_debug_mode() works is it modifies the global singleton logger object for yfinance, so shouldn't matter where it is called. Just needs to be called before anything else uses yfinance and triggers initialising its logger. So it printing nothing for you is sign of a problem.
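The "global singleton logger" behaviour described here is a property of Python's standard logging module: looking up a logger by name always returns the same object, so a level set in one place is visible everywhere. The sketch below demonstrates this with the standard library only. The logger name "demo_lib" and the enable_debug helper are hypothetical stand-ins and do not reproduce yfinance's internals.

```python
import logging


def enable_debug(logger_name="demo_lib"):
    """Switch the named logger to DEBUG and make sure it has a handler."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(levelname)-8s %(message)s"))
        logger.addHandler(handler)
    return logger


configured = enable_debug()

# Any later lookup by the same name, from any module, returns the same object,
# so debug messages emitted there are printed too.
same_logger = logging.getLogger("demo_lib")
same_logger.debug("Entering history()")
```

If a library caches its own logger object or resets the level during import, the order of calls can still matter, which is consistent with the advice above to enable debug mode before anything else uses the library.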