game[s]_to_dataframe failing on pandas >=2 #30

NickBall commented 1 year ago


On pandas 2.0+, chadwick.games_to_dataframe and game_to_dataframe fail because pandas now casts to the dtype passed when initializing a DataFrame more strictly: a column that cannot be cast to that dtype raises an error instead of falling back to object.
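The behavior change can be shown without pychadwick. This is a minimal, hypothetical reproduction: the column names GAME_ID and INN_CT are placeholders chosen for illustration, not necessarily what process_game yields. On pandas 1.x the string column should fall back to object dtype (with a FutureWarning); on pandas 2.x the constructor raises.

```python
import warnings

import pandas as pd

# Hypothetical event rows: one string id column plus one numeric column,
# roughly the shape of what game_to_dataframe passes to pd.DataFrame.
rows = [
    {"GAME_ID": "OAK198204060", "INN_CT": 1},
    {"GAME_ID": "OAK198204060", "INN_CT": 2},
]


def try_float_frame(records):
    """Mimic pd.DataFrame(records, dtype="f8") and report what happens."""
    with warnings.catch_warnings():
        # pandas 1.x warns instead of raising; silence that for the demo.
        warnings.simplefilter("ignore", FutureWarning)
        try:
            return pd.DataFrame(records, dtype="f8"), None
        except (ValueError, TypeError) as exc:
            # pandas 2.x refuses to cast the string column to float64.
            return None, exc


df, err = try_float_frame(rows)
if err is not None:
    print(f"pandas {pd.__version__} raised: {err}")
else:
    print(f"pandas {pd.__version__} fell back to:\n{df.dtypes}")
```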

This can be reproduced on Python 3.8+ and pandas 2.0.2 with the existing unit test:

(venv3.9) nick@astra:~/dev/nickball/forks/pychadwick$ python3 --version
Python 3.9.16
(venv3.9) nick@astra:~/dev/nickball/forks/pychadwick$ pip uninstall -y pandas && make install
(venv3.9) nick@astra:~/dev/nickball/forks/pychadwick$ pip freeze | grep pandas
(venv3.9) nick@astra:~/dev/nickball/forks/pychadwick$ pytest tests/
tests/pychadwick/chadwick/ ..F...                                                                                    [100%]

___________________________________________________________ test_load_games_to_df ____________________________________________________________

chadwick = <pychadwick.chadwick.Chadwick object at 0x7f6f682d06d0>, team_events = ['1982OAK.EVA', '1991BAL.EVA', '1954PHI.EVN']

>   ???

venv3.9/lib/python3.9/site-packages/pychadwick-0.5.0-py3.9-linux-x86_64.egg/pychadwick/ in games_to_dataframe
    dfs = [
venv3.9/lib/python3.9/site-packages/pychadwick-0.5.0-py3.9-linux-x86_64.egg/pychadwick/ in <listcomp>
    pd.DataFrame(list(self.process_game(game_ptr)), dtype="f8")
venv3.9/lib/python3.9/site-packages/pandas/core/ in __init__
    mgr = arrays_to_mgr(
venv3.9/lib/python3.9/site-packages/pandas/core/internals/ in arrays_to_mgr
    arrays, refs = _homogenize(arrays, index, dtype)
venv3.9/lib/python3.9/site-packages/pandas/core/internals/ in _homogenize
    val = sanitize_array(val, index, dtype=dtype, copy=False)
venv3.9/lib/python3.9/site-packages/pandas/core/ in sanitize_array
    subarr = _try_cast(data, dtype, copy)
arr = array(['OAK198204060', 'OAK198204060', 'OAK198204060', 'OAK198204060',
       'OAK198204060', 'OAK198204060', 'OAK1982..., 'OAK198204060', 'OAK198204060', 'OAK198204060',
       'OAK198204060', 'OAK198204060', 'OAK198204060'], dtype=object)
dtype = dtype('float64'), copy = False

    def _try_cast(
        arr: list | np.ndarray,
        dtype: np.dtype,
        copy: bool,
    ) -> ArrayLike:
        """
        Convert input to numpy ndarray and optionally cast to a given dtype.

        Parameters
        ----------
        arr : ndarray or list
            Excludes: ExtensionArray, Series, Index.
        dtype : np.dtype
        copy : bool
            If False, don't copy the data if not needed.

        Returns
        -------
        np.ndarray or ExtensionArray
        """
        is_ndarray = isinstance(arr, np.ndarray)

        if is_object_dtype(dtype):
            if not is_ndarray:
                subarr = construct_1d_object_array_from_listlike(arr)
                return subarr
            return ensure_wrapped_if_datetimelike(arr).astype(dtype, copy=copy)

        elif dtype.kind == "U":
            # TODO: test cases with arr.dtype.kind in ["m", "M"]
            if is_ndarray:
                arr = cast(np.ndarray, arr)
                shape = arr.shape
                if arr.ndim > 1:
                    arr = arr.ravel()
            else:
                shape = (len(arr),)
            return lib.ensure_string_array(arr, convert_na_value=False, copy=copy).reshape(
                shape
            )

        elif dtype.kind in ["m", "M"]:
            return maybe_cast_to_datetime(arr, dtype)

        # GH#15832: Check if we are requesting a numeric dtype and
        # that we can convert the data to the requested dtype.
        elif is_integer_dtype(dtype):
            # this will raise if we have e.g. floats

            subarr = maybe_cast_to_integer_array(arr, dtype)
        else:
>           subarr = np.array(arr, dtype=dtype, copy=copy)
E           ValueError: could not convert string to float: 'OAK198204060'

venv3.9/lib/python3.9/site-packages/pandas/core/ ValueError
FAILED tests/pychadwick/chadwick/ - ValueError: could not convert string to float: 'OAK198204060'
opening file /tmp/tmp.EVA
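The last frame of that traceback is NumPy itself: in the final branch of pandas' _try_cast, np.array(arr, dtype=dtype, copy=copy) is called on the object array of game ids, and NumPy cannot parse 'OAK198204060' as a float. A minimal sketch of just that step, assuming only NumPy is installed (the copy argument is left out here because its meaning changed in NumPy 2):

```python
import numpy as np

# The kind of object array pandas hands to NumPy (values taken from the traceback).
game_ids = np.array(["OAK198204060", "OAK198204060"], dtype=object)

try:
    # Same cast as the failing line in pandas' _try_cast.
    np.array(game_ids, dtype="float64")
except ValueError as exc:
    error_message = str(exc)
else:
    error_message = None

print(error_message)
```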

On pandas 1.3.5 it works, but emits a FutureWarning deprecating the initialization of a DataFrame with a dtype argument that cannot be applied to every column:

(venv3.7) nick@astra:~/dev/nickball/forks/pychadwick$ python --version
Python 3.7.16
(venv3.7) nick@astra:~/dev/nickball/forks/pychadwick$ pip freeze | grep pandas
(venv3.7) nick@astra:~/dev/nickball/forks/pychadwick$ pytest tests/
tests/pychadwick/chadwick/ ......                                                                                    [100%]

  /home/nick/dev/nickball/forks/pychadwick/venv3.7/lib/python3.7/site-packages/pychadwick-0.5.0-py3.7-linux-x86_64.egg/pychadwick/ FutureWarning: Could not cast to float64, falling back to object. This behavior is deprecated. In a future version, when a dtype is passed to 'DataFrame', either all columns will be cast to that dtype, or a TypeError will be raised
    for game_ptr in games

  /home/nick/dev/nickball/forks/pychadwick/tests/pychadwick/chadwick/ FutureWarning: Could not cast to float64, falling back to object. This behavior is deprecated. In a future version, when a dtype is passed to 'DataFrame', either all columns will be cast to that dtype, or a TypeError will be raised
    df = chadwick.game_to_dataframe(next(games))

opening file /tmp/tmp.EVA
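Because pandas 1.x only warns, the breakage is easy to miss on older environments. One option, sketched here as an assumption rather than anything the project currently does, is to promote this FutureWarning to an error so that pandas 1.x and 2.x fail the same way (pytest's -W error::FutureWarning option does the same for a whole test run). The rows data and the build_strict helper are hypothetical.

```python
import warnings

import pandas as pd

# Hypothetical rows mixing a string id with a numeric field.
rows = [{"GAME_ID": "OAK198204060", "INN_CT": 1}]


def build_strict(records):
    """Build the float64 frame, treating pandas' cast FutureWarning as an error."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", FutureWarning)
        return pd.DataFrame(records, dtype="f8")


try:
    build_strict(rows)
    outcome = "no error"
except FutureWarning as warning:
    outcome = f"pandas 1.x warning promoted to error: {warning}"
except (ValueError, TypeError) as exc:
    outcome = f"pandas 2.x error: {exc}"

print(outcome)
```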

Pinning pandas to 1.x with a version range in requirements.txt also avoids the failure:

(venv3.7) nick@astra:~/dev/nickball/forks/pychadwick$ grep pandas requirements.txt
(venv3.7) nick@astra:~/dev/nickball/forks/pychadwick$ pip freeze | grep pandas
(venv3.7) nick@astra:~/dev/nickball/forks/pychadwick$ pytest tests/ -q
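A range such as pandas>=1.0.4,<2 is one way to express that pin (the lower bound matches the pandas>=1.0.4 requirement pip resolves in the full test case; the exact spec used above is not shown). If the package should instead keep supporting both major versions, a small runtime check could select a code path. The helper below is hypothetical and only parses the leading major version from pandas.__version__.

```python
import pandas as pd


def pandas_major(version: str = pd.__version__) -> int:
    """Return the major version number from a pandas version string."""
    # e.g. "2.0.2" -> 2, "1.3.5" -> 1, "3.0.0rc0" -> 3
    return int(version.split(".", 1)[0])


# True on the pandas releases where DataFrame(..., dtype="f8") raises
# instead of falling back to object dtype for non-numeric columns.
PANDAS_GE_2 = pandas_major() >= 2

print(f"pandas {pd.__version__}: strict dtype casting = {PANDAS_GE_2}")
```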


Full test case:

(venv) nick@astra:~/temp/repro$ python3 --version
Python 3.9.16
(venv) nick@astra:~/temp/repro$ pip install pychadwick
Collecting pychadwick
  Using cached pychadwick-0.5.0.tar.gz (119 kB)
Collecting pandas>=1.0.4
  Using cached pandas-2.0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.4 MB)
(venv) nick@astra:~/temp/repro$ pip freeze | grep pandas
(venv) nick@astra:~/temp/repro$ python3
>>> from pychadwick.chadwick import Chadwick
>>> chadwick = Chadwick()
>>> file_path = ""
>>> games =
>>> game = next(games)
>>> game
    < object at 0x7f485c7896c0>
>>> chadwick.game_to_dataframe(game)
  File "<stdin>", line 1, in <module>
  File "/home/nick/temp/repro/venv/lib/python3.9/site-packages/pychadwick/", line 259, in game_to_dataframe
    pd.DataFrame(list(self.process_game(game_ptr)), dtype="f8"),
  File "/home/nick/temp/repro/venv/lib/python3.9/site-packages/pandas/core/", line 790, in __init__
    mgr = arrays_to_mgr(
  File "/home/nick/temp/repro/venv/lib/python3.9/site-packages/pandas/core/internals/", line 120, in arrays_to_mgr
    arrays, refs = _homogenize(arrays, index, dtype)
  File "/home/nick/temp/repro/venv/lib/python3.9/site-packages/pandas/core/internals/", line 607, in _homogenize
    val = sanitize_array(val, index, dtype=dtype, copy=False)
  File "/home/nick/temp/repro/venv/lib/python3.9/site-packages/pandas/core/", line 576, in sanitize_array
    subarr = _try_cast(data, dtype, copy)
  File "/home/nick/temp/repro/venv/lib/python3.9/site-packages/pandas/core/", line 765, in _try_cast
    subarr = np.array(arr, dtype=dtype, copy=copy)
ValueError: could not convert string to float: 'OAK198204060'
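One possible fix, sketched under the assumption that process_game yields one record (for example a dict) per event: drop the blanket dtype="f8" from the pd.DataFrame(...) call and convert columns one at a time. This keeps the pandas 1.x result (float64 where possible, the original dtype otherwise) without relying on the deprecated fallback. records_to_frame is a hypothetical helper, not part of pychadwick.

```python
import pandas as pd


def records_to_frame(records):
    """Build a DataFrame, then cast each column to float64 only when every
    value in that column can be parsed as a number."""
    df = pd.DataFrame(list(records))
    for col in df.columns:
        try:
            df[col] = pd.to_numeric(df[col]).astype("f8")
        except (ValueError, TypeError):
            # Non-numeric columns such as game ids keep their original dtype.
            pass
    return df


# Hypothetical rows standing in for process_game(game_ptr) output.
rows = [
    {"GAME_ID": "OAK198204060", "INN_CT": 1, "OUTS_CT": 0},
    {"GAME_ID": "OAK198204060", "INN_CT": 1, "OUTS_CT": 1},
]
df = records_to_frame(rows)
print(df.dtypes)
```

In game_to_dataframe this would replace pd.DataFrame(list(self.process_game(game_ptr)), dtype="f8") with records_to_frame(self.process_game(game_ptr)), and the same change would apply to the list comprehension in games_to_dataframe.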

Builds on Python 3.8+ can also be enabled with the pull request to reproduce this.

bdilday commented 1 year ago

closed by