On pandas 2.0+, chadwick.games_to_dataframe and chadwick.game_to_dataframe fail because pandas now strictly enforces the dtype passed to the DataFrame constructor. The DataFrames are built with dtype="f8", and string columns such as the game ID cannot be cast to float64.
This can be reproduced on Python 3.8+ with pandas 2.0.2 using the existing test_pychadwick.py::test_load_games_to_df unit test:
(venv3.9) nick@astra:~/dev/nickball/forks/pychadwick$ python3 --version
Python 3.9.16
(venv3.9) nick@astra:~/dev/nickball/forks/pychadwick$ pip uninstall -y pandas && make install
(venv3.9) nick@astra:~/dev/nickball/forks/pychadwick$ pip freeze | grep pandas
pandas==2.0.2
(venv3.9) nick@astra:~/dev/nickball/forks/pychadwick$ pytest tests/
============================================================ test session starts =============================================================
platform linux -- Python 3.9.16, pytest-5.4.3, py-1.11.0, pluggy-0.13.1
rootdir: /home/nick/dev/nickball/forks/pychadwick
collected 6 items
tests/pychadwick/chadwick/test_pychadwick.py ..F... [100%]
================================================================== FAILURES ==================================================================
___________________________________________________________ test_load_games_to_df ____________________________________________________________
chadwick = <pychadwick.chadwick.Chadwick object at 0x7f6f682d06d0>, team_events = ['1982OAK.EVA', '1991BAL.EVA', '1954PHI.EVN']
> ???
/home/nick/temp/pychadwick_fork/tests/pychadwick/chadwick/test_pychadwick.py:52:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
venv3.9/lib/python3.9/site-packages/pychadwick-0.5.0-py3.9-linux-x86_64.egg/pychadwick/chadwick.py:247: in games_to_dataframe
dfs = [
venv3.9/lib/python3.9/site-packages/pychadwick-0.5.0-py3.9-linux-x86_64.egg/pychadwick/chadwick.py:248: in <listcomp>
pd.DataFrame(list(self.process_game(game_ptr)), dtype="f8")
venv3.9/lib/python3.9/site-packages/pandas/core/frame.py:790: in __init__
mgr = arrays_to_mgr(
venv3.9/lib/python3.9/site-packages/pandas/core/internals/construction.py:120: in arrays_to_mgr
arrays, refs = _homogenize(arrays, index, dtype)
venv3.9/lib/python3.9/site-packages/pandas/core/internals/construction.py:607: in _homogenize
val = sanitize_array(val, index, dtype=dtype, copy=False)
venv3.9/lib/python3.9/site-packages/pandas/core/construction.py:576: in sanitize_array
subarr = _try_cast(data, dtype, copy)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
arr = array(['OAK198204060', 'OAK198204060', 'OAK198204060', 'OAK198204060',
'OAK198204060', 'OAK198204060', 'OAK1982..., 'OAK198204060', 'OAK198204060', 'OAK198204060',
'OAK198204060', 'OAK198204060', 'OAK198204060'], dtype=object)
dtype = dtype('float64'), copy = False
def _try_cast(
arr: list | np.ndarray,
dtype: np.dtype,
copy: bool,
) -> ArrayLike:
"""
Convert input to numpy ndarray and optionally cast to a given dtype.
Parameters
----------
arr : ndarray or list
Excludes: ExtensionArray, Series, Index.
dtype : np.dtype
copy : bool
If False, don't copy the data if not needed.
Returns
-------
np.ndarray or ExtensionArray
"""
is_ndarray = isinstance(arr, np.ndarray)
if is_object_dtype(dtype):
if not is_ndarray:
subarr = construct_1d_object_array_from_listlike(arr)
return subarr
return ensure_wrapped_if_datetimelike(arr).astype(dtype, copy=copy)
elif dtype.kind == "U":
# TODO: test cases with arr.dtype.kind in ["m", "M"]
if is_ndarray:
arr = cast(np.ndarray, arr)
shape = arr.shape
if arr.ndim > 1:
arr = arr.ravel()
else:
shape = (len(arr),)
return lib.ensure_string_array(arr, convert_na_value=False, copy=copy).reshape(
shape
)
elif dtype.kind in ["m", "M"]:
return maybe_cast_to_datetime(arr, dtype)
# GH#15832: Check if we are requesting a numeric dtype and
# that we can convert the data to the requested dtype.
elif is_integer_dtype(dtype):
# this will raise if we have e.g. floats
subarr = maybe_cast_to_integer_array(arr, dtype)
else:
> subarr = np.array(arr, dtype=dtype, copy=copy)
E ValueError: could not convert string to float: 'OAK198204060'
venv3.9/lib/python3.9/site-packages/pandas/core/construction.py:765: ValueError
========================================================== short test summary info ===========================================================
FAILED tests/pychadwick/chadwick/test_pychadwick.py::test_load_games_to_df - ValueError: could not convert string to float: 'OAK198204060'
======================================================== 1 failed, 5 passed in 1.75s =========================================================
"F",0,"","F","F","",0,0,"N",0,"N",0,"N",1,0,0,0,"","","","","F","F","F","F","F","F","F","F","F","","","","F","F","F","F","F","","","","",0,0,0,0,0,0,0,0,0,"PHI","PHI","NY1",1,"T","F",2,3,0,47,0,"T","F",0,1,"F","F","hamng102","ennid101","F","F",0,0,0,0,0,0,0,0,0,"","","",0,0,0,0,0,0,0,0,0,0,0,0,0,"","F","F","F","F",0,0,0,0,0,0,0,0,0,0,F,F
"PHI195409260","NY1",11,1,0,0,0,"",3,2,"hamng102","?","hamng102","?","speng102","?","speng102","?","garaj101","lockw101","willd102","amalj101","gardb101","rhodd101","maysw101","mueld101","burgs101","","","64(1)3/GDP","F","F",4,4,2,"T","T",0,"F","F",2,"T","F",0,"F","F",6,"G","F","F","",0,0,"N",0,"N",0,"N",0,0,0,0,"43","64","","","F","F","F","F","F","F","F","F","F","speng102","","","F","F","F","F","F","","","","",0,4,3,0,6,4,0,0,0,"PHI","PHI","NY1",1,"F","F",2,3,0,48,1,"T","F",1,0,"T","T","ennid101","morgb102","F","F",2,3,99,0,0,0,0,0,0,"garaj101","","",0,0,0,0,0,0,0,0,0,0,0,0,0,"gardb101","T","F","F","F",0,0,0,0,0,0,0,0,0,0,F,F
"PHI195409260","NY1",11,1,2,0,0,"",3,2,"ennid101","?","ennid101","?","speng102","?","speng102","?","garaj101","lockw101","willd102","amalj101","gardb101","rhodd101","maysw101","mueld101","","","","5/FL","F","F",9,5,2,"T","T",0,"F","F",1,"F","F",0,"F","F",5,"F","F","T","",0,0,"N",0,"N",0,"N",0,0,0,0,"5","","","","F","F","F","F","F","F","F","F","F","","","","F","T","F","F","F","","","","",0,5,0,0,0,0,0,0,0,"PHI","PHI","NY1",1,"F","T",2,3,0,49,2,"T","F",0,0,"T","T","morgb102","jonew101","F","F",0,0,0,0,0,0,0,0,0,"","","",0,0,0,0,0,0,0,0,0,0,0,0,0,"amalj101","F","F","F","F",0,0,0,0,0,0,0,0,0,0,F,F
opening file /tmp/tmp.EVA
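The failure does not seem to depend on pychadwick itself. A minimal sketch, assuming process_game yields one dict per event, reproduces it with pandas alone. The column names GAME_ID and INN_CT and the helper try_float_frame are illustrative; only the game ID value is taken from the traceback above.

```python
import warnings

import pandas as pd

# Illustrative rows: one string column and one numeric column.
rows = [
    {"GAME_ID": "OAK198204060", "INN_CT": 1},
    {"GAME_ID": "OAK198204060", "INN_CT": 2},
]


def try_float_frame(records):
    """Return the DataFrame, or the exception raised while building it."""
    with warnings.catch_warnings():
        # pandas 1.x only warns here; silence that to keep the output short.
        warnings.simplefilter("ignore", FutureWarning)
        try:
            return pd.DataFrame(records, dtype="f8")
        except (ValueError, TypeError) as exc:
            return exc


result = try_float_frame(rows)
print(pd.__version__, type(result).__name__)
```

On pandas 2.x the helper should return the ValueError seen in the traceback; on pandas 1.3.5 it should return a DataFrame.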
On pandas 1.3.5 the test passes, but pandas emits a FutureWarning about initializing a DataFrame with a dtype that the data cannot be cast to:
(venv3.7) nick@astra:~/dev/nickball/forks/pychadwick$ python --version
Python 3.7.16
(venv3.7) nick@astra:~/dev/nickball/forks/pychadwick$ pip freeze | grep pandas
pandas==1.3.5
(venv3.7) nick@astra:~/dev/nickball/forks/pychadwick$ pytest tests/
============================================================ test session starts =============================================================
platform linux -- Python 3.7.16, pytest-5.4.3, py-1.11.0, pluggy-0.13.1
rootdir: /home/nick/dev/nickball/forks/pychadwick
collected 6 items
tests/pychadwick/chadwick/test_pychadwick.py ...... [100%]
============================================================== warnings summary ==============================================================
tests/pychadwick/chadwick/test_pychadwick.py::test_load_games_to_df
/home/nick/dev/nickball/forks/pychadwick/venv3.7/lib/python3.7/site-packages/pychadwick-0.5.0-py3.7-linux-x86_64.egg/pychadwick/chadwick.py:249: FutureWarning: Could not cast to float64, falling back to object. This behavior is deprecated. In a future version, when a dtype is passed to 'DataFrame', either all columns will be cast to that dtype, or a TypeError will be raised
for game_ptr in games
tests/pychadwick/chadwick/test_pychadwick.py::test_load_games_to_df
/home/nick/dev/nickball/forks/pychadwick/tests/pychadwick/chadwick/test_pychadwick.py:55: FutureWarning: Could not cast to float64, falling back to object. This behavior is deprecated. In a future version, when a dtype is passed to 'DataFrame', either all columns will be cast to that dtype, or a TypeError will be raised
df = chadwick.game_to_dataframe(next(games))
-- Docs: https://docs.pytest.org/en/latest/warnings.html
======================================================= 6 passed, 2 warnings in 5.47s ========================================================
"F",0,"","F","F","",0,0,"N",0,"N",0,"N",1,0,0,0,"","","","","F","F","F","F","F","F","F","F","F","","","","F","F","F","F","F","","","","",0,0,0,0,0,0,0,0,0,"PHI","PHI","NY1",1,"T","F",2,3,0,47,0,"T","F",0,1,"F","F","hamng102","ennid101","F","F",0,0,0,0,0,0,0,0,0,"","","",0,0,0,0,0,0,0,0,0,0,0,0,0,"","F","F","F","F",0,0,0,0,0,0,0,0,0,0,F,F
"PHI195409260","NY1",11,1,0,0,0,"",3,2,"hamng102","?","hamng102","?","speng102","?","speng102","?","garaj101","lockw101","willd102","amalj101","gardb101","rhodd101","maysw101","mueld101","burgs101","","","64(1)3/GDP","F","F",4,4,2,"T","T",0,"F","F",2,"T","F",0,"F","F",6,"G","F","F","",0,0,"N",0,"N",0,"N",0,0,0,0,"43","64","","","F","F","F","F","F","F","F","F","F","speng102","","","F","F","F","F","F","","","","",0,4,3,0,6,4,0,0,0,"PHI","PHI","NY1",1,"F","F",2,3,0,48,1,"T","F",1,0,"T","T","ennid101","morgb102","F","F",2,3,99,0,0,0,0,0,0,"garaj101","","",0,0,0,0,0,0,0,0,0,0,0,0,0,"gardb101","T","F","F","F",0,0,0,0,0,0,0,0,0,0,F,F
"PHI195409260","NY1",11,1,2,0,0,"",3,2,"ennid101","?","ennid101","?","speng102","?","speng102","?","garaj101","lockw101","willd102","amalj101","gardb101","rhodd101","maysw101","mueld101","","","","5/FL","F","F",9,5,2,"T","T",0,"F","F",1,"F","F",0,"F","F",5,"F","F","T","",0,0,"N",0,"N",0,"N",0,0,0,0,"5","","","","F","F","F","F","F","F","F","F","F","","","","F","T","F","F","F","","","","",0,5,0,0,0,0,0,0,0,"PHI","PHI","NY1",1,"F","T",2,3,0,49,2,"T","F",0,0,"T","T","morgb102","jonew101","F","F",0,0,0,0,0,0,0,0,0,"","","",0,0,0,0,0,0,0,0,0,0,0,0,0,"amalj101","F","F","F","F",0,0,0,0,0,0,0,0,0,0,F,F
opening file /tmp/tmp.EVA
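One possible way to avoid the error on both pandas 1.x and 2.x is to drop the dtype argument and cast only the columns that pandas infers as numeric. The sketch below is an assumption about how that could look, not necessarily the change made in this repository. The helper rows_to_dataframe and the sample rows are hypothetical stand-ins for the output of process_game.

```python
import pandas as pd


def rows_to_dataframe(rows):
    """Build a DataFrame without forcing one dtype onto every column."""
    # Let pandas infer a dtype per column; string columns keep theirs.
    df = pd.DataFrame(list(rows))
    # Cast only the columns inferred as numeric to float64.
    numeric_cols = df.select_dtypes(include="number").columns
    return df.astype({col: "f8" for col in numeric_cols})


# Hypothetical rows standing in for the dicts yielded by process_game().
rows = [
    {"GAME_ID": "OAK198204060", "INN_CT": 1, "OUTS_CT": 0},
    {"GAME_ID": "OAK198204060", "INN_CT": 1, "OUTS_CT": 1},
]
df = rows_to_dataframe(rows)
print(df.dtypes)
```

Numeric values that arrive as strings are left untouched by this approach, so the resulting dtypes may not match exactly what pandas 1.x produced.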
Pinning pandas to 1.x with a version range in the requirements spec also avoids the failure.
Reproduce
1. Run make install from this repo, or install pychadwick.
2. Run make test, or otherwise call chadwick.game_to_dataframe. pandas will raise a ValueError.
Full test case: the test_pychadwick.py::test_load_games_to_df run shown above.
To reproduce this, builds on Python 3.8+ can be enabled with pull request https://github.com/bdilday/pychadwick/pull/29.