Closed by frreiss 2 years ago
Okay, that didn't work. Will try something else.
Fixed some bugs in SpanArray and TokenSpanArray that new regression tests in Pandas 1.3.0 brought to light. Now we're down to 62 failing tests.
Update: Most of the test failures seem to be due to a bug in Pandas 1.3.0. Due to some performance optimizations introduced in https://github.com/pandas-dev/pandas/pull/40353, Pandas turns DataFrame.iloc[slice(x, y, z)] into a call to __getitem__((..., slice(x, y, z))) on the ExtensionArray that backs any column defined with an extension type.
Code to reproduce:
import pandas as pd
from pandas.api.extensions import (
    ExtensionArray,
    ExtensionDtype,
    ExtensionScalarOpsMixin,
)


class MyExtensionDtype(ExtensionDtype):
    """Minimal extension dtype"""

    def __init__(self):
        pass

    @property
    def type(self):
        return int

    @property
    def name(self) -> str:
        return "MyExtensionDtype"

    @classmethod
    def construct_array_type(cls):
        # Pandas expects the array class here, not an instance.
        return MyExtensionArray


class MyExtensionArray(ExtensionArray, ExtensionScalarOpsMixin):
    """Minimal extension array that logs calls to __getitem__()"""

    @property
    def dtype(self):
        return MyExtensionDtype()

    def copy(self):
        return MyExtensionArray()

    def __len__(self):
        return 5

    def __getitem__(self, key):
        # Log the key that Pandas passes in.
        print(f"__getitem__ called with key '{key}'")
        return 42


arr = MyExtensionArray()
df = pd.DataFrame({"a": arr})
_ = df.iloc[:3]
which prints out:
__getitem__ called with key '(Ellipsis, slice(None, 3, None))'
It should print the following instead:
__getitem__ called with key 'slice(None, 3, None)'
I'll put in a workaround tomorrow and file a bug with Pandas.
FYI @BryanCutler @ZachEichen @PokkeFe @Crushellini
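One possible shape for the workaround, as a rough sketch rather than the code that actually went into this PR: strip a leading Ellipsis from the key before the normal indexing logic runs. The helper name strip_leading_ellipsis and the stand-in class LoggingArray are hypothetical, and the sketch assumes a one-dimensional array so that it runs without Pandas.

```python
def strip_leading_ellipsis(key):
    """Hypothetical helper: turn (Ellipsis, k) back into k for a 1-D array."""
    if isinstance(key, tuple) and len(key) == 2 and key[0] is Ellipsis:
        return key[1]
    return key


class LoggingArray:
    """Stand-in for an extension array; only __getitem__ matters here."""

    def __init__(self, values):
        self._values = list(values)

    def __getitem__(self, key):
        # Normalize the key first so the rest of the method only ever
        # sees the plain slice or integer it was written to handle.
        key = strip_leading_ellipsis(key)
        return self._values[key]


arr = LoggingArray(range(5))
print(arr[..., :3])  # same result as arr[:3]
```

Normalizing at the top of __getitem__() keeps the workaround in one place, so it should be easy to remove if the upstream behavior changes.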
Update:
TensorArray.__getitem__()
Update: Fixed another minor bug. Now we are down to 49 failing tests.
Update: Fixed a problem in TensorArray.isna() that was causing a number of regressions. Now we're down to 2 failing tests.
All tests passing against Pandas 1.3.0 now. Merging this PR to unblock other PRs.
Pandas 1.3.x renamed its abstract base class for indexes from ABCIndexClass to ABCIndex, which is messing up some of our type checks. This PR adds some logic to work around that renaming. I'm using exceptions as control flow, which is a bit ugly, but the alternatives are also ugly.
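For readers unfamiliar with the pattern, here is a minimal sketch of name lookup with an exception-driven fallback. It is not the code from this PR. The helper first_available is hypothetical, and the demonstration runs against the standard library so that it works without Pandas installed.

```python
import importlib


def first_available(module_name, candidate_names):
    """Hypothetical helper: return the first candidate that module_name defines."""
    module = importlib.import_module(module_name)
    for name in candidate_names:
        try:
            # A missing name raises AttributeError; treat that as
            # "try the next candidate" rather than as a failure.
            return getattr(module, name)
        except AttributeError:
            continue
    raise ImportError(f"{module_name} defines none of {list(candidate_names)}")


# "NoSuchSequence" does not exist, so the lookup falls through to "Sequence".
sequence_abc = first_available("collections.abc", ["NoSuchSequence", "Sequence"])
print(isinstance([1, 2, 3], sequence_abc))
```

For the renaming described above, the analogous call would pass ["ABCIndexClass", "ABCIndex"] as the candidates, together with whichever Pandas module defines those classes. Trying the old name first is an assumption on my part: it guards against an older release that might already define ABCIndex with a narrower meaning.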