CODAIT / text-extensions-for-pandas

Natural language processing support for Pandas dataframes.
Apache License 2.0

Bug in Pandas 1.3.0 block manager breaking TensorArray slicing operations #220

Open frreiss opened 2 years ago

frreiss commented 2 years ago

Due to some performance optimizations introduced in pandas-dev/pandas#40353, Pandas turns DataFrame.iloc[slice(x, y, z)] into __getitem__((..., slice(x, y, z))) on the ExtensionArray that backs any column defined with an extension type. This change breaks indexing on TensorArrays: TensorArray.__getitem__() receives the multidimensional slice spec and returns a 2D slice of the underlying NumPy array instead of the 1D slice that the caller expects.
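To illustrate the mismatch, here is a minimal sketch that uses only NumPy. It assumes the backing array has one row per element, and it compares the key the caller intended with the key that arrives after the Pandas rewrite. The helper strip_leading_ellipsis is hypothetical and is not part of Pandas or text-extensions-for-pandas; it only shows one way a one-dimensional extension array might normalize such a key.

```python
import numpy as np

# Assumed backing storage: 4 elements, each a length-3 tensor.
values = np.arange(12).reshape(4, 3)

key_expected = slice(0, 2)              # what iloc[0:2] means to the caller
key_received = (Ellipsis, slice(0, 2))  # what __getitem__ is handed instead

expected = values[key_expected]  # first two elements, shape (2, 3)
actual = values[key_received]    # slices the LAST axis, shape (4, 2)


def strip_leading_ellipsis(key):
    """Hypothetical helper: treat (..., k) as k for a 1D extension array."""
    if isinstance(key, tuple) and len(key) == 2 and key[0] is Ellipsis:
        return key[1]
    return key


# Normalizing the key before touching the NumPy array restores the
# result the caller expected.
fixed = values[strip_leading_ellipsis(key_received)]
```

Passing the tuple straight through to NumPy slices the trailing axis of every element, whereas the caller wanted a subset of elements along the first axis.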

frreiss commented 2 years ago

Opened https://github.com/pandas-dev/pandas/issues/42430 to cover the root cause issue.