CODAIT / text-extensions-for-pandas

Natural language processing support for Pandas dataframes.
Apache License 2.0

TensorArray fails when used as boolean mask index #162

Open BryanCutler opened 3 years ago

BryanCutler commented 3 years ago

Pandas does not recognize an extension array as a boolean mask for indexing, even when converting that array to NumPy yields a 1-D boolean array.

import numpy as np
import pandas as pd
import text_extensions_for_pandas as tp

arr = tp.TensorArray(np.arange(20).reshape(10, 2))
s = pd.Series(arr)
thresh = s > 8
s[np.all(thresh.array, axis=1)]  # raises KeyError

This results in KeyError: "None of [Index([False, False, False, False, False, True, True, True, True, True], dtype='object')] are in the [index]" or other strange errors, because the mask is not picked up as a 1-D boolean array and Pandas instead tries to treat it as a list-like indexer or something else.

BryanCutler commented 3 years ago

In the notebook Text_Extensions_for_Pandas_Overview an example shows a TensorArray used as a boolean mask:

s[np.all(thresh.array, axis=1)]

This is now failing when the Series tries to validate the mask. Need to find a fix or another way to do this.

This has been resolved in the notebook with a workaround. I wanted to leave this open because Pandas should be able to recognize an extension array that converts to a 1-d bool array and use that as a boolean index.
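
The notebook's actual workaround is not shown in this thread. As a rough sketch of the general idea only, the mask can be turned into a plain 1-D NumPy boolean array before it is handed to the Series. The example below uses only NumPy and Pandas, with an ordinary ndarray standing in for the TensorArray and a Series of row sums standing in for the tensor Series; both stand-ins are assumptions for illustration. With a real TensorArray, the equivalent step would presumably be an explicit conversion of the mask to NumPy before indexing.

```python
import numpy as np
import pandas as pd

# Stand-in for the tensor data from the issue: 10 rows, 2 values per row.
values = np.arange(20).reshape(10, 2)

# Stand-in Series with one entry per row (row sums), so the example
# needs only NumPy and Pandas rather than a TensorArray.
s = pd.Series(values.sum(axis=1))

# Row-wise mask: True only where every element in the row exceeds 8.
mask = np.all(values > 8, axis=1)

# Make sure Pandas receives a plain 1-D boolean ndarray, not an
# extension array or an object-dtype array.
mask = np.asarray(mask, dtype=bool)

# Boolean-mask indexing now selects the last five rows.
selected = s[mask]
print(selected)
```

The mask here matches the one in the error message above (five False values followed by five True values), so the selection keeps rows 5 through 9.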

BryanCutler commented 3 years ago

Fixed up the issue to better describe the required functionality from Pandas