CODAIT / text-extensions-for-pandas

Natural language processing support for Pandas dataframes.
Apache License 2.0

Patch Pandas ExtensionArrayFormatter to handle ndims > 1 #172

Closed BryanCutler closed 3 years ago

BryanCutler commented 3 years ago

Pandas formatting has been failing in TensorArray for datetime64 values and float values with ndims > 1. This patches the ExtensionArrayFormatter to handle higher dimension formatting until a proper fix exists upstream.
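To illustrate the general idea (this is not the code from this PR): a formatter that expects one display string per row can cope with values of ndim > 1 if each row's nested contents are rendered into a single string first. The sketch below is a minimal stand-in that uses plain nested lists instead of a TensorArray; `format_row` and `format_rows` are hypothetical names, and the two-decimal float format is an assumption.

```python
from typing import Any, List


def format_row(value: Any, float_fmt: str = "{:.2f}") -> str:
    """Recursively render one (possibly nested) row as a single string."""
    if isinstance(value, (list, tuple)):
        inner = ", ".join(format_row(v, float_fmt) for v in value)
        return "[" + inner + "]"
    if isinstance(value, float):
        return float_fmt.format(value)
    return str(value)


def format_rows(values: List[Any]) -> List[str]:
    """Return one display string per outer row, whatever the inner depth."""
    return [format_row(row) for row in values]


# Two rows, each holding a 2x2 block of floats (hypothetical data).
rows = [[[1.0, 2.5], [3.0, 4.0]], [[5.0, 6.0], [7.25, 8.0]]]
formatted = format_rows(rows)
```

The point of the sketch is only that the outer length is preserved (one string per row), which is what a column formatter needs regardless of how many inner dimensions each element has.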

Closes #151

BryanCutler commented 3 years ago

@frreiss I'm not crazy about patching like this, but since we can't display TensorArrays with float values of ndim > 1, that is a big limitation. I'm still working on the upstream fix, but waiting on that might prevent us from upgrading Pandas for a while. wdyt?

BryanCutler commented 3 years ago

Still some kinks to work out with text spacing in the tests, I think.

BryanCutler commented 3 years ago

I think I got all the kinks worked out. I had to add some special handling for pandas 1.0.x, but that can all be removed once we bump the minimum version.

BryanCutler commented 3 years ago

@frreiss I added a check to use the original method if values aren't from a TensorArray, added the env var to switch it off completely, and added an upper bound to stop patching at Pandas 1.3.0+. I'm hopeful I'll be able to get a fix in by then :) If not, we can just bump that version up. I'm going to go ahead and merge this so we can start using newer pandas versions.
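For readers following along, the three safeguards described above (fall back to the original method for other value types, an environment variable to disable the patch, and an upper version bound) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code merged in this PR: `Formatter` and `TensorLike` are stand-ins for the real Pandas formatter and TensorArray, and the environment variable name `DISABLE_FORMAT_PATCH` is hypothetical.

```python
import os
import re
from typing import Any, List, Mapping, Tuple


class Formatter:
    """Stand-in for a third-party formatter class (hypothetical)."""

    def format_values(self, values: Any) -> List[str]:
        return [str(v) for v in values]


class TensorLike(list):
    """Stand-in for a TensorArray-like container (hypothetical)."""


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric parts, e.g. '1.2.4' -> (1, 2, 4)."""
    numbers = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:  # pad so '1.3' compares equal to '1.3.0'
        numbers.append(0)
    return tuple(numbers)


def should_patch(version: str, env: Mapping[str, str]) -> bool:
    """Patch only below the upper bound and when not disabled via env var."""
    # Hypothetical variable name; the real one is not given in this thread.
    if env.get("DISABLE_FORMAT_PATCH", "").lower() in ("1", "true"):
        return False
    return parse_version(version) < (1, 3, 0)


def apply_patch(cls: type, version: str,
                env: Mapping[str, str] = os.environ) -> bool:
    """Replace cls.format_values, keeping the original as a fallback."""
    if not should_patch(version, env):
        return False
    original = cls.format_values

    def patched(self: Any, values: Any) -> List[str]:
        # Anything that is not our tensor type goes to the original method.
        if not isinstance(values, TensorLike):
            return original(self, values)
        return ["[" + ", ".join(str(v) for v in row) + "]" for row in values]

    cls.format_values = patched
    return True


# Example: patch the stand-in class as if running under version 1.2.4.
was_patched = apply_patch(Formatter, "1.2.4", env={})
```

Keeping a reference to the original method inside the closure means non-tensor values behave exactly as before, and raising the bound later only requires changing the tuple in `should_patch`.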