CODAIT / text-extensions-for-pandas

Natural language processing support for Pandas dataframes.
Apache License 2.0

Raise error for TokenSpan arrow conversion with pyarrow < 2 #183

Closed: BryanCutler closed this 3 years ago

BryanCutler commented 3 years ago

Raises an error when converting a TokenSpan array to Arrow with pyarrow < 2, because those versions do not support the nested dictionary arrays that the conversion uses.
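A minimal sketch of such a version guard, assuming the check only needs the major component of the pyarrow version string. The helper name `check_pyarrow_for_token_span` and the error message are hypothetical and are not taken from the PR. The version is passed in as a string so the sketch runs without pyarrow installed; in practice the argument would be `pyarrow.__version__`.

```python
def check_pyarrow_for_token_span(version: str) -> None:
    """Raise if this pyarrow version cannot handle nested dictionary arrays.

    Hypothetical helper: only the major version number is inspected.
    """
    # "1.0.1" -> 1, "2.0.0" -> 2, "14.0.2" -> 14
    major = int(version.split(".")[0])
    if major < 2:
        raise NotImplementedError(
            f"TokenSpan conversion to Arrow requires pyarrow >= 2.0, found {version}; "
            "older versions do not support nested dictionary arrays."
        )
```

Raising `NotImplementedError` up front gives users a clear message instead of a low-level Arrow failure partway through the conversion.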

This also adds support for TokenSpan arrays containing null values.
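In an Arrow dictionary array, a null is stored as a null index rather than as an entry in the dictionary. The pure-Python sketch below illustrates that idea for a plain list of values; `dictionary_encode` is a hypothetical helper written for illustration and is not the code used in this PR.

```python
from typing import Dict, Hashable, List, Optional, Tuple


def dictionary_encode(
    values: List[Optional[Hashable]],
) -> Tuple[List[Optional[int]], List[Hashable]]:
    """Dictionary-encode values; None becomes a null index, not a dictionary entry."""
    dictionary: List[Hashable] = []
    positions: Dict[Hashable, int] = {}  # value -> slot in dictionary
    indices: List[Optional[int]] = []
    for value in values:
        if value is None:
            indices.append(None)  # null slot: nothing is added to the dictionary
            continue
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return indices, dictionary
```

For example, `dictionary_encode(["doc A", None, "doc A", "doc B"])` returns `([0, None, 0, 1], ["doc A", "doc B"])`.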

TokenSpan is currently disabled for multi-document arrays, because pyarrow.concat does not support concatenation with dictionary unification.
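Dictionary unification is the step concatenation needs when each chunk carries its own dictionary: the dictionaries are merged into one, and every chunk's indices are rewritten to point into the merged dictionary. The pure-Python sketch below illustrates that step, assuming each chunk is an `(indices, dictionary)` pair. `concat_with_unification` is a hypothetical helper, not a pyarrow API.

```python
from typing import Dict, Hashable, List, Optional, Tuple

# Hypothetical representation of one dictionary-encoded chunk.
Encoded = Tuple[List[Optional[int]], List[Hashable]]


def concat_with_unification(chunks: List[Encoded]) -> Encoded:
    """Concatenate dictionary-encoded chunks whose dictionaries may differ."""
    unified: List[Hashable] = []
    positions: Dict[Hashable, int] = {}  # value -> slot in unified dictionary
    out_indices: List[Optional[int]] = []
    for indices, dictionary in chunks:
        # Map each local dictionary slot to its slot in the unified dictionary.
        remap: List[int] = []
        for value in dictionary:
            if value not in positions:
                positions[value] = len(unified)
                unified.append(value)
            remap.append(positions[value])
        # Rewrite the chunk's indices; null indices stay null.
        out_indices.extend(None if i is None else remap[i] for i in indices)
    return out_indices, unified
```

For example, concatenating `([0, 0, None], ["doc A"])` with `([0, 1], ["doc B", "doc A"])` yields `([0, 0, None, 1, 0], ["doc A", "doc B"])`. Simply appending the raw index lists would instead make index 0 in the second chunk point at the wrong document.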

Closes #179

BryanCutler commented 3 years ago

@frreiss Unfortunately, I ran into "not implemented" errors when trying multi-doc arrays and saving to Parquet. Otherwise, I think this is working better, and I added more tests with TokenSpan arrays containing null values.