CODAIT / text-extensions-for-pandas

Natural language processing support for Pandas dataframes.
Apache License 2.0

Add "first" reduction type and NA value to spans #167

Closed: frreiss closed this pull request 3 years ago

frreiss commented 3 years ago

This PR adds a "first" reduction (that is, a "first" aggregate) to SpanArray and TokenSpanArray. I've also implemented the na_value() hook on the associated dtype classes, so that empty groups in a groupby expression get a well-defined NA value.
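To make the intended semantics concrete, here is a minimal plain-Python sketch of a "first" reduction that falls back to an NA value for empty groups. It is only an illustration and not the code in this PR: the `Span` tuple, the `NA_SPAN` placeholder, and the `first_by_group` helper are all hypothetical names, and the real implementation works on SpanArray and TokenSpanArray through Pandas' groupby machinery.

```python
from typing import Dict, Hashable, Iterable, Optional, Sequence, Tuple

# Hypothetical stand-in for a span: (begin, end) character offsets.
Span = Tuple[int, int]

# Hypothetical placeholder for whatever value the dtype's NA hook supplies.
NA_SPAN: Optional[Span] = None


def first_by_group(
    keys: Sequence[Hashable],
    spans: Sequence[Span],
    all_groups: Iterable[Hashable],
    na_value: Optional[Span] = NA_SPAN,
) -> Dict[Hashable, Optional[Span]]:
    """Return the first span seen for each group; empty groups get na_value."""
    # Start every known group at the NA value so empty groups stay defined.
    result: Dict[Hashable, Optional[Span]] = {g: na_value for g in all_groups}
    seen = set()
    for key, span in zip(keys, spans):
        if key not in seen:
            # Only the first span encountered for a group is kept.
            seen.add(key)
            result[key] = span
    return result


keys = ["doc1", "doc1", "doc2"]
spans = [(0, 5), (6, 11), (0, 3)]
firsts = first_by_group(keys, spans, all_groups=["doc1", "doc2", "doc3"])
print(firsts)
```

Here "doc3" has no rows, so it maps to the NA placeholder rather than being dropped or raising an error, which is the behavior the NA hook is meant to enable for empty groups.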

Some work remains on the case where someone tries to roll up a list of Span objects into a single SpanArray when some of the spans are null values with null target text and others are not. I think that problem is best handled in a follow-on PR.