Closed BryanCutler closed 3 years ago
@frreiss this seems like a good improvement for SpanArray serialization - much better to store in a dictionary batch rather than field metadata. If this looks ok, I'll get started on TokenSpanArray.
I think I addressed all of the comments, and tests are passing. I'll go ahead and merge now and fix up anything remaining in a follow-up or when I fix the TokenSpanArray Arrow conversion.
This changes Arrow serialization for SpanArray to store documents in a dictionary that is indexed by text ids. Also added support for saving to Parquet files.
From #179