caikit / caikit-nlp

Initial bidirectional streaming tokenization on regex sentence splitter #345

Open evaline-ju opened 4 months ago

Is your feature request related to a problem? Please describe.

Tokenization is currently implemented only as a unary call. We would like to add a bidirectional streaming case in which incoming chunks of text are aggregated before tokenization/splitting.

Describe the solution you'd like

An initial implementation of bidirectional streaming tokenization on the regex sentence splitter.
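
To make the request concrete, here is a rough sketch of what "aggregate chunks, then split" could look like. It is plain Python and does not use any caikit or caikit-nlp API: the function name `stream_sentences`, the pattern `SENTENCE_RE`, and the `(start, end, text)` output shape are all hypothetical, chosen only for illustration. The idea is to buffer incoming chunks, emit a sentence only once text after it proves it is complete, and track offsets relative to the whole stream.

```python
import re
from typing import Iterable, Iterator, Pattern, Tuple

# Hypothetical sentence pattern: a run of text ending in ., ! or ?
SENTENCE_RE = re.compile(r"[^.!?\s][^.!?]*[.!?]+")

Span = Tuple[int, int, str]  # (start, end, text), offsets into the full stream


def stream_sentences(
    chunks: Iterable[str], pattern: Pattern[str] = SENTENCE_RE
) -> Iterator[Span]:
    """Aggregate streamed text chunks and yield sentences as they complete."""
    buffer = ""
    offset = 0  # position of buffer[0] within the full stream
    for chunk in chunks:
        buffer += chunk
        consumed = 0
        for match in pattern.finditer(buffer):
            # A match that touches the end of the buffer may still grow
            # when the next chunk arrives, so hold it back for now.
            if match.end() == len(buffer):
                break
            yield offset + match.start(), offset + match.end(), match.group()
            consumed = match.end()
        # Keep only the text that has not been emitted yet.
        buffer = buffer[consumed:]
        offset += consumed

    # End of stream: flush whatever is left in the buffer.
    consumed = 0
    for match in pattern.finditer(buffer):
        yield offset + match.start(), offset + match.end(), match.group()
        consumed = match.end()
    rest = buffer[consumed:]
    stripped = rest.strip()
    if stripped:
        # Trailing text without a terminator is emitted as a final span.
        start = offset + consumed + (len(rest) - len(rest.lstrip()))
        yield start, start + len(stripped), stripped
```

For example, `list(stream_sentences(["Hello wor", "ld. How are", " you? Fine"]))` gives `[(0, 12, "Hello world."), (13, 25, "How are you?"), (26, 30, "Fine")]`, so a sentence split across chunk boundaries is still returned whole, with offsets into the concatenated input. In an actual bidirectional streaming endpoint the input would be the request stream and each yielded span would be sent back as a response message; how that maps onto the existing tokenization data model is left open here.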