Split text into semantic chunks, up to a desired chunk size. Supports calculating length by characters and tokens, and is callable from Rust and Python.
Originally posted by **noau** June 12, 2024
Thanks for your great work! I want to know if it's possible to split strings on a given semantic level instead of splitting greedily and stopping only when the chunk exceeds some given size limits. For example, the two sentences above would be split into just
1. "Thanks for your great work!"
2. "I want to know if it's possible to split strings on a given semantic level instead of splitting greedily and stopping only when the chunk exceeds some given size limits."
on a sentence level, ignoring the size limits.
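As a rough illustration of the requested behavior, here is a minimal Python sketch of sentence-level splitting with no size limit. This is not text-splitter's API; `split_sentences` is a hypothetical helper, and the regex is a naive stand-in for a proper segmenter (e.g. Unicode sentence-boundary rules):

```python
import re

def split_sentences(text: str) -> list[str]:
    """Naively split text into sentences, ignoring any size limit.

    Breaks after sentence-ending punctuation followed by whitespace.
    A real implementation would use proper sentence segmentation rules.
    """
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

text = (
    "Thanks for your great work! I want to know if it's possible to "
    "split strings on a given semantic level."
)
# Yields one chunk per sentence, regardless of length.
print(split_sentences(text))
```

Each sentence becomes its own chunk, however long it is, which is exactly the "ignore the size limits" behavior described above.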
Discussed in https://github.com/benbrandt/text-splitter/discussions/226