benbrandt / text-splitter

Split text into semantic chunks, up to a desired chunk size. Supports calculating length by characters and tokens, and is callable from Rust and Python.
MIT License

Non-greedy split by semantic level #237

Open benbrandt opened 3 weeks ago

benbrandt commented 3 weeks ago

Discussed in https://github.com/benbrandt/text-splitter/discussions/226

Originally posted by **noau** June 12, 2024

Thanks for your great work! I want to know whether it's possible to split strings on a given semantic level, instead of splitting greedily and stopping only when the chunk exceeds a given size limit.

For example, on a sentence level and ignoring the size limit, the two sentences above would be split into just:

1. "Thanks for your great work!"
2. "I want to know whether it's possible to split strings on a given semantic level, instead of splitting greedily and stopping only when the chunk exceeds a given size limit."
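To make the requested behavior concrete, here is a minimal sketch in plain Python of what a non-greedy, sentence-level split could look like. It does not use text-splitter and is not how the library works internally: `split_by_sentence` is a hypothetical helper, and the boundary rule (a `.`, `!`, or `?` followed by whitespace) is a simplifying assumption that will mis-split abbreviations such as "e.g. this".

```python
import re

# Hypothetical helper for illustration only; not part of text-splitter's API.
# Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_by_sentence(text: str) -> list:
    """Return each sentence as its own chunk.

    No size limit is applied and neighboring sentences are never merged,
    which is the "non-greedy" behavior asked for in the post.
    """
    return [s for s in _SENTENCE_BOUNDARY.split(text.strip()) if s]


if __name__ == "__main__":
    # Sample text modeled on the post, shortened for readability.
    sample = (
        "Thanks for your great work! "
        "I want to know whether it's possible to split on a given semantic level."
    )
    for number, chunk in enumerate(split_by_sentence(sample), start=1):
        print(f"{number}. {chunk}")
```

The point of the sketch is the contrast with greedy chunking: every boundary at the chosen level produces a chunk, however short or long the resulting pieces are.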