benbrandt / text-splitter

Split text into semantic chunks, up to a desired chunk size. Supports calculating length by characters and tokens, and is callable from Rust and Python.
MIT License

Optimization for SemanticSplitRange searching #219

Closed benbrandt closed 3 months ago

benbrandt commented 3 months ago

Leverages the fact that these ranges are now guaranteed to be sorted. Rather than calling `retain`, which has to iterate over all values and likely moves items in the vec, we keep a cursor for extracting the desired slice and move it to the first item that would still be in the range we want.
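As a rough illustration of the idea (not the code from this PR), here is a minimal sketch comparing the two approaches. `SortedRanges`, `remaining_from`, and `retain_from` are hypothetical names, the ranges are plain `Range<usize>` values rather than the crate's real types, and using `partition_point` (a binary search) to advance the cursor is an assumption; a linear scan from the cursor would work as well.

```rust
use std::ops::Range;

/// Hypothetical stand-in for a list of ranges sorted by start offset.
struct SortedRanges {
    ranges: Vec<Range<usize>>,
    /// Index of the first range that is still relevant.
    cursor: usize,
}

impl SortedRanges {
    fn new(ranges: Vec<Range<usize>>) -> Self {
        // The cursor approach is only valid if the input is sorted by start.
        debug_assert!(ranges.windows(2).all(|w| w[0].start <= w[1].start));
        Self { ranges, cursor: 0 }
    }

    /// Move the cursor past every range that starts before `offset` and
    /// return the rest as a slice. No elements are moved or dropped.
    fn remaining_from(&mut self, offset: usize) -> &[Range<usize>] {
        // Because the ranges are sorted, all ranges with `start < offset`
        // form a prefix, so a binary search finds the boundary.
        let skipped = self.ranges[self.cursor..].partition_point(|r| r.start < offset);
        self.cursor += skipped;
        &self.ranges[self.cursor..]
    }
}

/// The `retain`-style alternative for comparison: visits every element
/// and shifts the kept ones toward the front of the vec.
fn retain_from(ranges: &mut Vec<Range<usize>>, offset: usize) {
    ranges.retain(|r| r.start >= offset);
}
```

Both functions keep the same ranges for a given offset. The difference is that the cursor version only does index arithmetic on data that stays in place, while `retain` touches every element on each call.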

codecov[bot] commented 3 months ago

Codecov Report

All modified and coverable lines are covered by tests.

Project coverage is 99.65%. Comparing base (ac9f17a) to head (d99387e).

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main     #219   +/-   ##
=======================================
  Coverage   99.65%   99.65%
=======================================
  Files          11       11
  Lines        2036     2047   +11
=======================================
+ Hits         2029     2040   +11
  Misses          7        7
```
