This pull request primarily focuses on renaming the max_chunk_size parameter to chunk_size across various files and methods, and optimizing the sentence splitting method for better performance and accuracy. Below are the most important changes:
Parameter Renaming:
DOCS.md: Renamed max_chunk_size to chunk_size in the documentation for the SemanticChunker and SDPMChunker classes. [1] [2] [3] [4]
src/chonkie/chunker/sdpm.py: Updated max_chunk_size to chunk_size in the SDPMChunker class and its methods. [1] [2] [3] [4]
src/chonkie/chunker/semantic.py: Changed max_chunk_size to chunk_size in the SemanticChunker class and its methods. [1] [2] [3] [4] [5] [6] [7] [8]
Sentence Splitting Optimization:
src/chonkie/chunker/semantic.py and src/chonkie/chunker/sentence.py: Replaced the regex-based sentence splitting method with a faster and more accurate method using delimiters and separators. [1] [2] [3]
These changes aim to improve the code's readability and performance, particularly in handling large text chunks and splitting sentences efficiently.
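
For reviewers unfamiliar with the delimiter-and-separator approach, here is a minimal sketch of the general idea, not the code from this PR. The function name split_sentences, the default delimiter set, and the sentinel character are all assumptions made for illustration. Each delimiter is tagged with a sentinel via plain string replacement, and the text is then split once on that sentinel, so no regex engine is involved.

```python
from typing import List, Sequence


def split_sentences(
    text: str,
    delimiters: Sequence[str] = (".", "!", "?", "\n"),  # assumed defaults
    sep: str = "\x1f",  # hypothetical sentinel; must not occur in the input text
) -> List[str]:
    """Split text into sentence-like pieces without using regular expressions."""
    marked = text
    for delim in delimiters:
        # Append the sentinel after each delimiter so the delimiter
        # stays attached to the sentence it terminates.
        marked = marked.replace(delim, delim + sep)
    # One split on the sentinel; drop the empty pieces it leaves behind.
    return [piece for piece in marked.split(sep) if piece]


if __name__ == "__main__":
    for sentence in split_sentences("Hello world. How are you? Fine!"):
        print(repr(sentence))
```

Because only empty pieces are dropped, joining the returned pieces reproduces the input text exactly. Repeated delimiters such as "..." yield very short pieces, so a real implementation would likely merge or filter those in a later step.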