when the chunk size is low, the sentence is being broken abruptly

Hello @Ajaytherala, thank you for using chunkipy. The behaviour you are seeing is expected. As described in the documentation, it happens when the sentence segmenter cannot split the text into sentences of fewer than chunk_size tokens.

From README.md:

By default, chunkipy uses stanza as its main text splitting method; however, if stanza produces sentences with more tokens than the chunk size, other split strategies are used. Here is the list of predefined strategies, sorted by priority (the highest-priority one is executed first; if the resulting piece of text is still larger than the chunk size, it is split further using the next lower-priority strategy).
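To make the fallback described above concrete, here is a minimal sketch in plain Python. It does not use chunkipy's API or stanza: the names `count_tokens`, `split_sentences`, `split_on_commas`, `split_on_words` and `split_with_fallback` are hypothetical, tokens are approximated as whitespace-separated words, and sentence splitting is a simple regex. It only illustrates why a small chunk size can end up breaking a sentence in the middle.

```python
import re
from typing import Callable, List

# Hypothetical type for illustration: a strategy maps a text to smaller pieces.
SplitStrategy = Callable[[str], List[str]]


def count_tokens(text: str) -> int:
    """Assumption: one token per whitespace-separated word."""
    return len(text.split())


def split_sentences(text: str) -> List[str]:
    """Highest priority: split after sentence-ending punctuation (regex stand-in for a real segmenter)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def split_on_commas(text: str) -> List[str]:
    """Lower priority: split after commas, semicolons and colons."""
    return [p for p in re.split(r"(?<=[,;:])\s+", text) if p]


def split_on_words(text: str) -> List[str]:
    """Last resort: split on whitespace, which is where sentences break abruptly."""
    return text.split()


def split_with_fallback(
    text: str, chunk_size: int, strategies: List[SplitStrategy]
) -> List[str]:
    """Apply strategies in priority order, recursing only into pieces that are still too large."""
    if count_tokens(text) <= chunk_size or not strategies:
        return [text]
    first, *rest = strategies
    pieces: List[str] = []
    for piece in first(text):
        pieces.extend(split_with_fallback(piece, chunk_size, rest))
    return pieces


if __name__ == "__main__":
    text = "Short one. This sentence is long, with many words in it."
    strategies = [split_sentences, split_on_commas, split_on_words]
    for piece in split_with_fallback(text, chunk_size=4, strategies=strategies):
        print(repr(piece))
```

With a chunk size of 4, the first sentence fits as is, while the second one is too long even after the comma split, so part of it falls through to word-level splitting. Raising the chunk size or changing the order and content of the strategy list changes where the breaks land.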

If this behaviour does not suit your needs, you can provide your own split strategies. Closing as this is not a bug.

gioelecrispo / chunkipy

when the chunk size is low, the sentence is being broken abruptly #5