Currently the sentence splitting works off a basic method.
It checks for a few things like commas, full stops, etc.
Unfortunately this splits up sentences that shouldn't be split up. E.g. in a coding interview, if a ! or a . is used inside the code, it would split up the code.
Instead, a better method is to split only when the punctuation is at the end of a sentence. E.g. the code below, taken from here:
async def text_chunker(chunks):
    """Split text into chunks, ensuring to not break sentences."""
    splitters = (".", ",", "?", "!", ";", ":", "—", "-", "(", ")", "[", "]", "}", " ")
    buffer = ""
    async for text in chunks:
        if buffer.endswith(splitters):
            yield buffer + " "
            buffer = text
        elif text.startswith(splitters):
            yield buffer + text[0] + " "
            buffer = text[1:]
        else:
            buffer += text
    if buffer:
        yield buffer + " "
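
For a complete (non-streamed) string, the "only split at the end of a sentence" idea could be sketched roughly like this. It is a minimal sketch, not the project's code: `split_at_sentence_ends` and the regex are hypothetical, and it assumes a `.`, `!` or `?` only ends a sentence when whitespace follows it.

```python
import re

# Hypothetical helper for illustration only (not from the project or the
# quoted snippet). Assumption: ".", "!" or "?" marks a sentence end only
# when it is followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_at_sentence_ends(text: str) -> list[str]:
    """Split text into sentences without breaking code-like tokens."""
    # The lookbehind keeps the punctuation attached to its sentence;
    # only the whitespace after it is consumed by the split.
    return [part for part in _SENTENCE_END.split(text.strip()) if part]


if __name__ == "__main__":
    sample = "Call obj.method() if x != y. Then return!"
    for sentence in split_at_sentence_ends(sample):
        print(sentence)
```

Here `obj.method()` and `x != y` stay in one piece because no whitespace directly follows the `.` or `!` inside them. Abbreviations such as "e.g. " would still be split, so this would need extra handling before real use.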