uglyrobot opened this issue 1 year ago
Yikes! Yeah, we probably need to talk about sizes and limits for the `nakedlibrary`. Or maybe `nakedlibrary` import needs to use `chunker`?
As a way out of this particular situation, I would highly recommend running the content through `chunker` first to get right-sized chunks. Example usage is here: https://github.com/dglazkov/polymath/blob/main/convert/markdown.py#L59
Hmmm, the `nakedlibrary` importer does run the text through `generate_chunks`.
@uglyrobot I'm guessing that one of your chunks of text is a single line that is extraordinarily long? Can you confirm?
@dglazkov that implies to me that `generate_chunks` should forcibly break up content that is very long into multiple chunks: perhaps breaking at sentence boundaries first, failing that at a word boundary, and failing that, just hard-breaking it in the middle of a run of characters?
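For what it's worth, a minimal sketch of that fallback strategy might look like the following. This is a hypothetical standalone helper, not the actual `generate_chunks` implementation in polymath — just an illustration of the sentence → word → hard-break cascade:

```python
import re


def break_long_text(text: str, max_len: int) -> list[str]:
    """Split text into chunks of at most max_len characters.

    Prefers sentence boundaries, then word boundaries, and only
    hard-breaks mid-word as a last resort.
    """
    chunks = []
    remaining = text
    while len(remaining) > max_len:
        window = remaining[:max_len]
        # 1. Prefer the last sentence boundary in the window.
        sentence_ends = list(re.finditer(r"[.!?]\s", window))
        if sentence_ends:
            split_at = sentence_ends[-1].end()
        else:
            # 2. Fall back to the last word boundary.
            last_space = window.rfind(" ")
            if last_space > 0:
                split_at = last_space + 1
            else:
                # 3. No boundary at all: hard-break mid-run.
                split_at = max_len
        chunks.append(remaining[:split_at].rstrip())
        remaining = remaining[split_at:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

A long run of characters with no whitespace at all (the failure mode suspected here) would fall through to case 3 and get hard-broken into `max_len`-sized pieces instead of blowing up downstream.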
Yes, I broke it up on newlines. It was a long one, but more importantly, it didn't fail gracefully.
I'd love to see the input. Would you be up for sharing?
I'm getting this on a convert of `nakedlibrary` (split on newlines).