Closed by mikesafar 5 years ago
What is the definition of strip_whitespace? The documentation doesn't say. Here is what it does now, for example:
julia> doc = Document(" this is sample text. Also a simple text. ")
julia> prepare!(doc, strip_whitespace)
julia> doc.text
"this is sample text. Also a simple text. "
A simple definition would be that it replaces any run of consecutive bytes of value 0x20
with a single one. Under that definition, the whitespace at the end of the string would not qualify as 'strippable'.
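To make that proposed definition concrete, here is a minimal sketch in plain Julia (1.x `replace` pair syntax). The name `collapse_spaces` is hypothetical and is not part of TextAnalysis. The sketch follows only the definition as worded above, so it does not reproduce the removal of the leading space seen in the example output.

```julia
# Hypothetical helper, not a TextAnalysis function: collapse each run of two
# or more 0x20 bytes (plain spaces) into a single space. Single spaces at the
# start or end of the string are left untouched.
collapse_spaces(s::AbstractString) = replace(s, r" {2,}" => " ")

sample = "this  is   sample text. "
collapsed = collapse_spaces(sample)
println(repr(collapsed))
```

Under this reading a trailing run of spaces shrinks to one space but is never removed entirely, which matches the trailing space left in `doc.text` above.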
I think the definition of strip_whitespace should be that after it is run there will be
prepare!(doc, strip_whitespace) does not trim the whitespace from the end of the text. In my view it's not just multi-whitespace characters, but also whitespace characters at the end that need to be trimmed out.
It's a small issue, but that assumption on my part led to a big cascade of bugs downstream! Easily fixed with a replace(text, r"(^\s+)|(\s+$)", "") I know, but still...
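For reference, a sketch of that workaround using only Julia Base, assuming Julia 1.x, where `replace` takes a `pattern => replacement` pair rather than three positional arguments. The helper name `normalize_whitespace` is hypothetical. It illustrates the behaviour requested here, not what TextAnalysis actually implements.

```julia
# The regex from the comment above, in Julia 1.x pair syntax: remove
# whitespace at the start and at the end of the string.
text = " this is  sample text. Also a simple text. "
trimmed = replace(text, r"(^\s+)|(\s+$)" => "")

# Base.strip removes leading and trailing whitespace as well.
same_as_strip = strip(text) == trimmed

# Hypothetical helper: collapse internal whitespace runs to one space and
# trim both ends, i.e. the behaviour this comment expected.
normalize_whitespace(s::AbstractString) = String(strip(replace(s, r"\s+" => " ")))

println(repr(trimmed))
println(repr(normalize_whitespace(text)))
```

The regex only trims the ends. To also collapse internal runs, as `normalize_whitespace` does here, it has to be combined with a separate collapsing step.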