Open keturn opened 4 years ago
Yeah, it's the `.replace` method of the bytestring that raises this error, and it is confusing for the user. For html-text, having an explicit type check in `extract_text` seems like a good usability improvement to me, but raising a TypeError instead of using an assert.
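A minimal sketch of what such a check could look like. The name `extract_text_checked` and its body are hypothetical stand-ins, not html-text's actual code; the only point is the `isinstance` test followed by an explicit `TypeError`. Unlike an `assert`, this check still runs when Python is started with `-O`.

```python
def extract_text_checked(html):
    """Hypothetical stand-in for extract_text with an up-front type check."""
    if not isinstance(html, str):
        # Name the type we actually received, so the message points
        # at the caller's mistake rather than at an internal call.
        raise TypeError(
            "html must be a str, got {}".format(type(html).__name__)
        )
    # Placeholder for the real work: an assumed str -> bytes conversion step.
    return html.strip().replace("\x00", "").encode("utf8")
```

With this in place, passing `b"<p>hi</p>"` fails immediately with a message that mentions `bytes`, instead of surfacing an error from deep inside the conversion.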
I guess that, for this specific line, its whole goal is to convert a string to a bytes object, so parse_html could skip that line if html is already bytes.
That's also possible, but note that the input must then be UTF-8-encoded HTML, so if it's just a raw response body in a different encoding, it would not work correctly. Accepting only strings makes sure we don't have this error, and the time spent re-encoding seems small compared to text extraction time. But maybe it's fine to support bytes if the error on non-UTF-8 HTML is not too obscure.
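If bytes were supported, one way to keep the non-UTF-8 failure from being obscure is to validate the bytes up front. This is only a sketch under that assumption: `to_utf8_body` is a hypothetical helper, not part of html-text, and the validation only catches byte sequences that are invalid UTF-8 (input in another encoding that happens to decode cleanly would still slip through).

```python
def to_utf8_body(html):
    """Hypothetical helper: return UTF-8 bytes for either str or bytes input."""
    if isinstance(html, bytes):
        try:
            # Decode only to validate; the result is discarded.
            html.decode("utf8")
        except UnicodeDecodeError as exc:
            raise ValueError(
                "bytes input must be UTF-8 encoded HTML; "
                "decode the response to str first"
            ) from exc
        # Already bytes, so the str -> bytes conversion is skipped.
        return html.strip().replace(b"\x00", b"")
    if isinstance(html, str):
        return html.strip().replace("\x00", "").encode("utf8")
    raise TypeError(
        "html must be str or bytes, got {}".format(type(html).__name__)
    )
```

The extra decode costs one pass over the input, which is the trade-off against simply accepting strings only.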
The error is shown as "a bytes-like object is required, not `str`", but this is misleading, because the caller's error was that they did pass a bytes object. Honestly not sure what the pythonic way to deal with this is.
`assert isinstance` type checking? I guess that, for this specific line, its whole goal is to convert a string to a bytes object, so `parse_html` could skip that line if `html` is already bytes.
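For reference, the confusing message can be reproduced without the library. The `convert` function below is an assumed approximation of the line being discussed, not a copy of html-text's source: calling `bytes.replace` with `str` arguments is what produces the error, even though the caller passed bytes.

```python
def convert(html):
    # Assumed shape of the conversion line: written for str input only.
    return html.strip().replace("\x00", "").encode("utf8")


try:
    convert(b"<p>hi</p>")
except TypeError as exc:
    # The complaint is about the str arguments given to bytes.replace,
    # which reads as if the caller should have passed bytes.
    message = str(exc)

print(message)
```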