gilbsgilbs closed this 6 years ago.
I think mentioning LookupError in the documentation is just fine.
Falling back to UTF-8 is not reliable: most likely it produces garbage or even raises an exception.
Would you make a Pull Request?
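Here is a minimal standard-library sketch of why a blind UTF-8 fallback is unreliable. It does not involve aiohttp, and the sample text and the Latin-1 encoding are arbitrary choices for illustration. Bytes in another encoding either fail to decode or come out mangled, depending on the errors handler:

```python
# Bytes produced by a non-UTF-8 encoding (Latin-1, chosen arbitrarily).
payload = "café".encode("latin-1")  # b'caf\xe9'

# Strict UTF-8 decoding fails: 0xE9 starts a multi-byte sequence that never completes.
try:
    payload.decode("utf-8")
    strict_result = "decoded"
except UnicodeDecodeError:
    strict_result = "UnicodeDecodeError"

# Lenient handlers hide the error, but the data is mangled or lost.
replaced = payload.decode("utf-8", errors="replace")  # 'caf' + U+FFFD
ignored = payload.decode("utf-8", errors="ignore")    # 'caf'

print(strict_result)
print(ascii(replaced))  # ascii() avoids console encoding issues with U+FFFD
print(ignored)
```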
@asvetlov I agree, but that seems inconsistent with the current behavior: if the client passes charset=<some unknown encoding> in the Content-Type header and cchardet can't detect an encoding, get_encoding() falls back to utf-8. It shouldn't.
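A simplified, hypothetical model of the fallback order described here may make the inconsistency concrete. It is inferred from this comment and from the report, not copied from aiohttp. The name get_encoding_sketch and its arguments are made up, and the detector's answer is passed in directly instead of calling cchardet:

```python
import codecs
from typing import Optional


def get_encoding_sketch(charset: Optional[str], detected: Optional[str]) -> str:
    """Hypothetical model of the fallback chain, not aiohttp's real code.

    charset:  the charset parameter from the Content-Type header, if any.
    detected: what a detector such as cchardet returned, or None.
    """
    encoding = charset
    if encoding:
        try:
            codecs.lookup(encoding)  # an unknown charset is discarded...
        except LookupError:
            encoding = None
    if not encoding:
        encoding = detected  # ...but the detector's answer is not checked
    if not encoding:
        encoding = "utf-8"  # last-resort fallback
    return encoding


print(get_encoding_sketch("x-unknown", None))      # utf-8
print(get_encoding_sketch("x-unknown", "VISCII"))  # VISCII, passed through unchecked
print(get_encoding_sketch("latin-1", None))        # latin-1
```

The first call is the UTF-8 fallback this comment objects to. The second shows the other half of the report: a detected name Python doesn't know is returned as-is.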
Long story short
Some encodings detected by cchardet are unsupported by Python (e.g. VISCII). This makes the text() function raise a LookupError when such an encoding is detected, even when the errors parameter is set to 'ignore' (which I would have assumed to be safe).
Expected behaviour
Not sure about this. Maybe get_encoding() should call codecs.lookup() on the detected encoding to ensure it is known and, if it isn't, fall back to UTF-8. Alternatively, document clearly that the text() function might raise LookupError, or that the result of get_encoding() is not safe to pass directly to text(encoding) or .decode(encoding).
Actual behaviour
A LookupError is raised.
Steps to reproduce
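A minimal, standard-library-only sketch of the underlying failure follows. These are not the reporter's original steps. The name x-not-a-real-codec is a made-up stand-in for an encoding Python lacks, such as VISCII. The sketch also includes a hypothetical helper, is_known_encoding, along the lines of the codecs.lookup() check suggested above:

```python
import codecs

# Made-up stand-in for a detected name Python has no codec for (e.g. VISCII).
UNKNOWN = "x-not-a-real-codec"

body = b"hello"

# errors only controls how undecodable bytes are handled once a codec exists;
# the codec lookup happens first, so an unknown name raises even with "ignore".
try:
    body.decode(UNKNOWN, errors="ignore")
    outcome = "decoded"
except LookupError as exc:
    outcome = f"LookupError: {exc}"

print(outcome)


def is_known_encoding(name: str) -> bool:
    """Hypothetical guard: True if Python has a codec registered for name."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


print(is_known_encoding("utf-8"), is_known_encoding(UNKNOWN))  # True False
```

A caller could run the result of get_encoding() through such a guard before passing it to text(encoding=...). What to do when the check fails (fall back, or raise a clearer error) is the open question in this thread.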
Your environment