I'm occasionally getting BlockifyErrors caused by malformed encoding values set here. Here's the tail of the traceback:
Traceback (most recent call last):
  File "dragnet/blocks.pyx", line 846, in dragnet.blocks.Blockifier.blockify
  File "src/lxml/parser.pxi", line 1689, in lxml.etree.HTMLParser.__init__
  File "src/lxml/parser.pxi", line 823, in lxml.etree._BaseParser.__init__
LookupError: unknown encoding: 'b'UTF-8,''
Looks like there's a trailing comma on "UTF-8", plus the bytes value has been incorrectly converted to a str, possibly by calling str(b"UTF-8") instead of b"UTF-8".decode("utf-8").
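For reference, here's a minimal sketch of how a value like that could arise in Python 3. The input b"UTF-8," is my assumption about what was scraped from the page, and is_known_encoding is a hypothetical helper written for this illustration, not part of dragnet or lxml.

```python
import codecs

# Assumed raw value, e.g. pulled from a messy charset declaration.
raw = b"UTF-8,"

# str() on a bytes object returns its repr, not the decoded text.
wrong = str(raw)             # "b'UTF-8,'"
# decode() returns the actual text (the trailing comma is still there).
right = raw.decode("utf-8")  # "UTF-8,"


def is_known_encoding(name):
    """Return True if Python's codec registry recognizes the name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


print(repr(wrong), is_known_encoding(wrong))
```

The repr form matches the string quoted in the LookupError above, which is why I suspect a str() call somewhere along the way.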
I wasn't able to track down a relevant bug in blocks.pyx, so maybe this is just messy web data and 🤷‍♂️. Just posting in case somebody knows what's up!
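If it does turn out to be messy web data, one possible workaround is to sanitize the encoding before it reaches the parser. This is only a sketch: clean_encoding is a hypothetical helper, not something dragnet provides, and it validates against Python's codec registry, which may not accept exactly the same names as lxml.

```python
import codecs


def clean_encoding(raw, default=None):
    """Normalize a scraped encoding value, or return default if unusable.

    Hypothetical helper for illustration only.
    """
    # Decode bytes properly instead of calling str() on them.
    if isinstance(raw, bytes):
        raw = raw.decode("ascii", errors="ignore")
    # Drop surrounding whitespace and stray punctuation such as a trailing comma.
    candidate = raw.strip().strip(",;\"'").strip()
    if not candidate:
        return default
    try:
        # Return the canonical codec name known to Python.
        return codecs.lookup(candidate).name
    except LookupError:
        return default


print(clean_encoding(b"UTF-8,"))
```

Falling back to a default (here None) would let the caller skip the explicit encoding and leave detection to the parser, rather than raising.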
Huh, that's pretty odd. Do you have an example page you can share that causes this? At a glance, I don't see anything that would cause it, but I'd be curious to poke around and see what's up with that.