Open keturn opened 4 years ago
@keturn right, good catch - this is something we should fix. In the meantime, you can try calling html_text.etree_to_text directly; that won't fail in parse_html (but may fail later, as I didn't check it). EDIT: I see you already tried that in #25.
Also, I didn't experience issues parsing XHTML with the HTML parser, at least as far as html-text is concerned.
I guess the docs do explicitly state lxml.html.HtmlElement, but the lxml docs say [...] so I had been using lxml in XML-mode, and it failed with the not-so-obvious error:
Test case: