extract_text does not work on lxml XHTML element

The html-text docs do explicitly say that extract_text accepts an lxml.html.HtmlElement, but the lxml docs say:

Note that XHTML is best parsed as XML, parsing it with the HTML parser can lead to unexpected results.

So I had been parsing with lxml in XML mode, and extract_text failed on the resulting element with this not-so-obvious error:

…/python3.7/site-packages/html_text/html_text.py in parse_html(html)
     47     XXX: mostly copy-pasted from parsel.selector.create_root_node
     48     """
---> 49     body = html.strip().replace('\x00', '').encode('utf8') or b'<html/>'
     50     parser = lxml.html.HTMLParser(recover=True, encoding='utf8')
     51     root = lxml.etree.fromstring(body, parser=parser)

AttributeError: 'lxml.etree._Element' object has no attribute 'strip'
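
A minimal sketch of why the element ends up in parse_html, using only lxml. It assumes (judging from the traceback, not from reading the library source) that extract_text treats anything that is not an lxml.html.HtmlElement as markup to be parsed. The `markup` sample and variable names are illustrative.

```python
import lxml.html
from lxml import etree

markup = (u'<html xmlns="http://www.w3.org/1999/xhtml"><head/><body>'
          u'<p>Hello,   World!</p>'
          u'</body></html>')

# Parsed in XML mode: the root is a plain lxml.etree._Element.
xml_root = etree.fromstring(markup, parser=etree.XMLParser())

# Parsed in HTML mode: the root is an lxml.html.HtmlElement.
html_root = lxml.html.fromstring(markup)

print(isinstance(xml_root, lxml.html.HtmlElement))   # False
print(isinstance(html_root, lxml.html.HtmlElement))  # True

# The XML-mode element has no str methods, hence the AttributeError on .strip().
print(hasattr(xml_root, 'strip'))                    # False
```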

Test case:

from lxml import etree

from html_text import extract_text


def test_extract_text_from_xml_tree():
    # XHTML document parsed in XML mode, as the lxml docs recommend
    xhtml = (u'<html xmlns="http://www.w3.org/1999/xhtml"><head/><body>'
             u'<p>Hello,   World!</p>'
             u'</body></html>')
    tree = etree.fromstring(xhtml, parser=etree.XMLParser())

    text = u'Hello, World!'
    assert extract_text(tree, guess_punct_space=False,
                        guess_layout=False) == text
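
A possible workaround, sketched under the assumption that extract_text accepts a markup string (the traceback suggests strings are what parse_html expects): serialize the XML-mode tree back to text before handing it over. The helper name `xhtml_element_to_markup` is hypothetical and not part of html-text or lxml.

```python
from lxml import etree


def xhtml_element_to_markup(element):
    """Serialize an lxml element (parsed in XML mode) to a unicode string."""
    # encoding='unicode' makes tostring return str instead of bytes.
    return etree.tostring(element, encoding='unicode')


xhtml = (u'<html xmlns="http://www.w3.org/1999/xhtml"><head/><body>'
         u'<p>Hello,   World!</p>'
         u'</body></html>')
tree = etree.fromstring(xhtml, parser=etree.XMLParser())

# A str has .strip(), so it can be passed where the element could not.
markup = xhtml_element_to_markup(tree)
```

Passing `markup` to extract_text should avoid the AttributeError, but the text is then re-parsed with the HTML parser, which is exactly what the lxml note above warns about, so this is a stopgap rather than a fix.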

TeamHG-Memex/html-text, issue #24