Closed luoqishuai closed 3 years ago
tostring_ returns the element's own tag along with the HTML and text inside it.
compat/__init__.py:
from lxml.etree import tostring

def tostring_(s):
    return tostring(s, encoding='utf-8')
When I run
tostring(node_list[0])
the output is
b'<div>a</div>'
It looks like the result of tostring(node) also includes the node's own tag.
OK, thanks, I'll fix that. This was supposed to replace a div with a p if it contains only text and no tags inside.
In readability's transform_misused_divs_into_paragraphs:
Because the serialized elem always contains its own "div" tag, re.search always matches, so the div-to-p replacement never takes effect.
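The effect can be reproduced with a small sketch. It is an illustration under stated assumptions, not readability's implementation: `BLOCK_TAG_RE`, `convert_text_only_divs`, and `run` are hypothetical names, the pattern is a guess at the kind of tags such a check would look for, and the stdlib fallback is only there so the snippet runs without lxml.

```python
import re

try:
    from lxml import etree  # the library used in this thread
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib stand-in exposing the same calls

# Hypothetical pattern: tags whose presence should keep a div a div.
BLOCK_TAG_RE = re.compile(r"<(blockquote|div|img|ol|p|pre|table|ul)", re.I)


def convert_text_only_divs(root, include_own_tag):
    """Rename text-only <div> elements to <p>; return how many were renamed."""
    converted = 0
    for div in list(root.iter("div")):
        if include_own_tag:
            # Behaviour described in the report: the div's own tag is searched too.
            markup = etree.tostring(div, encoding="unicode")
        else:
            # Search only the serialized child elements.
            markup = "".join(
                etree.tostring(child, encoding="unicode") for child in div
            )
        if not BLOCK_TAG_RE.search(markup):
            div.tag = "p"
            converted += 1
    return converted


def run(include_own_tag):
    """Parse a fresh sample document, convert it, and return (count, markup)."""
    root = etree.fromstring("<body><div>a</div><div>b<p>c</p></div></body>")
    count = convert_text_only_divs(root, include_own_tag)
    return count, etree.tostring(root, encoding="unicode")


print(run(True))   # own tag included: the pattern always matches, nothing changes
print(run(False))  # children only: the text-only div becomes a <p>
```

With the node's own tag included, the search matches "<div" on every element, so no div is ever converted. Searching only the children lets the first, text-only div through.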
demo
output
Please let me know if I got this wrong.