Open keturn opened 4 years ago
This could be handled either by altering `traverse_text_fragments` to get the tag's local name (using `etree.QName`), or by adding a duplicate of each tag to the `NEWLINE_TAGS` set that has `{http://www.w3.org/1999/xhtml}` prepended.
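
A rough sketch of both options, to make the mismatch concrete. It uses only the standard library's `xml.etree.ElementTree` so it runs anywhere. The `local_name` helper and the four-tag `NEWLINE_TAGS` subset are illustrative stand-ins, not the library's real code. If `etree` here is lxml, `etree.QName(element).localname` should give the same result as the helper.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Illustrative subset only; the real NEWLINE_TAGS set is assumed to be larger.
NEWLINE_TAGS = frozenset({"p", "div", "li", "br"})


def local_name(tag):
    """Strip a leading '{namespace}' from a tag name, if present.

    Hypothetical helper; non-string tags (e.g. comments) pass through unchanged.
    """
    if isinstance(tag, str) and tag.startswith("{"):
        return tag.rpartition("}")[2]
    return tag


doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><p>one</p><p>two</p></body></html>"
)
p = doc.find(f".//{{{XHTML_NS}}}p")

# The parsed tag carries the namespace, so a plain membership test fails.
print(p.tag)                  # {http://www.w3.org/1999/xhtml}p
print(p.tag in NEWLINE_TAGS)  # False

# Option 1: compare on the local name instead of the raw tag.
print(local_name(p.tag) in NEWLINE_TAGS)  # True

# Option 2: extend the set with namespaced duplicates of every tag.
NAMESPACED_NEWLINE_TAGS = NEWLINE_TAGS | {
    f"{{{XHTML_NS}}}{tag}" for tag in NEWLINE_TAGS
}
print(p.tag in NAMESPACED_NEWLINE_TAGS)  # True
```

Option 1 also covers documents that use some other namespace URI, while option 2 keeps the traversal code untouched and handles only XHTML.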
After the failure of `extract_text` in #24, I tried `etree_to_text`. I got through that without encountering an exception, but `guess_layout` doesn't work: no newlines are added after those tags.

I think it's because `element.tag` includes the tag's XML namespace, so it doesn't match the namespaceless `NEWLINE_TAGS` and `DOUBLE_NEWLINE_TAGS`.

Test: