It would be more elegant if the code did not need a dictionary to turn an entity reference such as &ccedil; into an actual c-cedilla (ç), and the parser instead used the DTD file that comes with the XML file. Something like this might work:
from xml.sax.saxutils import unescape
unescape("&lt; &amp; &gt;")
# returns '< & >'
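Note that unescape() only knows &lt;, &amp; and &gt; by default; anything else has to be passed in through its optional entities argument. A minimal sketch of how to avoid writing that dictionary by hand, assuming the DTD declares the same names as the standard HTML 4 entity set (NAMED_ENTITIES and unescape_named are made-up names for this illustration):

```python
from html.entities import name2codepoint
from xml.sax.saxutils import unescape

# Map "&name;" to its character for every HTML 4 named entity.
# &amp;, &lt; and &gt; are left out because unescape() already handles them
# itself, and it deliberately handles &amp; last.
NAMED_ENTITIES = {
    f"&{name};": chr(codepoint)
    for name, codepoint in name2codepoint.items()
    if name not in ("amp", "lt", "gt")
}


def unescape_named(text):
    """Replace the XML built-ins plus the HTML 4 named entities in text."""
    return unescape(text, NAMED_ENTITIES)


print(unescape_named("gar&ccedil;on &lt; &amp; &gt;"))
```

This still does not read the DTD; it only swaps the hand-written dictionary for one generated from the standard library, so entities that exist only in the DTD would still be missed.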
Or maybe with the lxml library: https://lxml.de/validation.html#id1
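A rough sketch of what the lxml route could look like, assuming the XML file names its DTD in a DOCTYPE declaration and the DTD sits next to it on disk. The file names sample.xml and sample.dtd, the helper parse_with_dtd, and the sample content are all invented for the demo; load_dtd, resolve_entities and no_network are options of lxml's XMLParser:

```python
import os
import tempfile

from lxml import etree  # third-party: pip install lxml


def parse_with_dtd(xml_path):
    """Parse xml_path, loading the DTD it references and expanding its entities."""
    parser = etree.XMLParser(
        load_dtd=True,          # read the DTD named in the DOCTYPE declaration
        resolve_entities=True,  # replace entity references with their declared text
        no_network=True,        # only local DTD files, no network fetches
    )
    return etree.parse(xml_path, parser)


# Hypothetical sample files, written to a temporary directory for the demo.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "sample.dtd"), "w", encoding="utf-8") as f:
        f.write('<!ELEMENT doc (#PCDATA)>\n<!ENTITY ccedil "&#231;">\n')
    xml_path = os.path.join(tmp, "sample.xml")
    with open(xml_path, "w", encoding="utf-8") as f:
        f.write('<!DOCTYPE doc SYSTEM "sample.dtd">\n<doc>gar&ccedil;on</doc>\n')
    text = parse_with_dtd(xml_path).getroot().text

print(text)
```

With this approach the entity definitions come from the DTD itself, so no separate dictionary is needed. Expanding entities from untrusted input can be risky, so this is probably only appropriate for files from a known source.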
Or with BeautifulSoup:
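from bs4.dammit import EntitySubstitution
# substitute_html is a method of the class, so call it as EntitySubstitution.substitute_html(text)
# note: it converts characters to named HTML entities, not the other way round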
Tutorial: http://www2.hawaii.edu/~takebaya/cent110/xml_parse/xml_parse.html
More info: https://stackoverflow.com/questions/29799542/how-to-retain-quot-and-apos-while-parsing-xml-using-bs4-python