dentarg closed 8 years ago
From view-source:http://www.dn.se/nyheter/varlden/pensionarer-planerade-stora-juvelstoten-pa-puben/
<title>
Pensionärer planerade stora juvelstöten på puben - DN.SE
</title>
What happens in the current solution (code point 228, ä, cannot be encoded as ASCII and is replaced with "?"):
>>> print unichr(228).encode('ascii', 'replace')
?
What we want to happen (UTF-8 instead of ASCII is needed to handle code point 228):
>>> print unichr(228).encode('utf-8', 'replace')
ä
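The two transcripts above are Python 2. As a rough sketch of the same comparison in Python 3 (an assumption on my part, since the issue itself only shows Python 2), `chr` takes the place of `unichr` and `encode` returns a `bytes` object:

```python
# Python 3 sketch of the comparison above (the issue itself uses Python 2).
ch = chr(228)  # U+00E4, the letter 'ä'

# ASCII has no representation for code point 228, so 'replace' substitutes '?'.
as_ascii = ch.encode("ascii", "replace")

# UTF-8 can represent it, as the two bytes 0xC3 0xA4.
as_utf8 = ch.encode("utf-8")

print(as_ascii)                 # b'?'
print(as_utf8)                  # b'\xc3\xa4'
print(as_utf8.decode("utf-8"))  # ä
```

The character is lost only at the encoding step, so switching the target encoding to UTF-8 is enough to keep it.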
A possible way to decode the HTML entities, from http://stackoverflow.com/questions/2087370/decode-html-entities-in-python-string:
>>> import HTMLParser
>>> h = HTMLParser.HTMLParser()
>>> print h.unescape('Pensionärer planerade stora juvelstöten på puben - DN.SE')
Pensionärer planerade stora juvelstöten på puben - DN.SE
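For reference, a minimal Python 3 sketch of the same idea, using `html.unescape` from the standard library in place of `HTMLParser.unescape`. The `raw_title` string and the `decode_title` helper are hypothetical: they assume the page source writes the Swedish letters as named entities such as `&auml;`, which the issue does not confirm.

```python
import html

# Hypothetical input: the title as it might appear in the page source,
# with the Swedish letters written as HTML entities (an assumption).
raw_title = "Pension&auml;rer planerade stora juvelst&ouml;ten p&aring; puben - DN.SE"


def decode_title(raw):
    """Turn HTML entities (named or numeric) into real characters."""
    return html.unescape(raw)


title = decode_title(raw_title)

# Encoding as UTF-8 keeps the letters; ASCII with 'replace' loses them.
utf8_bytes = title.encode("utf-8")
ascii_bytes = title.encode("ascii", "replace")

print(title)        # Pensionärer planerade stora juvelstöten på puben - DN.SE
print(ascii_bytes)  # b'Pension?rer planerade stora juvelst?ten p? puben - DN.SE'
```

Numeric entities are handled the same way, so `decode_title("&#228;")` also yields `ä`.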