Open arshaw opened 13 years ago
Reported by uglydog....@gmail.com, Oct 29, 2010
What steps will reproduce the problem?
m.group(0) == '#x201C'
in _substitute_entity()
unichr(int(ent)) (where ent == 'x201C') throws ValueError.
What is the expected output? What do you see instead?
unichr() wants the integer 0x201C, but int() cannot parse the hex string 'x201C'.
What version of the product are you using? On what operating system?
scrapemark-0.9-py2.5.egg, Python 2.6.4, Ubuntu 9.10 x64
Please provide any additional information below.
Adding this function:

    def my_int(s):
        try:
            return int(s)
        except:
            pass
        try:
            return int(s, 16)
        except:
            pass
        if len(s)>0 and s[0].lower() == 'x':
            try:
                return int('0'+s, 16)
            except:
                pass
        return 0

and substituting unichr(int(ent)) with unichr(my_int(ent)) seems to fix the problem.
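For anyone hitting the same error, here is a minimal sketch of a stricter variant of the workaround. It is not scrapemark's actual code: the names parse_char_ref and substitute_numeric_entities and the regex are hypothetical, and it is written for Python 3, so it uses chr() where the original uses unichr(). It treats a leading 'x' as hexadecimal, everything else as decimal, and leaves an entity untouched if it cannot be decoded rather than falling back to 0.

```python
import re

# Hypothetical pattern (an assumption, not taken from scrapemark):
# matches numeric character references such as &#8220; or &#x201C;
_NUMERIC_ENTITY = re.compile(r'&#([xX][0-9a-fA-F]+|[0-9]+);')


def parse_char_ref(ent):
    """Return the code point for an entity body such as '8220' or 'x201C'."""
    if ent[:1] in ('x', 'X'):
        # Hexadecimal form: drop the leading 'x' and parse the rest as base 16.
        return int(ent[1:], 16)
    # Decimal form.
    return int(ent)


def substitute_numeric_entities(text):
    """Replace numeric character references in text with their characters."""
    def repl(m):
        try:
            return chr(parse_char_ref(m.group(1)))
        except (ValueError, OverflowError):
            # Code point out of range: keep the original entity text.
            return m.group(0)
    return _NUMERIC_ENTITY.sub(repl, text)
```

With this sketch, parse_char_ref('x201C') and parse_char_ref('8220') both give 8220 (0x201C), so either spelling of the entity decodes to the same left double quotation mark.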
Probably fixed in #9.