Closed: GoogleCodeExporter closed this issue 9 years ago
Not a bug: the string is not valid JSON if it contains this character unescaped.
>>> simplejson.loads(u'"\\u000b"')
u'\x0b'
>>> simplejson.dumps(u'\x0b')
'"\\u000b"'
Original comment by bob.ippo...@gmail.com
on 24 Feb 2011 at 11:52
OK. Do you have a workaround for this? I just ran into another instance of this
(with \xe2). What are all the characters that don't work and what should I
replace them with?
Original comment by yanghate...@gmail.com
on 25 Feb 2011 at 12:50
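On the question of which characters fail: the JSON grammar requires the control characters U+0000 through U+001F to be escaped inside a string, along with the double quote and the backslash. The sketch below probes that empirically. It is written for Python 3 and uses the standard-library `json` module as a stand-in for simplejson, which is an assumption made only so the example has no third-party dependency. The helper name `rejected_unescaped` is made up for illustration.

```python
import json

def rejected_unescaped(max_code_point=0x100):
    """Return the code points a strict parser rejects when they appear
    raw (unescaped) inside a JSON string literal."""
    rejected = []
    for cp in range(max_code_point):
        ch = chr(cp)
        if ch in '"\\':
            continue  # quote and backslash are string syntax, skip them
        try:
            json.loads('"' + ch + '"')
        except ValueError:  # JSONDecodeError subclasses ValueError
            rejected.append(cp)
    return rejected

bad = rejected_unescaped()
print([hex(cp) for cp in bad])
```

Under these assumptions the rejected set is the C0 control range, which includes \x0b but not \xe2.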
I think you are confused about how JSON and/or Unicode works. I'm not sure
which, and I don't know exactly how to help you.
>>> simplejson.loads(u'"\\u00e2"')
u'\xe2'
>>> simplejson.dumps(u'\xe2')
'"\\u00e2"'
Original comment by bob.ippo...@gmail.com
on 25 Feb 2011 at 1:03
Bob, you're right in that I'm confused, and I think it's about how JSON works.
First, I think something went wrong when I tried pasting the original string in
my first post, since it's not even showing the \x0b. That should have been:
In [7]: open('aoeu').read()
Out[7]: '"\\u003Cp\\u003EPeopleBrowsr is a data mining, analytics and brand
engagement service provider for enterprise brand managers, social media
strategists, hedge fund managers, advertising agencies and IT
developers.\\n\x0b\\nFounded in 2006 by..."\n'
In [8]: Out[7].decode('utf8')
Out[8]: u'"\\u003Cp\\u003EPeopleBrowsr is a data mining, analytics and brand
engagement service provider for enterprise brand managers, social media
strategists, hedge fund managers, advertising agencies and IT
developers.\\n\x0b\\nFounded in 2006 by..."\n'
In [9]: simplejson.loads(Out[7])
[...error...]
I'm dealing with a data source that is giving me strings like this one, whether
I like it or not. So I'm really just asking how I should munge that string into
a form that simplejson won't choke on. I thought it might be helpful to ask
here in case others who come by here have the same question.
(Also, please disregard my comment about \xe2 - that was actually something
else.)
Original comment by yanghate...@gmail.com
on 25 Feb 2011 at 1:22
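One possible way to munge such input, if a strict parser has to accept it afterwards, is to rewrite raw control characters as `\uXXXX` escapes, but only while inside a string literal, so that whitespace between tokens is left alone. This is a rough sketch rather than a hardened implementation: it assumes Python 3, uses the standard-library `json` module in place of simplejson, and the function name `escape_controls_in_strings` and the shortened sample text are invented for the example.

```python
import json

def escape_controls_in_strings(text):
    """Replace raw control characters (below U+0020) that occur inside
    JSON string literals with \\uXXXX escapes; leave everything else as is."""
    out = []
    in_string = False
    escaped = False  # True right after a backslash inside a string
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == '\\':
                escaped = True
            elif ch == '"':
                in_string = False
            elif ord(ch) < 0x20:
                out.append('\\u%04x' % ord(ch))
                continue
        elif ch == '"':
            in_string = True
        out.append(ch)
    return ''.join(out)

# Shortened, hypothetical sample with a raw U+000B inside the string.
sample = '"developers.\\n\x0b\\nFounded in 2006"\n'
cleaned = escape_controls_in_strings(sample)
parsed = json.loads(cleaned)  # strict parsing now succeeds
```

The state tracking matters because a newline after the closing quote is legal JSON whitespace and must not be turned into an escape.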
Okay, so the JSON you have is actually not valid JSON. You can parse it with
strict=False.
>>> import simplejson
>>> s = ('"\\u003Cp\\u003EPeopleBrowsr is a data mining, analytics and brand '
...      'engagement service provider for enterprise brand managers, social media '
...      'strategists, hedge fund managers, advertising agencies and IT '
...      'developers.\\n\x0b\\nFounded in 2006 by..."\n')
>>> simplejson.loads(s, strict=False)
'<p>PeopleBrowsr is a data mining, analytics and brand engagement service
provider for enterprise brand managers, social media strategists, hedge fund
managers, advertising agencies and IT developers.\n\x0b\nFounded in 2006 by...'
Original comment by bob.ippo...@gmail.com
on 25 Feb 2011 at 1:35
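For readers without simplejson installed, the same flag can be tried with Python 3's standard-library `json` module, whose `loads` also accepts `strict=False` and then allows control characters inside strings. The snippet below is a minimal sketch of the strict versus lenient behaviour. The sample text is a shortened, hypothetical stand-in for the string in this thread, and `parse_lenient` is an illustrative name.

```python
import json

# Hypothetical sample: a raw U+000B sits inside the JSON string.
raw = '"line one\\n\x0b\\nline two"\n'

def parse_lenient(text):
    """Parse JSON while tolerating raw control characters in strings."""
    return json.loads(text, strict=False)

try:
    json.loads(raw)  # default strict=True
    strict_ok = True
except ValueError:
    strict_ok = False

value = parse_lenient(raw)
print(strict_ok, repr(value))
```

Note that the lenient parse keeps the raw character in the result, so anything that re-serializes the value will emit it as the escape `\u000b`.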
Thank you. I wasn't aware of that flag, and it made all my error-avoidance code
go away.
Original comment by yanghate...@gmail.com
on 25 Feb 2011 at 1:53
Thanks, strict=False saved my day ^^
Original comment by adesanto...@gmail.com
on 13 Dec 2012 at 10:29
Original issue reported on code.google.com by
yanghate...@gmail.com
on 24 Feb 2011 at 11:43