Closed GoogleCodeExporter closed 9 years ago
This is an optimization. If you give loads a str object as input, it returns
str strings as output if and only if the string is ASCII-only. ASCII-only
strings are interchangeable with unicode. If you give it unicode input, you
get unicode output strings regardless. This optimization is not new in 2.0.7.
>>> simplejson.loads('"foo"')
'foo'
>>> simplejson.loads(u'"foo"')
u'foo'
dumps always returns an ASCII-only string by default, which is why
loads(dumps(unistr)) can give you ASCII strings. You'd want to do
loads(unicode(dumps(unistr))) if you want to get unicode strings back out.
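To make the dumps half of that concrete, here is a minimal sketch. It uses the standard-library json module rather than simplejson, on the assumption that the default ensure_ascii=True behaviour is the same in both, and is_ascii is a helper name made up for this example. It only shows that the serialized document is pure ASCII by default, so a round trip hands loads an ASCII-only input.

```python
import json

def is_ascii(text):
    """Return True if every character in text is in the ASCII range."""
    return all(ord(ch) < 128 for ch in text)

value = u"\u20aa123"  # a string containing one non-ASCII character (U+20AA)

escaped = json.dumps(value)                  # default: ensure_ascii=True
raw = json.dumps(value, ensure_ascii=False)  # keeps the character unescaped

print(is_ascii(escaped))             # True: the character became a \u20aa escape
print(is_ascii(raw))                 # False
print(json.loads(escaped) == value)  # True: the escape decodes back to the original
```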
Original comment by bob.ippo...@gmail.com
on 12 Feb 2009 at 5:19
Bob, I know you've refused to fix this in several situations now (such as
http://www.nabble.com/simplejson-2.0.0-released,-much-faster.-td19705153.html),
and I can actually name you a place where I think it causes issues.
In SQLAlchemy, the "Unicode" type
(http://www.sqlalchemy.org/docs/05/reference/sqlalchemy/types.html#sqlalchemy.types.Unicode)
warns when you insert str() objects.
My workflow: create some complicated thing, serialize it to JSON, which gets
used by many other workflow processes. When I read it back in, I'd really
like every string in the thing to come back in as unicode type, if possible.
Thanks!
Original comment by gregg.l...@gmail.com
on 19 May 2009 at 4:19
Oh, I see that in issue 28 someone mentioned this exact issue, and you BDFL'd
it there too! I guess I'll deal with it on my own then!
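One way to deal with it on the caller's side is to normalize the decoded structure after loading. This is only a sketch: to_text is a hypothetical helper, not part of simplejson or json. It assumes Python 2.6 or later (where bytes is an alias of str), that the decoded value contains only dicts, lists and scalars, and that any byte strings are UTF-8.

```python
def to_text(obj, encoding="utf-8"):
    """Recursively decode byte strings inside a decoded JSON value."""
    if isinstance(obj, bytes):  # on Python 2, bytes is the same type as str
        return obj.decode(encoding)
    if isinstance(obj, dict):
        # Keys can be byte strings too, so normalize both keys and values.
        return dict((to_text(k, encoding), to_text(v, encoding))
                    for k, v in obj.items())
    if isinstance(obj, list):
        return [to_text(item, encoding) for item in obj]
    return obj  # numbers, booleans, None and text pass through unchanged
```

Usage would be something like data = to_text(simplejson.loads(document)), after which every string in data is of the text type, at the cost of one extra pass over the structure.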
Original comment by gregg.l...@gmail.com
on 19 May 2009 at 4:21
If you want unicode strings, use a unicode input document.
Original comment by bob.ippo...@gmail.com
on 19 May 2009 at 4:46
I have personally wasted hours on this. I can't afford to track down subtle
bugs that depend on what version of simplejson someone has installed and
whether the speedups are present, so nowadays I only use it through the
following wrapper module.
Eliminating the need for this wrapper is one of the benefits I have hoped to
reap by dropping support for Python 2.5 someday. I just hope the issue doesn't
recur in Python 2.x's built-in json module.
try:
    import json  # Python 2.6
except ImportError:
    import simplejson as json  # Python 2.5

dumps = json.dumps

def loads(s, *args, **kwargs):
    # When its argument is of type str, loads() decodes strings as
    # either str or unicode depending on whether simplejson's speedups
    # are installed (at least this is true in simplejson 2.0.7). It
    # always decodes strings as unicode when the argument to loads()
    # is of type unicode.
    return json.loads(unicode(s), *args, **kwargs)
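A caveat on the wrapper above: unicode(s) uses the default ASCII codec, so it raises UnicodeDecodeError if s is a str holding non-ASCII bytes (for example a UTF-8 document that was not produced by the default dumps). A variant sketch that decodes explicitly is below. The name loads_text and the UTF-8 assumption are mine, and it relies on bytes being available (Python 2.6 or later).

```python
import json  # standard library; simplejson offers the same loads() call

def loads_text(s, *args, **kwargs):
    """Decode a JSON document, always handing the parser text input."""
    if isinstance(s, bytes):   # on Python 2, bytes is the same type as str
        s = s.decode("utf-8")  # assumption: byte input is UTF-8 encoded
    return json.loads(s, *args, **kwargs)
```

Because the parser only ever sees text input, the decoded strings come back as text whether or not the speedups are installed.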
Original comment by ken.ri...@gmail.com
on 19 May 2009 at 8:01
It is the same in Python 2.7 trunk. If you want unicode even for ASCII strings,
use unicode input.
Original comment by bob.ippo...@gmail.com
on 19 May 2009 at 8:07
This cost me several hours as well. Decoding external input into unicode seems
like something that should happen at a program's data boundaries, which is
where I suspect the simplejson/json module is frequently used. As such, the
principle of least astonishment suggests to me that I should be getting
unicode back. I don't know about other users, but the speed optimization isn't
that valuable to me at the moment. Maybe some kind of 'output_ascii' keyword
for loads, for people who need the speed enhancement, would be a better
solution?
Original comment by markhuet...@gmail.com
on 24 May 2009 at 2:04
Believe it or not, some applications still require ascii and don't play well
with unicode. For an application I have to work with every day, this is a
feature, not a bug. I'm voting in order to be notified if this ever gets
"fixed"...
Original comment by bradalle...@gmail.com
on 28 Mar 2012 at 8:34
The issue tracker for simplejson is here:
https://github.com/simplejson/simplejson/issues
Original comment by b...@launchcommander.com
on 28 Mar 2012 at 9:23
This is crazy - a full day of 2 developers down the drain!
>>> import simplejson as json
>>> dump = json.dumps((u"$123", u"₪123"))
>>> [type(object) for object in json.loads(dump)]
[<type 'str'>, <type 'unicode'>] # This is bad!
vs.
>>> import json
>>> dump = json.dumps((u"$123", u"₪123"))
>>> [type(object) for object in json.loads(dump)]
[<type 'unicode'>, <type 'unicode'>] # This is good!
Original comment by major....@gmail.com
on 21 Apr 2013 at 10:48
The pure-Python version of simplejson returns a different type than the C
speedups version. I ran into this when installing in a virtualenv without
python-dev. You can demo the problem on a version installed with speedups by
using _toggle_speedups to switch back to the pure-Python version.
>>> import simplejson as json
>>> json.loads('"foo"')
'foo'
>>> json._toggle_speedups(False)
>>> json.loads('"foo"')
u'foo'
This needs to be fixed one way or the other.
Original comment by tom2...@gmail.com
on 18 Oct 2013 at 1:36
Hm, for me, both libraries do it 'wrong'-ish:
json returns <type 'unicode'> even for "$123", withOUT the 'u' prefix that
makes it unicode.
simplejson returns <type 'str'> when the input is u"$123"? What's the reason
for this inconsistency?
Original comment by kmichael...@gmail.com
on 8 Sep 2014 at 5:38
Original issue reported on code.google.com by
Stelmina...@gmail.com
on 12 Feb 2009 at 5:08