dannote / mod-ndb

Automatically exported from code.google.com/p/mod-ndb

Strings are not recoded between character sets. #7

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Mod_ndb is not aware of the character set of any string column.
It is also not aware of the character set requirement for the output.
It does not even properly escape quotes and backslashes when encoding a string for JSON (i.e., the JSON output will be invalid if a string contains a double quote or a backslash).

Original issue reported on code.google.com by john.david.duncan on 26 Feb 2007 at 4:11

GoogleCodeExporter commented 9 years ago

Original comment by john.david.duncan on 26 Feb 2007 at 5:02

GoogleCodeExporter commented 9 years ago
From http://www.ietf.org/rfc/rfc4627.txt --

2.5 Strings
   All Unicode characters may be placed within the
   quotation marks except for the characters that must be escaped:
   quotation mark, reverse solidus, and the control characters (U+0000
   through U+001F).

3. Encoding

   JSON text SHALL be encoded in Unicode.  The default encoding is UTF-8.
   [*ONLY* UTF-8 will be supported in mod_ndb.]

6. IANA Considerations
   The MIME media type for JSON text is application/json.
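
The escaping rule quoted from section 2.5 can be sketched roughly as follows. This is a minimal Python illustration, not mod_ndb's actual code; the function name `json_escape` is hypothetical, and it assumes the string has already been decoded to Unicode text.

```python
import json


def json_escape(text: str) -> str:
    """Return text as a quoted JSON string, escaped per RFC 4627 section 2.5."""
    # Two-character escapes defined by the RFC for common characters.
    short = {
        '"': '\\"', '\\': '\\\\', '\b': '\\b', '\f': '\\f',
        '\n': '\\n', '\r': '\\r', '\t': '\\t',
    }
    out = []
    for ch in text:
        if ch in short:
            out.append(short[ch])
        elif ord(ch) < 0x20:
            # Remaining control characters (U+0000 through U+001F).
            out.append('\\u%04x' % ord(ch))
        else:
            # Every other Unicode character may appear unescaped.
            out.append(ch)
    return '"' + ''.join(out) + '"'


# Round-trip check against a standard JSON parser.
sample = 'C:\\temp "quoted"\n'
assert json.loads(json_escape(sample)) == sample

# Per section 3, the response body would then be sent as UTF-8.
body = json_escape(sample).encode("utf-8")
```

Escaping the text first and encoding to UTF-8 last keeps the two concerns (section 2.5 and section 3) separate.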

Original comment by john.david.duncan on 26 Feb 2007 at 9:01

GoogleCodeExporter commented 9 years ago

Original comment by john.david.duncan on 26 Feb 2007 at 9:35

GoogleCodeExporter commented 9 years ago
In XML, entity encodings (&amp;, &quot;, etc.) should be used rather than backslash escapes.
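
For illustration only, entity escaping along these lines could look like the Python sketch below. It uses the standard library's `xml.sax.saxutils.escape`; the wrapper name `xml_escape` is hypothetical and does not reflect how mod_ndb implements it.

```python
from xml.sax.saxutils import escape


def xml_escape(text: str) -> str:
    """Replace XML-special characters with entity references."""
    # escape() converts &, < and > on its own; the quote characters
    # are supplied as additional replacements.
    return escape(text, {'"': '&quot;', "'": '&apos;'})


# A backslash needs no escaping in XML and passes through unchanged.
escaped = xml_escape('a & "b" \\ <c>')
```

Here `escaped` holds `a &amp; &quot;b&quot; \ &lt;c&gt;`.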

Original comment by john.david.duncan on 27 Feb 2007 at 2:34

GoogleCodeExporter commented 9 years ago
In r233, XML output is supported, and quotes, backslashes, etc. are correctly escaped in both JSON and XML output, for UTF-8 and all single-byte character sets. But I will leave this issue open (with lowered priority) until the character set issues are completely understood.
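
One likely reason the guarantee stops at UTF-8 and single-byte character sets (offered as an illustration, not as the author's stated reasoning): in those encodings the bytes 0x22 and 0x5C always mean a double quote or a backslash, so byte-wise escaping is safe. In some multi-byte encodings, such as Shift_JIS, 0x5C can also occur as the second byte of a two-byte character. The helper name below is hypothetical.

```python
def bytes_needing_escape(raw: bytes) -> int:
    """Count bytes that a byte-wise escaper would treat as '"' or '\\'."""
    return sum(1 for b in raw if b in (0x22, 0x5C))


ch = "\u8868"  # a single CJK character, containing no quote or backslash

utf8 = ch.encode("utf-8")       # every byte of the sequence is >= 0x80
sjis = ch.encode("shift_jis")   # two bytes; the second one is 0x5C

utf8_hits = bytes_needing_escape(utf8)
sjis_hits = bytes_needing_escape(sjis)
```

A byte-wise escaper would leave the UTF-8 form alone but would insert a spurious backslash into the Shift_JIS form, corrupting the character.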

Original comment by john.david.duncan on 28 Feb 2007 at 4:19

GoogleCodeExporter commented 9 years ago
POST data should be recoded from the submitted (content-type) character set into the table character set.

JSON response data should be recoded from the table character set to UTF-8.

XML/HTML response data may not need to be recoded, but the response should declare the correct character set in the content-type field.  (It is already possible to do this with a DefaultType directive.)
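
The three cases above could be sketched as follows. This is a hedged Python illustration of the proposal, not an existing mod_ndb interface: all function names are hypothetical, and it assumes the character set names are ones Python's codecs recognize (MySQL's names do not always match).

```python
def recode_post_body(body: bytes, request_charset: str, table_charset: str) -> bytes:
    """Recode POST data from the submitted charset into the table charset."""
    return body.decode(request_charset).encode(table_charset)


def recode_for_json(column_value: bytes, table_charset: str) -> bytes:
    """Recode stored column data from the table charset to UTF-8 for JSON."""
    return column_value.decode(table_charset).encode("utf-8")


def content_type(media_type: str, charset: str) -> str:
    """Build a content-type value that declares the charset (XML/HTML case)."""
    return f"{media_type}; charset={charset}"


# Example: a latin1 table receiving UTF-8 POST data.
stored = recode_post_body("caf\u00e9".encode("utf-8"), "utf-8", "latin-1")
json_bytes = recode_for_json(stored, "latin-1")
header = content_type("text/xml", "ISO-8859-1")
```

Note that `recode_post_body` raises `UnicodeEncodeError` when the submitted text contains a character the table character set cannot represent; how such input should be handled is not settled in this issue.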

Original comment by john.david.duncan on 25 Sep 2007 at 6:15