cuizhennan / snakeyaml

Automatically exported from code.google.com/p/snakeyaml
Apache License 2.0

Some nonprintable/unacceptable characters can appear in output, unescaped #148

GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
There are a number of Unicode code points that, when present in a String, cause 
snakeyaml to emit unescaped non-printable or unacceptable characters.

For example, a String containing \u007F is serialized to a String 
containing that same non-printable and unacceptable character, left unescaped, 
whereas the spec says: "On output, a YAML processor must only produce these 
acceptable characters, and should also escape all non-printable Unicode 
characters."

There are numerous other Unicode codepoints that cause the same issue. I have 
attached a test program that lists them.
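
The attachment is not reproduced here. As a rough sketch of how such a list could be generated, the check below follows the printable character set of the YAML 1.1 spec (the c-printable production). The class and method names are made up for illustration and are not part of snakeyaml.

```java
import java.util.ArrayList;
import java.util.List;

public class YamlPrintable {
    // Printable set as defined by YAML 1.1 (c-printable):
    // #x9 | #xA | #xD | [#x20-#x7E] | #x85 | [#xA0-#xD7FF]
    // | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isPrintable(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0x7E)
            || cp == 0x85
            || (cp >= 0xA0 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Hypothetical helper: collects every non-printable code point in [0, limit).
    static List<Integer> nonPrintableBelow(int limit) {
        List<Integer> result = new ArrayList<>();
        for (int cp = 0; cp < limit; cp++) {
            if (!isPrintable(cp)) {
                result.add(cp);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Print the non-printable code points in the first 256 values.
        for (int cp : nonPrintableBelow(0x100)) {
            System.out.printf("U+%04X%n", cp);
        }
    }
}
```

Under this definition \u007F falls outside every printable range, which matches the example above.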

The desired behavior is that either the character be escaped, an exception be 
thrown rejecting the input, or the documentation be changed to state that some 
Strings can cause snakeyaml to silently produce invalid output.

snakeyaml version 1.10, Apple Java 1.6.0_31 on OS X 10.6.8.

Original issue reported on code.google.com by johnk...@gmail.com on 5 Jun 2012 at 1:21


GoogleCodeExporter commented 9 years ago

Original comment by py4fun@gmail.com on 6 Jun 2012 at 10:54

GoogleCodeExporter commented 9 years ago
I think I have found the answer. There is no bug in SnakeYAML.
Escaping only works for double-quoted scalars.
http://yaml.org/spec/1.1/#id872840
"Note that escape sequences are only interpreted in double-quoted scalars."

You need to instruct SnakeYAML:
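        // Requires org.yaml.snakeyaml.DumperOptions and org.yaml.snakeyaml.Yaml.
        DumperOptions options = new DumperOptions();
        // Escape characters instead of emitting them directly.
        options.setAllowUnicode(false);
        // Escape sequences are only interpreted in double-quoted scalars.
        options.setDefaultScalarStyle(DumperOptions.ScalarStyle.DOUBLE_QUOTED);
        return new Yaml(options);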

See the test:
http://code.google.com/p/snakeyaml/source/browse/src/test/java/org/yaml/snakeyaml/issues/issue148/PrintableUnicodeTest.java

(this test is a copy-and-paste of TestSnakeYamlCharacterEncoding.java)

Please let us know if it solves the problem.

Original comment by py4fun@gmail.com on 8 Jun 2012 at 5:20

GoogleCodeExporter commented 9 years ago
Right, you can't have nonprintable characters in single-quoted YAML strings.

Maybe I'm being pedantic here, but regardless of the default scalar dump style 
that is set, shouldn't strings that contain unprintable characters always be 
dumped double-quoted, since that's the only way snakeyaml can follow the spec? 
To me 'setDefaultScalarStyle' makes it sound like it's setting what's used when 
snakeyaml has a choice in what form it can use, but in this case there is no 
choice if snakeyaml is following the spec.
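
The behavior proposed above can be sketched as follows. This is not how snakeyaml's emitter is implemented. It is a minimal illustration that assumes the YAML 1.1 printable set and the \xNN and \uNNNN escapes of double-quoted scalars, and the names needsEscaping and toDoubleQuoted are hypothetical.

```java
public class YamlDoubleQuote {
    // Printable set as defined by YAML 1.1 (c-printable).
    static boolean isPrintable(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0x7E)
            || cp == 0x85
            || (cp >= 0xA0 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // True if the string contains a code point that can only be written
    // as an escape, which forces the double-quoted style.
    static boolean needsEscaping(String s) {
        return s.codePoints().anyMatch(cp -> !isPrintable(cp));
    }

    // Writes s as a double-quoted scalar. Printable ASCII and printable
    // characters from U+00A0 upward are kept literally; everything else
    // (control characters, tab, line breaks, U+0085) is escaped.
    static String toDoubleQuoted(String s) {
        StringBuilder out = new StringBuilder("\"");
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            if (cp == '"' || cp == '\\') {
                out.append('\\').appendCodePoint(cp);
            } else if ((cp >= 0x20 && cp <= 0x7E) || (cp >= 0xA0 && isPrintable(cp))) {
                out.appendCodePoint(cp);
            } else if (cp <= 0xFF) {
                out.append(String.format("\\x%02X", cp));
            } else {
                // Every code point above U+FFFF is printable, so four hex digits suffice.
                out.append(String.format("\\u%04X", cp));
            }
        }
        return out.append('"').toString();
    }

    public static void main(String[] args) {
        String value = "a\u007Fb";
        if (needsEscaping(value)) {
            System.out.println(toDoubleQuoted(value));
        }
    }
}
```

With this approach the configured default style would still apply to every string that has no non-printable characters; only strings for which needsEscaping returns true would be forced into double quotes.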

To me the safest default behavior is to always follow the spec. If a user wants 
to set some option that's documented to produce noncompliant yaml, then that's 
their own business. :)

Original comment by johnk...@gmail.com on 8 Jun 2012 at 6:26

GoogleCodeExporter commented 9 years ago
1) I am not in favor of forcing SnakeYAML to make any intelligent guess. I think 
the better solution is to make a proposal for the YAML 2 specification, which 
is under development. 
2) People from Greece, Russia, Japan, China and many other countries may have a 
different view of what is 'printable'. They may (and will!) prefer to see the 
text in their own language instead of a dummy sequence such as 
'\u0134\u0156\u0167\u0174'. (See DumperOptions.setAllowUnicode(boolean).)

Since there is no work for SnakeYAML, I will close the issue. If you think 
something can be improved, please create another issue with a clear proposal.

Original comment by py4fun@gmail.com on 11 Jun 2012 at 12:07

GoogleCodeExporter commented 9 years ago
See http://code.google.com/p/snakeyaml/issues/detail?id=148 for a bug report 
that shows SnakeYAML is wrong here; the YAML spec clearly states that a compliant 
YAML processor should emit unprintable characters escaped, whereas SnakeYAML is 
emitting them directly.

Original comment by head...@headius.com on 26 Sep 2012 at 7:17

GoogleCodeExporter commented 9 years ago
Sorry, that's http://code.google.com/p/snakeyaml/issues/detail?id=158

Original comment by head...@headius.com on 26 Sep 2012 at 7:19