Closed GoogleCodeExporter closed 9 years ago
This is according to the spec: http://yaml.org/spec/1.1/#id868518
The character you want to use is not printable.
Original comment by py4fun@gmail.com
on 4 Jun 2012 at 9:26
I think the "#xE000-#xFFFD" character range given in the YAML spec as being
printable is intended to be inclusive of the upper bound.
While the YAML spec doesn't seem to specify fully which variant of BNF they are
using to describe the syntax, in RFC 4234 ABNF, value range alternatives are
inclusive.
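The inclusive reading can be sketched as a plain range check. This is only an illustration of the c-printable production from the YAML 1.1 spec, not SnakeYAML's actual implementation; the class and method names are hypothetical.

```java
// Hypothetical helper: a direct transcription of the YAML 1.1 c-printable
// production, treating every range as inclusive of both bounds.
final class YamlPrintable {
    private YamlPrintable() {}

    static boolean isPrintable(int codePoint) {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
                || (codePoint >= 0x20 && codePoint <= 0x7E)
                || codePoint == 0x85
                || (codePoint >= 0xA0 && codePoint <= 0xD7FF)
                || (codePoint >= 0xE000 && codePoint <= 0xFFFD) // upper bound included
                || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }
}
```

Under this reading, `YamlPrintable.isPrintable(0xFFFD)` is true while `0xFFFE` and the surrogate range remain non-printable.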
Original comment by johnk...@gmail.com
on 4 Jun 2012 at 4:40
I think you are right.
Original comment by aso...@gmail.com
on 4 Jun 2012 at 6:21
I had forgotten why \uFFFD was excluded. It is because Java returns this code
point when a decoding error occurs, so there is no way to distinguish an error
from the genuine character. I shall put this info into the documentation to make it clear.
See the source here:
http://code.google.com/p/snakeyaml/source/browse/src/main/java/org/yaml/snakeyaml/reader/StreamReader.java#31
// NON_PRINTABLE changed from PyYAML: \uFFFD excluded because Java returns
// it in case of data corruption
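The ambiguity described here can be reproduced with the JDK alone. The following is a minimal sketch, assuming UTF-8 input and the default (lenient) decoding done by the `String` constructor; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: lenient decoding silently substitutes U+FFFD
// for malformed bytes, so the result looks the same as genuine U+FFFD input.
final class LenientDecodingDemo {
    private LenientDecodingDemo() {}

    static String decodeLeniently(byte[] bytes) {
        // The String constructor replaces malformed input with the
        // charset's replacement string instead of throwing.
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Decoding the malformed byte `0xFF` and decoding the well-formed UTF-8 encoding of U+FFFD both yield the same one-character string, which is the indistinguishability the comment refers to.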
Original comment by py4fun@gmail.com
on 5 Jun 2012 at 10:15
I can't find any specific reference to U+FFFD in the Java documentation. But
from what I understand, the general idea, not at all specific to Java, is that
it gets inserted into Unicode text wherever a process is unable to convert a
character between encodings correctly. It is, however, a valid, printable Unicode
codepoint, and there's nothing malformed about strings that contain it, and the
YAML spec reflects this.
IMO, libraries generally shouldn't take special action on this character,
because applications which accept arbitrary Unicode input need to be able to
work with this character, and the proper handling of it is
application-specific. (The most common behavior I've seen in editors and web
browsers is to have no special handling whatsoever, meaning they display the
character's glyph from the font, same as any other printable character.)
If you're unwilling to consider changing the behavior of the deserializer,
would you consider changing the behavior of the serializer to escape this
character? The deserializer handles this character correctly when it is
escaped. Then at least I'd know that round-tripping would work consistently,
without having to preprocess all the strings I feed to snakeyaml.
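The kind of preprocessing meant here could look roughly like the sketch below, which writes a string as a YAML double-quoted scalar with U+FFFD emitted as an escape. It is a deliberately partial illustration (only backslash, double quote and U+FFFD are handled; other characters that need escaping are not), and the class and method names are hypothetical rather than part of SnakeYAML.

```java
// Hypothetical helper: emits a double-quoted YAML scalar in which the
// replacement character is written as an escape sequence, not literally.
final class ReplacementCharEscaper {
    private ReplacementCharEscaper() {}

    static String toDoubleQuoted(String value) {
        StringBuilder out = new StringBuilder("\"");
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\\':
                    out.append("\\\\"); // escape the escape character itself
                    break;
                case '"':
                    out.append("\\\""); // escape the scalar delimiter
                    break;
                case '\uFFFD':
                    out.append("\\uFFFD"); // six ASCII characters, no raw U+FFFD
                    break;
                default:
                    out.append(c); // assumption: everything else is passed through
            }
        }
        return out.append('"').toString();
    }
}
```

The output contains only the ASCII escape for the replacement character, so a reader that rejects a literal U+FFFD but accepts the escaped form would still load it.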
Original comment by johnk...@gmail.com
on 5 Jun 2012 at 5:08
It is not about willing/unwilling. It is a Java-specific problem. Feel free to
propose a solution. If you can find a way to implement your requirement when
_ALL_ the tests stay green, your solution will be taken for the next release.
The problem is similar to the UTF-8 BOM mark. Java IO is broken and it does not
ignore the UTF-8 BOM mark at the beginning of the stream. This is the only
reason why SnakeYAML has
http://code.google.com/p/snakeyaml/source/browse/src/main/java/org/yaml/snakeyaml/reader/UnicodeReader.java
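For illustration, the BOM behavior can be shown with the JDK alone. This is a simplified sketch, not the code of UnicodeReader: it only handles the UTF-8 BOM on an in-memory byte array, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: skips a leading UTF-8 BOM (EF BB BF) before decoding,
// because the JDK's UTF-8 decoder passes it through as U+FEFF.
final class BomSkipper {
    private BomSkipper() {}

    static String decodeSkippingBom(byte[] bytes) {
        int offset = 0;
        if (bytes.length >= 3
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF) {
            offset = 3; // drop the three BOM bytes
        }
        return new String(bytes, offset, bytes.length - offset, StandardCharsets.UTF_8);
    }
}
```

Decoding the same bytes with `new String(bytes, StandardCharsets.UTF_8)` keeps U+FEFF as the first character, which is why a wrapper of this kind is needed at all.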
Original comment by py4fun@gmail.com
on 5 Jun 2012 at 9:03
The two unit tests that fail for me after making U+FFFD printable are:
org.yaml.snakeyaml.issues.issue68.NonAsciiCharacterTest.testLoadFromFileWithWrongEncoding
org.pyyaml.PyReaderTest.testReaderUnicodeErrors
org.yaml.snakeyaml.issues.issue68.NonAsciiCharacterTest.testLoadFromFileWithWrongEncoding()
isn't actually configuring the Reader to report encoding errors.
I'd change how it sets up the InputStreamReader:
CharsetDecoder decoder = Charset.forName("Cp1252").newDecoder();
decoder.onUnmappableCharacter(CodingErrorAction.REPORT);
Object text = yaml.load(new InputStreamReader(input, decoder));
Then when issue68.txt is passed through the Reader, it throws
java.nio.charset.UnmappableCharacterException. Of course, then the test isn't
really testing snakeyaml, it's testing the behavior of java.io.Reader. So this
unit test doesn't need to exist at all; issue 68 could have been resolved by
informing the user that snakeyaml was working as designed, and if they want
their Reader to throw exceptions on encoding errors, they can configure it to
do so.
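As a self-contained illustration of a Reader configured to report errors, here is a minimal sketch. It assumes UTF-8 and malformed input rather than the Cp1252 file from the test, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads bytes through a strict decoder and reports
// whether decoding succeeded without any substitution.
final class StrictDecoding {
    private StrictDecoding() {}

    static boolean decodesCleanly(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), decoder)) {
            while (reader.read() != -1) {
                // drain the stream; a decoding error surfaces as an exception
            }
            return true;
        } catch (CharacterCodingException e) {
            return false; // malformed or unmappable input was reported
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the decoder set to REPORT, malformed bytes raise an exception instead of turning into U+FFFD, so a U+FFFD that does reach the parser must have been present in the input.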
To fix org.pyyaml.PyReaderTest.testReaderUnicodeErrors, UnicodeReader.init()
needs to be changed:
// Use given encoding
CharsetDecoder decoder = encoding.newDecoder().onUnmappableCharacter(CodingErrorAction.REPORT);
internalIn2 = new InputStreamReader(internalIn, decoder);
Then, in org.pyyaml.PyReaderTest.testReaderUnicodeErrors, there needs to be an
additional catch block to handle the new type of exception that will be thrown:
} catch (YAMLException e) {
    assertTrue(e.toString(), e.toString().contains("MalformedInputException"));
} finally {
Original comment by johnk...@gmail.com
on 5 Jun 2012 at 11:41
Fixed. It will be delivered in version 1.11
Thank you.
http://code.google.com/p/snakeyaml/source/browse/src/test/java/org/yaml/snakeyaml/issues/issue147/PrintableTest.java
Original comment by py4fun@gmail.com
on 6 Jun 2012 at 12:49
Any schedule for getting a 1.11 release out? There's at least one reported
JRuby bug related to 0xFFFD being rejected.
It's not a critical thing for us, but we're pushing a new JRuby 1.7 preview
release next week, and it would be nice to get SnakeYAML 1.11 in it.
FWIW, I reported that issue myself, because of the divergence from the
YAML specification. Nobody has reported a real-world JRuby issue due to 0xFFFD
rejection.
Original comment by head...@headius.com
on 26 Jul 2012 at 5:01
The JRuby issue in question:
http://jira.codehaus.org/browse/JRUBY-6317
Original comment by head...@headius.com
on 26 Jul 2012 at 5:01
Dear JRuby developers,
SnakeYAML has implemented a few fixes/features exclusively for JRuby.
Unfortunately, the feedback from JRuby developers gets the lowest priority.
We have a couple of places where we expect some info from JRuby:
http://jira.codehaus.org/browse/JRUBY-6067
http://code.google.com/p/snakeyaml/issues/detail?id=146
Once we get the feedback, we can close the corresponding issues and release
SnakeYAML.
(version 1.11 will be released in August 2012)
Original comment by aso...@gmail.com
on 27 Jul 2012 at 9:42
We apologize for not being more responsive; I think these updates were getting
funneled into my mail archive, and it has been a very busy summer.
I have commented on the bugs in question, including issue 132 that was
connected to JRUBY-6067.
Original comment by head...@headius.com
on 28 Sep 2012 at 9:35
Original issue reported on code.google.com by
johnk...@gmail.com
on 1 Jun 2012 at 9:51