cuizhennan / snakeyaml

Automatically exported from code.google.com/p/snakeyaml
Apache License 2.0

ScannerException when loading stream with tab character #136

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1.  new Yaml().load("--- 36L\tDIESEL\n");

What is the expected output? What do you see instead?
"36L\tDIESEL"

What version of SnakeYAML are you using? On what Java version?
1.9

Please provide any additional information below. (Often a failing test is
the best way to describe the problem.)

A verbatim tab character is allowed in a YAML string so the example given above 
should parse OK.

Original issue reported on code.google.com by don...@gmail.com on 13 Dec 2011 at 11:18

GoogleCodeExporter commented 9 years ago
It would be interesting to see the exception without having to compile anything. 
Since you already have it, why not post it here?

Original comment by alexande...@gmail.com on 13 Dec 2011 at 11:25

GoogleCodeExporter commented 9 years ago
Thank you for your reply.

I tried the following example:

import org.yaml.snakeyaml.Yaml;

public class Issue136 {
    public static void main(String[] args) {
        new Yaml().load("--- 36L\tDIESEL\n");
    }
}

and I get the following exception:

Exception in thread "main" while scanning for the next token
found character     '\t' that cannot start any token
 in "<string>", line 1, column 8:
    --- 36L DIESEL
           ^

    at org.yaml.snakeyaml.scanner.ScannerImpl.fetchMoreTokens(ScannerImpl.java:358)
    at org.yaml.snakeyaml.scanner.ScannerImpl.peekToken(ScannerImpl.java:202)
    at org.yaml.snakeyaml.parser.ParserImpl$ParseDocumentEnd.produce(ParserImpl.java:265)
    at org.yaml.snakeyaml.parser.ParserImpl.peekEvent(ParserImpl.java:161)
    at org.yaml.snakeyaml.parser.ParserImpl.getEvent(ParserImpl.java:171)
    at org.yaml.snakeyaml.composer.Composer.composeDocument(Composer.java:125)
    at org.yaml.snakeyaml.composer.Composer.getSingleNode(Composer.java:106)
    at org.yaml.snakeyaml.constructor.BaseConstructor.getSingleData(BaseConstructor.java:121)
    at org.yaml.snakeyaml.Yaml.loadFromReader(Yaml.java:296)
    at org.yaml.snakeyaml.Yaml.load(Yaml.java:266)
    at Issue136.main(Issue136.java:5)

Original comment by don...@gmail.com on 14 Dec 2011 at 8:46

GoogleCodeExporter commented 9 years ago
A failing test is added:
http://code.google.com/p/snakeyaml/source/browse/src/test/java/org/yaml/snakeyaml/issues/issue136/TabInScalarTest.java

Original comment by py4fun@gmail.com on 14 Dec 2011 at 8:46

GoogleCodeExporter commented 9 years ago
The corresponding issue for PyYAML has been created:
http://pyyaml.org/ticket/219

Original comment by py4fun@gmail.com on 14 Dec 2011 at 9:23

GoogleCodeExporter commented 9 years ago
I found the answer. This comment explains it 
(http://code.google.com/p/snakeyaml/source/browse/src/main/java/org/yaml/snakeyaml/scanner/ScannerImpl.java#1630):
"The specification is really confusing about tabs in plain scalars.
 We just forbid them completely. Do not use tabs in YAML!"

I would recommend asking this question on the general YAML mailing list (see 
http://www.yaml.org/) to get a proper explanation.

Original comment by py4fun@gmail.com on 14 Dec 2011 at 6:31

GoogleCodeExporter commented 9 years ago
I will ask on the mailing list.

The example is output from Ruby 1.8, so it must be pretty common.

Original comment by don...@gmail.com on 15 Dec 2011 at 10:54

GoogleCodeExporter commented 9 years ago
I think I managed to fix the problem. Please have a look at this clone:
http://code.google.com/r/py4fun-tabinscalar/source/checkout

As you can see, the test accepts tabs inside a plain scalar:
http://code.google.com/r/py4fun-tabinscalar/source/browse/src/test/java/org/yaml/snakeyaml/issues/issue136/TabInScalarTest.java

If we accept this fix, then SnakeYAML will work differently than PyYAML. I will 
try to create the same patch for PyYAML and discuss it on the YAML core 
mailing list.

Original comment by py4fun@gmail.com on 18 Dec 2011 at 9:43

GoogleCodeExporter commented 9 years ago
I posted the question on the mailing list, and got an answer that a tab in a 
plain scalar should be allowed:

http://sourceforge.net/mailarchive/forum.php?thread_name=419E75E7-E159-4998-995D-9EEC8D075F94%40datek.no&forum_name=yaml-core

I will try the clone, and report back to JRuby which uses SnakeYAML in their 
Ruby 1.9 implementation.

Original comment by don...@gmail.com on 18 Dec 2011 at 10:27

GoogleCodeExporter commented 9 years ago
Alas, I have no hg client available to me.  Can you make a JAR available?

Original comment by don...@gmail.com on 18 Dec 2011 at 10:29

GoogleCodeExporter commented 9 years ago
Here it is: http://code.google.com/p/snakeyaml/downloads/list

I am following your question on the YAML core mailing list, but I do not see 
any answer. Did you get the answer sent to your private account?

As far as I know, Ruby 1.8 uses its own YAML parser, which caused a number of 
problems. Ruby 1.9 then switched to Psych, which uses the same core engine as 
PyYAML and SnakeYAML. Ruby 1.9 and JRuby should work the same way. But with 
this fix SnakeYAML (and JRuby, when it switches to this version) will accept 
the tabs but Ruby will not. That will cause confusion. That is why it is 
important to have a common approach with PyYAML and libyaml (used by Psych).

Original comment by py4fun@gmail.com on 19 Dec 2011 at 8:55

GoogleCodeExporter commented 9 years ago
Thanks for the JAR.

Yes, the reply was sent to me privately.  I have forwarded it to the mailing 
list now.

I agree that interoperability is very important.  How should we bring this 
change to Ruby?

Original comment by don...@gmail.com on 19 Dec 2011 at 9:04

GoogleCodeExporter commented 9 years ago
I have provided the solution with tests to PyYAML (ticket 219 - 
http://pyyaml.org/ticket/219).
In order to use it in Ruby 1.9 the following must be done:
1) we agree on this approach with PyYAML developers
2) PyYAML and libyaml are fixed
3) Psych must take the latest version of libyaml (with the fix)
4) Ruby must take the latest Psych version

It looks like a long path...

Original comment by py4fun@gmail.com on 19 Dec 2011 at 9:59

GoogleCodeExporter commented 9 years ago
Please be aware that you can use tabs in double- or single- quoted scalars. 
This is easy and safe because it works the same way in all the parsers. I have 
added a test to show it:
http://code.google.com/p/snakeyaml/source/browse/src/test/java/org/yaml/snakeyaml/issues/issue136/TabInScalarTest.java

(The file snakeyaml-1.10-SNAPSHOT.jar will be removed from the 'Download' area 
to avoid confusion)

Original comment by py4fun@gmail.com on 21 Dec 2011 at 8:54
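
For producers that do control the encoding, the quoted-scalar advice above can be sketched in plain Java. This is only an illustration, not part of SnakeYAML: the class name TabSafeScalar and the method quote are hypothetical, and the sketch assumes the value contains no control characters other than tab, newline and carriage return.

```java
// Hypothetical helper, not part of SnakeYAML: emits a value as a YAML
// double-quoted scalar so that a tab never ends up in a plain scalar.
final class TabSafeScalar {

    private TabSafeScalar() {
    }

    /**
     * Wraps the value in double quotes and escapes backslash, double quote,
     * tab, newline and carriage return with their YAML escape sequences.
     * Assumption: no other control characters occur in the value.
     */
    static String quote(String value) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\t': sb.append("\\t");  break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                default:   sb.append(c);
            }
        }
        return sb.append('"').toString();
    }
}
```

For the value from this issue, TabSafeScalar.quote("36L\tDIESEL") produces the text "36L\tDIESEL" with a literal backslash and t between the quotes, which can be written after "--- " instead of the plain scalar.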

GoogleCodeExporter commented 9 years ago
My problem is that I do not control the encoding of the stream, only the 
decoding, so the tabs are already present.

py4fun, if we get verification that the YAML is legal, can you release a 
1.10 version with this fix?  The JRuby team would like to use a release version 
rather than a snapshot.

Original comment by don...@gmail.com on 26 Dec 2011 at 7:24

GoogleCodeExporter commented 9 years ago
1) Does this mean that the 1.10-SNAPSHOT works as you expect? (Can I remove it?)
2) We do not mind including this change (fix?) in 1.10, but I would first like 
to hear the explanation from Kirill (PyYAML). 
Please be aware that the very same YAML document will work differently in Ruby 
and JRuby.
According to our release cycle, version 1.10 will be released in February.

Original comment by py4fun@gmail.com on 27 Dec 2011 at 9:34

GoogleCodeExporter commented 9 years ago
Yes, the snapshot works as expected.

Excellent that you can include the fix.  We all await Kirill's verdict :)

I am not sure when the next JRuby release is, but we will want to have the new 
release version included.  If JRuby 1.7.0 is released before February, I guess 
we will include a snapshot first, and then include the release version of 
SnakeYAML in a later patch-level release.

Original comment by don...@gmail.com on 27 Dec 2011 at 1:04

GoogleCodeExporter commented 9 years ago
Fixed. Try the latest snapshot.

The fix will be delivered in version 1.10

Original comment by py4fun@gmail.com on 12 Jan 2012 at 7:32

GoogleCodeExporter commented 9 years ago
This issue still shows up in version 1.11.

Any ideas why this is still happening?

Original comment by yuri.pan...@gmail.com on 6 Nov 2012 at 3:19

GoogleCodeExporter commented 9 years ago
Can you please provide more information? What is happening? How can we 
reproduce it? Can you run all the tests?
If you mean this:
found character         '\t' that cannot start any token
 in 'reader', line 3, column 1:
        CREATE TABLE `account` (
then it is a totally different issue. Please read:  http://yaml.org/spec/1.1/

Tabs may appear inside comments and quoted or block scalar content.
Tabs must not appear elsewhere, such as in indentation and separation spaces.

This is exactly what the error message says. Tabs cannot be used as indentation.

Original comment by py4fun@gmail.com on 6 Nov 2012 at 8:58
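
The indentation rule quoted above can be checked before a document reaches the parser. The following is a rough, line-based sketch in plain Java rather than anything SnakeYAML provides: the class TabIndentCheck and the method linesWithTabIndent are hypothetical names, and the check is a heuristic that ignores context, so it may also flag continuation lines inside multi-line quoted scalars.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-check, not part of SnakeYAML: reports lines whose
// leading whitespace contains a tab character.
final class TabIndentCheck {

    private TabIndentCheck() {
    }

    /**
     * Returns the 1-based numbers of all lines that have a tab somewhere in
     * their leading whitespace. Tabs after the first non-space character
     * (for example inside a scalar) are not reported.
     */
    static List<Integer> linesWithTabIndent(String yaml) {
        List<Integer> hits = new ArrayList<>();
        String[] lines = yaml.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            for (int j = 0; j < line.length(); j++) {
                char c = line.charAt(j);
                if (c == '\t') {
                    hits.add(i + 1); // tab found before any content
                    break;
                }
                if (c != ' ') {
                    break; // content reached, indentation is clean
                }
            }
        }
        return hits;
    }
}
```

A document such as "a:\n\tb: 1\n" is reported with line 2, while the original example from this issue, "--- 36L\tDIESEL\n", is not reported, because its tab sits inside the scalar and not in the indentation.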

GoogleCodeExporter commented 9 years ago
Thanks for the response. Your link points to the yaml specification. Of course, 
I will not be reading the bazillion pages to try to understand why tabs are not 
allowed as indentation. Logically, I do not see a plausible reason why it 
should not be allowed especially since many people use gui editors to do their 
work. But thank you nonetheless for the answer.

Original comment by yuri.pan...@gmail.com on 6 Nov 2012 at 5:19

GoogleCodeExporter commented 9 years ago
To avoid misunderstanding in the future, the error message has been improved.
See: http://code.google.com/p/snakeyaml/wiki/changes

Implemented here: 
http://code.google.com/p/snakeyaml/source/detail?r=af8d7ccf66e5fa047be44c6899ff66979b94251d

It will be delivered in version 1.12

Original comment by py4fun@gmail.com on 8 Nov 2012 at 3:06