RussellSpitzer / snakeyaml

Automatically exported from code.google.com/p/snakeyaml
Apache License 2.0

SnakeYAML parses 08 and 09 as floats while they should be strings #207

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
Go to http://instantyaml.appspot.com/ and enter and submit the following YAML:
a: 07
b: 077
c: 0
d: 0A
e: !!str 09
f: 09.
g: 09

What is the expected output? What do you see instead?
On instantyaml, 07, 077 and 0 are parsed as ints, 09. as a float, and everything
else as strings.

In SnakeYAML 14, however, g: 09 is parsed as a float as well.

What version of SnakeYAML are you using? On what Java version?
SnakeYAML 14.0

java version "1.8.0_31"
Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)

Please provide any additional information below. (Often a failing test is
the best way to describe the problem.)

I tried this in Java:

<code language="java">
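import static org.junit.Assert.assertEquals;

import com.google.common.collect.ImmutableMap;
import org.junit.Test;
import org.yaml.snakeyaml.Yaml;

// Class and method names are illustrative.
public class Issue207Test {
    @Test
    public void octalLikeScalars() {
        Yaml yaml = new Yaml();
        assertEquals(7, yaml.load("07"));    // octal: 07 == 7
        assertEquals(63, yaml.load("077"));  // octal: 077 == 7 * 8 + 7
        assertEquals(0, yaml.load("0"));
        assertEquals("0A", yaml.load("0A"));
        assertEquals("09", yaml.load("!!str 09"));

        // These should fail but they pass (same happens with 8 and "08"):
        assertEquals(9d, yaml.load("09"));
        assertEquals(ImmutableMap.of("a", 9d), yaml.load("a: 09"));

        // These should pass but they fail (same happens with 8 and "08"):
        assertEquals(ImmutableMap.of("a", "09"), yaml.load("a: 09"));
        assertEquals("09", yaml.load("09"));
    }
}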
</code>

It seems the problem is with the float regexp used in the Resolver class
(https://code.google.com/p/snakeyaml/source/browse/src/main/java/org/yaml/snakeyaml/resolver/Resolver.java#39).
It allows expressions without a '.' to be parsed as floats, while the YAML 1.1
float type (http://yaml.org/type/float.html) seems to always require a '.'.
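A minimal Java sketch of the '.' requirement, assuming only the base-10 alternative of the YAML 1.1 float regexp (the one quoted later in this thread from http://yaml.org/type/float.html). The class name Yaml11Float is hypothetical, and the base-60, infinity and NaN forms are left out.

<code language="java">
import java.util.regex.Pattern;

// Hypothetical helper: checks a plain scalar against the YAML 1.1 base-10
// float regexp only (base-60, .inf and .nan alternatives omitted).
final class Yaml11Float {
    // "\\." is not optional, so a scalar without a dot can never match.
    static final Pattern FLOAT_1_1 =
            Pattern.compile("[-+]?([0-9][0-9_]*)?\\.[0-9.]*([eE][-+][0-9]+)?");

    static boolean isFloat(String scalar) {
        // matches() requires the whole scalar to match, not just a prefix.
        return FLOAT_1_1.matcher(scalar).matches();
    }
}
</code>

Under this rule isFloat("09.") is true while isFloat("09") and isFloat("08") are false. The same rule also rejects "8e-06", since it has no dot.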

The problem only happens if there is an 8 or 9 in the token; otherwise, a token
starting with 0 and containing only the digits [0-7] is parsed as an int (octal).
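Why only 8 and 9 are affected can be sketched as an ordered resolver. Int patterns are tried before the float pattern, so only scalars that fail as ints reach a float pattern with an optional '.'. This is a hypothetical illustration, not SnakeYAML's Resolver. It assumes the YAML 1.1 decimal and octal int regexps and uses the YAML 1.2 Core tag-resolution float regexp quoted later in this thread as the dot-optional float rule. The class name ResolveOrder is hypothetical.

<code language="java">
import java.util.regex.Pattern;

// Hypothetical ordered resolver: ints are checked before floats.
final class ResolveOrder {
    // YAML 1.1 base-10 and base-8 int regexps.
    static final Pattern INT_DECIMAL = Pattern.compile("[-+]?(0|[1-9][0-9_]*)");
    static final Pattern INT_OCTAL = Pattern.compile("[-+]?0[0-7_]+");
    // Float rule with an optional '.' (YAML 1.2 Core tag-resolution regexp).
    static final Pattern FLOAT_DOT_OPTIONAL =
            Pattern.compile("[-+]?(\\.[0-9]+|[0-9]+(\\.[0-9]*)?)([eE][-+]?[0-9]+)?");

    static String resolve(String scalar) {
        if (INT_DECIMAL.matcher(scalar).matches() || INT_OCTAL.matcher(scalar).matches()) {
            return "int";   // "0", "07", "077"
        }
        if (FLOAT_DOT_OPTIONAL.matcher(scalar).matches()) {
            return "float"; // "08", "09": an 8 or 9 rules out octal, and no '.' is needed here
        }
        return "str";       // "0A"
    }
}
</code>

With these rules resolve("07") and resolve("077") give "int", resolve("0A") gives "str", and resolve("08") and resolve("09") fall through to "float". This matches the behaviour reported in this issue.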

Original issue reported on code.google.com by kiralyat...@gmail.com on 15 Feb 2015 at 1:05

GoogleCodeExporter commented 9 years ago
Indeed, the '.' is not required; that was done to fix issue 130:
https://code.google.com/p/snakeyaml/issues/detail?id=130

8e-06 is a proper float without a decimal point.

Apparently it is a mismatch between the 1.1 and 1.2 specifications.

Feel free to propose how we can fix it.

Original comment by py4fun@gmail.com on 16 Feb 2015 at 2:03

GoogleCodeExporter commented 9 years ago
I think the problem is that the YAML 1.2 spec is confusing.

Yaml 1.1 http://yaml.org/type/float.html
It needs a '.', regexp: [-+]?([0-9][0-9_]*)?\.[0-9.]*([eE][-+][0-9]+)?

Yaml 1.2 http://www.yaml.org/spec/1.2/spec.html
It defines 3 recommended schemas
10.1. Failsafe: doesn't parse floats

10.2. JSON Schema: "A YAML processor should therefore support this schema, at 
least as an option. It is also strongly recommended that other schemas should 
be based on it."
For non-zero float number tags it defines the regexp:
-? [1-9] ( \. [0-9]* [1-9] )? ( e [-+] [1-9] [0-9]* )?
which doesn't allow 08 and 09 to be parsed as floats.
And in tag resolution it defines the float regexp:
-? ( 0 | [1-9] [0-9]* ) ( \. [0-9]* )? ( [eE] [-+]? [0-9]+ )?
which doesn't match 08 and 09 either.
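A quick check of the JSON schema tag-resolution float regexp quoted above, translated to a Java pattern (the class name JsonSchemaFloat is hypothetical). The leading ( 0 | [1-9] [0-9]* ) group is what rules out 09: a leading 0 may not be followed by another digit.

<code language="java">
import java.util.regex.Pattern;

// Hypothetical helper: YAML 1.2 JSON schema tag-resolution float regexp.
final class JsonSchemaFloat {
    static final Pattern FLOAT =
            Pattern.compile("-?(0|[1-9][0-9]*)(\\.[0-9]*)?([eE][-+]?[0-9]+)?");

    static boolean isFloat(String scalar) {
        // Whole-scalar match: "09" fails because only a lone 0 may start with 0.
        return FLOAT.matcher(scalar).matches();
    }
}
</code>

isFloat("09") is false, whereas isFloat("0.9"), isFloat("-1.5e3") and isFloat("8e-06") are true. This rule keeps exponent-only floats without accepting 09.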

10.3. Core Schema: "The Core schema is an extension of the JSON schema, 
allowing for more human-readable presentation of the same types. This is the 
recommended default schema that YAML processor should use unless instructed 
otherwise. It is also strongly recommended that other schemas should be based 
on it."
"The core schema uses the same tags as the JSON schema."
To me this means that for non-zero float number tags the regexp is the same as
for JSON: -? [1-9] ( \. [0-9]* [1-9] )? ( e [-+] [1-9] [0-9]* )?
which doesn't match 08 and 09.

But in 10.3.2 Tag Resolution it defines the regexp:
[-+]? ( \. [0-9]+ | [0-9]+ ( \. [0-9]* )? ) ( [eE] [-+]? [0-9]+ )?
which DOES allow 08 and 09 to be parsed as floats.
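The same kind of check for the Core schema tag-resolution regexp quoted above (the class name CoreSchemaFloat is hypothetical). Because the integer part is simply [0-9]+, a leading zero followed by 8 or 9 is accepted.

<code language="java">
import java.util.regex.Pattern;

// Hypothetical helper: YAML 1.2 Core schema tag-resolution float regexp.
final class CoreSchemaFloat {
    // The '.' is optional and leading zeros are allowed in [0-9]+.
    static final Pattern FLOAT =
            Pattern.compile("[-+]?(\\.[0-9]+|[0-9]+(\\.[0-9]*)?)([eE][-+]?[0-9]+)?");

    static boolean isFloat(String scalar) {
        return FLOAT.matcher(scalar).matches();
    }
}
</code>

isFloat("08") and isFloat("09") are true here, while isFloat("0A") is still false.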

I have checked a few other parsers:
http://nodeca.github.io/js-yaml/
http://yaml-online-parser.appspot.com/
http://instantyaml.appspot.com/
None of them parses 09 as float.

However, when I try "!!float 09" they are not consistent either:
js-yaml gives an error for "!!float 09" (not a float) but parses "!!float 09.".
The other 2 (and SnakeYAML) parse both forms as doubles.
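The split can be sketched as two hypothetical ways of constructing an explicitly tagged !!float scalar. The strict one validates the text against a float regexp first. This is roughly the js-yaml behaviour described above, although its actual check may differ. The lenient one passes the text straight to Java's Double.parseDouble, which accepts both "09" and "09.". As an assumption, the strict variant reuses the YAML 1.1 base-10 float regexp, and the class name ExplicitFloat is hypothetical.

<code language="java">
import java.util.regex.Pattern;

// Hypothetical constructors for a scalar explicitly tagged !!float.
final class ExplicitFloat {
    // Assumed strict rule: YAML 1.1 base-10 float regexp ('.' required).
    static final Pattern STRICT =
            Pattern.compile("[-+]?([0-9][0-9_]*)?\\.[0-9.]*([eE][-+][0-9]+)?");

    // Strict: reject text that does not look like a float before converting.
    static double strict(String scalar) {
        if (!STRICT.matcher(scalar).matches()) {
            throw new IllegalArgumentException("not a float: " + scalar);
        }
        return Double.parseDouble(scalar.replace("_", ""));
    }

    // Lenient: trust the tag and let Double.parseDouble decide ("09" -> 9.0).
    static double lenient(String scalar) {
        return Double.parseDouble(scalar.replace("_", ""));
    }
}
</code>

lenient("09") and strict("09.") both return 9.0, while strict("09") throws an IllegalArgumentException.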

IMHO this is a bug in the specification: of the 4 regexps it uses for floats,
only 1 allows numbers without a '.' to be parsed as floats.

From SnakeYAML's point of view, I think this boils down to one question: which
YAML specification does SnakeYAML support? Based on the SnakeYAML website and
yaml.org, it looks to me like it supports only 1.1 and not 1.2. In the case of 1.1
it shouldn't parse 08/09 as floats. In the case of 1.2 I think it should parse them
as floats during tag resolution, but not when explicitly marked as !!float.

Original comment by kiralyat...@gmail.com on 16 Feb 2015 at 11:38

GoogleCodeExporter commented 9 years ago
Your test was added to show the problem:
https://code.google.com/p/snakeyaml/source/browse/src/test/java/org/yaml/snakeyaml/issues/issue207/OctalNumberTest.java

Apparently, there is no simple solution.
I think it would be better to keep the current behaviour (parse floats as
defined in the YAML 1.2 specification).
Those who understand the issue can always apply any pattern they want; the Resolver
is configurable.
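One pattern a user could apply, sketched here as an assumption rather than SnakeYAML's built-in rule, is to require either a '.' or an exponent. That keeps 8e-06 (issue 130) a float while 08 and 09 no longer match. Only the base-10 forms are covered, and registering the pattern with SnakeYAML's Resolver is not shown. DotOrExponentFloat is a hypothetical name.

<code language="java">
import java.util.regex.Pattern;

// Hypothetical float rule: a '.' or an exponent is required.
// Base-60, .inf and .nan forms are deliberately omitted.
final class DotOrExponentFloat {
    static final Pattern FLOAT = Pattern.compile(
            "[-+]?("
            + "[0-9][0-9_]*\\.[0-9_]*([eE][-+]?[0-9]+)?"   // "09.", "1.5", "1.5e3"
            + "|\\.[0-9][0-9_]*([eE][-+]?[0-9]+)?"          // ".5"
            + "|[0-9][0-9_]*[eE][-+]?[0-9]+"                // "8e-06": no dot, exponent present
            + ")");

    static boolean isFloat(String scalar) {
        return FLOAT.matcher(scalar).matches();
    }
}
</code>

isFloat("8e-06"), isFloat("09.") and isFloat(".5") are true. isFloat("08") and isFloat("09") are false, so with the YAML 1.1 int rules those scalars would end up as strings.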

Original comment by py4fun@gmail.com on 17 Feb 2015 at 4:22