Closed grzm closed 2 years ago
With "good" 108 version:
❯ clj -Sdeps '{:deps { clj-commons/clj-yaml {:mvn/version "0.7.108"}}}'
Downloading: clj-commons/clj-yaml/0.7.108/clj-yaml-0.7.108.pom from clojars
Downloading: clj-commons/clj-yaml/0.7.108/clj-yaml-0.7.108.jar from clojars
Clojure 1.11.1
user=> (require '[clj-yaml.core :as yaml])
nil
user=> (doseq [x ["083" {:x "083"}]] (print (yaml/generate-string x)))
'083'
{x: '083'}
nil
user=>
With "good" 108 but bumping the yaml lib to 1.32 (version used in 110)
clj -Sdeps '{:deps {org.yaml/snakeyaml {:mvn/version "1.32"} clj-commons/clj-yaml {:mvn/version "0.7.108"}}}'
Downloading: org/yaml/snakeyaml/1.32/snakeyaml-1.32.pom from central
Downloading: org/yaml/snakeyaml/1.32/snakeyaml-1.32.jar from central
Clojure 1.11.1
user=> (require '[clj-yaml.core :as yaml])
nil
user=> (doseq [x ["083" {:x "083"}]] (print (yaml/generate-string x)))
083
{x: 083}
nil
Looks like the regression was introduced in snakeyaml 1.30. Using [org.yaml/snakeyaml "1.29"]
shows the expected, quoted result.
@grzm I tested this with EDN input like this:
{:foo [123 "045" "083"]}
Output:
$ clj -Sdeps '{:deps {clj-commons/clj-yaml {:mvn/version "0.7.110"} org.yaml/snakeyaml {:mvn/version "1.31"}}}}' -M -e "(require '[clj-yaml.core :as yaml]) (print (yaml/generate-string (clojure.edn/read-string (slurp \"/tmp/test.edn\"))))"
foo: [123, '045', 083]
That doesn't make sense: strings should stay strings.
It looks like we're have to going to fix it in the clj-yaml library instead of snakeyaml:
https://bitbucket.org/snakeyaml/snakeyaml/issues/550/regression-in-handling-number-like-strings
You can provide your own Resolver to SnakeYAML (and drop the octal numbers and timestamps) https://bitbucket.org/snakeyaml/snakeyaml/src/master/src/main/java/org/yaml/snakeyaml/resolver/Resolver.java
I don't fully get it yet, so I'm trying to understand better by hitting snakeyaml directly.
Given snake_load.clj
(require '[clojure.pprint :as pprint])
(import '(org.yaml.snakeyaml Yaml))
(pprint/print-table
(for [yaml ["'083'" "083" "043" "83"]]
{:from (pr-str yaml) :to (pr-str (.load (Yaml.) yaml))}))
And snake_dump.clj
(require '[clojure.pprint :as pprint])
(import '(org.yaml.snakeyaml Yaml))
(pprint/print-table
(for [in ["83" "043" "083" 83]]
{:from (pr-str in) :to (pr-str (.dump (Yaml.) in))}))
And a little snaketest.clj
bb driver (we were recently on 1.26, our issue seems to have occurred between 1.29 and 1.30 and we were recently on 1.31 so...)
(require '[babashka.tasks :as t])
(doseq [script ["snake_load.clj" "snake_dump.clj"]
snake-version ["1.26" "1.29" "1.30" "1.31" "1.32"]]
(println "snakeyaml version" snake-version script "--------")
(t/clojure "-Sdeps"
(format "{:deps {org.yaml/snakeyaml {:mvn/version \"%s\"}}}" snake-version)
"-M" script)
(println))
When I run bb snaketest.clj
I get (edited down and annotated to what is interesting).
Loading from YAML:
snakeyaml version 1.26 and 1.29 snake_load.clj --------
| :from | :to |
|---------+-------|
| "'083'" | "083" |
| "083" | 83.0 | <-- this seems wrong
| "043" | 35 |
| "83" | 83 |
snakeyaml version 1.30, 1.31 and 1.32 snake_load.clj --------
| :from | :to |
|---------+-------|
| "'083'" | "083" |
| "083" | "083" | <<-- difference
| "043" | 35 |
| "83" | 83 |
So: 083
is not a valid YAML octal number so it is assumed to be a string.
043
is valid YAML octal, so it gets loaded as a numeric.
(Octal 43 is 35 decimal).
Dumping to YAML:
snakeyaml version 1.26 and 1.29 snake_dump.clj --------
| :from | :to |
|-------+-----------|
| "83" | "'83'\n" |
| "043" | "'043'\n" |
| "083" | "'083'\n" |
| 83 | "83\n" |
snakeyaml version 1.30, 1.31 and 1.32 snake_dump.clj --------
| :from | :to |
|-------+-----------|
| "83" | "'83'\n" |
| "043" | "'043'\n" |
| "083" | "083\n" | <-- difference
| 83 | "83\n" |
So: I don't get this one yet. The string "083"
is being converted to what looks like an invalid YAML number? Does anybody understand the rationale for this new snakeyaml behaviour yet? After reading this snakeyaml issue, all I'm guessing is since string "083"
looks like a number (starts with 0), but can't be number because it is not valid octal, it is safe to be dumped like as a YAML without quotes, because it won't be interpreted as YAML number when it is loaded? Brain twisty!
So maybe in YAML 1.1 a string that looks like a number but can't be a number can forego the quotes (but why? I've not noticed any shortage in the availability of quote characters).
Is this a bug in the YAML 1.1 spec (they did rework octal numbers for YAML 1.2) or a bug in snakeyaml or a bug in my noggin?
Anyhoo.... if my understanding above is all solid, or good enough, I think we can adapt to spitting out strings that look like numbers but are not numbers to include the YAML quotes.
@asomov are you suggesting there is a similar issue with timestamps?
you asked a few questions, but I think you know the answers
Thank you for your reply @asomov, very much appreciated!
And also thank you so much for all the hard work on the snakeyaml libraries (sorry, I didn't make the connection until just now). I can't imagine that it is easy work!
@borkdude I'll also take a peek at your roundtrip test from above, understand what is going on, and make sure we'll be good there.
So @borkdude, on your example:
{:foo [123 "045" "083"]}
Output:
$ clj -Sdeps '{:deps {clj-commons/clj-yaml {:mvn/version "0.7.110"} org.yaml/snakeyaml {:mvn/version "1.31"}}}}' -M -e "(require '[clj-yaml.core :as yaml]) (print (yaml/generate-string (clojure.edn/read-string (slurp \"/tmp/test.edn\"))))" foo: [123, '045', 083]
That doesn't make sense: strings should stay strings.
The non-intuitive thing (to me) that I learned is that 083
is actually a string in YAML because, although it looks like a number, it cannot be a number.
Under snakeyaml 1.3.1
(def x {:foo [123 "045" "083"]})
(def o (generate-string x))
o
;; => "foo: [123, '045', 083]\n"
(parse-string o)
;; => #ordered/map ([:foo (123 "045" "083")])
We can choose to explicitly exclude octal support with a custom snakeyaml Resolver.
I'd rather just emit '083'
instead of 083
, but not sure if we can ask snakeyaml to do that yet.
But... we can also do nothing. It could be argued that we do not have a bug here.
Is it safe to propose the following based on @grzm's latest comment at SnakeYAML's repo:
clj-yaml will preserve quotes around strings that look like numbers. This is not against the yaml 1.1 spec and will undo breaking changes experienced by reading output of this library by yaml 1.2 readers.
We could coordinate with @asomov to get this change back into SnakeYAML but I think SnakeYAML doesn't have very frequent releases. Until that happens, we could perhaps implement the proposal in clj-yaml, until a new SnakeYAML comes out that has the change and then undo it here.
Yes thanks, I think we've finally gotten to the crux of this issue! 🎉
This is not against the yaml 1.1 spec
To be clear, I believe this to be true, but haven't read the spec closely enough to, say, provide expert testimony in a court of law.
So should we recap our current options?
Let @grzm and others suffer! 😈
After it is approved maybe use a fork until it gets released?
Requires some snakeyaml spelunking and might not be viable.
Not sure about the effect of this yet, but can tell snakeyaml not to resolve octals. Maybe current octals would be parsed as strings? Would this affect emitting? Would need to try.
Option 2a sounds promising, but this depends on what @asomov thinks about this.
Option 3: @asomov hinted at this here I believe. Option 3b would be: provide an option to drop octal parsing (enabled by default or disabled by default).
Oh right forgot:
This avoids the broken YAML 1.1 octals by moving to YAML 1.2. But also could be a bit of work, and might introduce other issues. Have not looked into it at all. Might be worth an exploration.
It was merged! Thanks you @lread and @asomov!
Now, we could either wait for 1.33 or depend on a fork until 1.33 comes out. Considering that this a breaking change to other Clojure users, I'd like to have it sooner than later.
there is another issue to fix. Once finished, we can release SnakeYAML 1.33
Thanks @asomov! (and thanks for the variable name change, more clear for sure!).
If I can help with that final issue, just let me know.
For clarity: what is that final issue?
@asomov Don't mean to push you, so just take this as a question:
Is there a likelihood that there will be a new SnakeYAML release within the next few days?
I'd like to release a new version of https://babashka.org/ (scripting environment for Clojure) in which clj-yaml is a built-in dependency, so it'd be nice to know if I should wait, or just go ahead without the new SnakeYAML and bump clj-yaml in a future release.
Thanks for the ongoing work!
I would like to finish this
Thanks for the heads up! I'll watch the issue
@lread SnakeYAML 1.33 has been released. It may take a few hours to promote the JARs to Maven central
@asomov Thanks so much for your work!
@asomov a big sincere thank you for SnakeYAML and for this new release!
Thank you, @asomov !
It looks like there's a regression since 0.7.109 and still exists in 0.7.110 with respect to number-like strings.
Note that
'083'
in {x: '083'} is quoted in 0.7.108 but not quoted in 0.7.109 and 0.7.110.I'm guessing it's an issue in the upstream snakeyaml library, but I haven't checked that yet.