edn-format / edn

Extensible Data Notation

Spec allows consecutive colons in symbols and keywords #68

Open favila opened 10 years ago

favila commented 10 years ago

The EDN spec does not say that consecutive colons are illegal (e.g. a::b), even though clojure.reader and clojure.edn won't parse them.

(What is the reason for this restriction?)

raszi commented 8 years ago

Sample code to reproduce the issue:

(clojure.edn/read-string (pr-str (into {} [[(keyword "a::b") "a"]])))
java.lang.RuntimeException: Invalid token: :a::b

raszi commented 8 years ago

An even crazier example which is also somehow related:

(clojure.edn/read-string (pr-str (into {} [[(keyword "a b c") "a"]])))
{:a b, c "a"}

xpe commented 3 years ago

In reply to the comment by @raszi: this is for anyone who is surprised or perhaps confused by the example showing that (clojure.edn/read-string (pr-str (into {} [[(keyword "a b c") "a"]]))) evaluates to {:a b, c "a"}.

Clarifying the keyword function

Let's start with some basics. First, let's review the keyword function. (doc keyword) gives:

clojure.core/keyword
([name] [ns name])
  Returns a Keyword with the given namespace and name.  Do not use :
  in the keyword strings, it will be added automatically.

I want to call attention to the part that says "a" keyword. To emphasize, only one keyword is returned.

Here are some correct and idiomatic uses of keyword:
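
As a minimal sketch of the one- and two-argument forms listed in the docstring, assuming a stock JVM Clojure REPL (the names foo and bar are illustrative, chosen to match the arity example that follows):

```clojure
;; Illustrative names only; any strings without spaces or colons work the same way.
(keyword "foo")        ; one argument: just a name
;; => :foo

(keyword "bar" "foo")  ; two arguments: namespace first, then name
;; => :bar/foo

(keyword 'foo)         ; a symbol is accepted in place of a string
;; => :foo
```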

Here is an obviously incorrect use: (keyword "bar" "foo" "extra") gives

Execution error (ArityException) at user/eval163 (REPL:1).
Wrong number of args (3) passed to: clojure.core/keyword

Using keyword in a perhaps confusing way

Now, let's talk about a valid but perhaps confusing use of keyword. (I also consider the following example to be non-idiomatic or at least uncommon.) Consider (keyword "x y") which prints :x y at the REPL.

Aside: I did not say that (keyword "x y") evaluates to :x y -- that would be impossible, since y is syntactically a symbol and in this context, undefined. You can verify this by trying (resolve 'y).

To return to the previous thread, (keyword "x y") prints :x y at the REPL. To be clear, :x y is the printed representation of one keyword with the name "x y". You can easily verify this:

  1. (type (keyword "x y")) evaluates to clojure.lang.Keyword.
  2. (name (keyword "x y")) evaluates to "x y".
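
The two checks above can be extended into a short REPL session. This is only a sketch, and k is a hypothetical name introduced for it:

```clojure
(def k (keyword "x y"))    ; k is a hypothetical var used only in this sketch

(keyword? k)               ;; => true
(name k)                   ;; => "x y"
(namespace k)              ;; => nil
(count [k])                ;; => 1, a single value rather than a keyword plus a symbol
(pr-str k)                 ;; => ":x y"
(= k (keyword (name k)))   ;; => true
```

The printed form ":x y" is where the ambiguity comes from: the value is one keyword, but its text looks like two tokens.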

If you haven't seen name before, here is the documentation, available by evaluating (doc name) at the REPL:

clojure.core/name
([x])
  Returns the name String of a string, symbol or keyword.

Similarly, (keyword "x y z") evaluates to one keyword, which prints as :x y z. This keyword has the name "x y z".

Revisiting the first example

Let's take this from smaller to larger:

  1. Let's look at (a smaller piece of) the example that @raszi mentioned, [(keyword "a b c") "a"]. Based on what I showed above, we can reason through this and see that this will return a vector with two elements: (i) a keyword with the name "a b c", which prints as :a b c, and (ii) the string "a". Here is one way to demonstrate this: (clojure.string/join "<$>" [(keyword "a b c") "a"]) evaluates to ":a b c<$>a".

  2. Next, let's look at (into {} [[(keyword "a b c") "a"]]) which prints as {:a b c "a"}. This is a map with one key, :a b c and one value, "a".

  3. What happens if you type {:a b c "a"} at the REPL? You'll get (my markdown markup added): "Syntax error compiling at (REPL:0:0). / Unable to resolve symbol: b in this context". Conclusion: don't type this into the REPL! :)

  4. What should you type at the REPL instead? This: {(keyword "a b c") "a"}. You might say, "That's much less ambiguous" and you would be right! :)

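  5. Next, let's look at (pr-str (into {} [[(keyword "a b c") "a"]])) which evaluates to "{:a b c \"a\"}". Based on number 3, above, if you pass this to clojure.edn/read-string, you should not expect it to evaluate to the input value! The EDN reader sees four tokens between the braces (:a, b, c and "a") and, because it never evaluates symbols, builds the two-entry map {:a b, c "a"} instead of raising an error.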
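
The numbered steps above can be replayed in one REPL session. This is a sketch assuming JVM Clojure; original, printed and recovered are hypothetical names introduced here:

```clojure
(require '[clojure.edn :as edn])

;; One map entry whose key is a single keyword named "a b c".
(def original  (into {} [[(keyword "a b c") "a"]]))
(def printed   (pr-str original))          ; the string {:a b c "a"}
(def recovered (edn/read-string printed))  ; re-read as four separate tokens

(count original)        ;; => 1
(count recovered)       ;; => 2
(get recovered :a)      ;; => b   (the symbol b, left unevaluated)
(get recovered 'c)      ;; => "a"
(= original recovered)  ;; => false
```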

Conclusion

Using (clojure.edn/read-string (pr-str thing)) is not guaranteed to recover the thing!

To state it in a different way, #(clojure.edn/read-string (pr-str %1)) is not the identity function. It appears to work that way in a vast majority of common cases, but it breaks down in some 'weird' ways. Caveat evaluator!
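
One way to probe whether a given value survives the round trip is a small predicate. The sketch below assumes JVM Clojure; round-trips? is a hypothetical helper, not part of clojure.edn:

```clojure
(require '[clojure.edn :as edn])

(defn round-trips?
  "Hypothetical helper: true when x equals the result of printing it
  with pr-str and reading it back with clojure.edn/read-string."
  [x]
  (try
    (= x (edn/read-string (pr-str x)))
    ;; The reader signals invalid tokens with a RuntimeException.
    (catch RuntimeException _ false)))

(round-trips? {:a 1 :b [2 3]})    ;; => true
(round-trips? (keyword "a b c"))  ;; => false, only :a is read back
(round-trips? (keyword "a::b"))   ;; => false, the reader rejects :a::b
```

A check like this only covers the values it is given; it does not show that printing and reading are inverses in general.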

The Fun Doesn't Have to Stop Here...

Read much more about this issue (and related ones) in the Clojure Google Group thread "How to escape a space in a keyword?".