cognitect / transit-clj

transit-format implementation for Clojure
Apache License 2.0
364 stars 23 forks source link

'^' crashes reader with StringIndexOutOfBoundsException #45

Closed SimonDanisch closed 5 years ago

SimonDanisch commented 5 years ago

MWE: https://nextjournal.com/a/LEjZdbN3nL5irFe2khgCM?token=KBtQJAPvS6TVvcmXyeN5U5

(require '[cognitect.transit :as transit])
(import [java.io ByteArrayInputStream ByteArrayOutputStream])
(def out "\"^\"")
(def in (ByteArrayInputStream. (.getBytes out)))
(def reader (transit/reader in :json))
(transit/read reader)

Stacktrace:

{:via [{:type java.lang.RuntimeException, :message "java.lang.StringIndexOutOfBoundsException: String index out of range: 1", :at [com.cognitect.transit.impl.ReaderFactory$ReaderImpl read "ReaderFactory.java" 114]} {:type java.lang.StringIndexOutOfBoundsException, :message "String index out of range: 1", :at [java.lang.String charAt "String.java" 658]}], :trace [[java.lang.String charAt "String.java" 658] [com.cognitect.transit.impl.ReadCache codeToIndex "ReadCache.java" 30] [com.cognitect.transit.impl.ReadCache cacheRead "ReadCache.java" 40] [com.cognitect.transit.impl.JsonParser parseVal "JsonParser.java" 60] [com.cognitect.transit.impl.JsonParser parse "JsonParser.java" 46] [com.cognitect.transit.impl.ReaderFactory$ReaderImpl read "ReaderFactory.java" 112] [cognitect.transit$read invokeStatic "transit.clj" 319] [cognitect.transit$read invoke "transit.clj" 315] [user$eval871 invokeStatic "NO_SOURCE_PATH" 5] [user$eval871 invoke "NO_SOURCE_PATH" 5] [clojure.lang.Compiler eval "Compiler.java" 7176] [clojure.lang.Compiler eval "Compiler.java" 7166] [clojure.lang.Compiler eval "Compiler.java" 7131] [clojure.core$eval invokeStatic "core.clj" 3214] [clojure.core$eval invoke "core.clj" 3210] [user$eval869 invokeStatic "NO_SOURCE_PATH" 5] [user$eval869 invoke "NO_SOURCE_PATH" 5] [clojure.lang.Compiler eval "Compiler.java" 7176] [clojure.lang.Compiler eval "Compiler.java" 7131] [clojure.core$eval invokeStatic "core.clj" 3214] [clojure.core$eval invoke "core.clj" 3210] [unrepl.repl$hpNiwYgtt8PN_xegDbIo_Axg5Xo$start$interruptible_eval__637$fn__638$fn__639$fn__640 invoke "NO_SOURCE_FILE" 697] [unrepl.repl$hpNiwYgtt8PN_xegDbIo_Axg5Xo$start$interruptible_eval__637$fn__638$fn__639 invoke "NO_SOURCE_FILE" 697] [clojure.lang.AFn applyToHelper "AFn.java" 152] [clojure.lang.AFn applyTo "AFn.java" 144] [clojure.core$apply invokeStatic "core.clj" 665] [clojure.core$with_bindings_STAR_ invokeStatic "core.clj" 1973] [clojure.core$with_bindings_STAR_ doInvoke "core.clj" 1973] [clojure.lang.RestFn invoke "RestFn.java" 425] [unrepl.repl$hpNiwYgtt8PN_xegDbIo_Axg5Xo$start$interruptible_eval__637$fn__638 invoke "NO_SOURCE_FILE" 689] [clojure.core$binding_conveyor_fn$fn__5739 invoke "core.clj" 2030] [clojure.lang.AFn call "AFn.java" 18] [java.util.concurrent.FutureTask run "FutureTask.java" 266] [java.util.concurrent.ThreadPoolExecutor runWorker "ThreadPoolExecutor.java" 1149] [java.util.concurrent.ThreadPoolExecutor$Worker run "ThreadPoolExecutor.java" 624] [java.lang.Thread run "Thread.java" 748]], :cause "String index out of range: 1"}
dchelimsky commented 5 years ago

The json transit reader expects a json transit document. Since "\"^\"" is not valid json, it is not valid input to the reader.

SimonDanisch commented 5 years ago

Actually, this is the reduced case from a valid JSON transit file - I thought it made sense to remove it to the absolute minimal example that crashes with the same error ;) A valid json string with the same error is:

{"a":"^"}

Or as an escaped string: "{\"a\":\"^\"}"

dchelimsky commented 5 years ago

That makes more sense, however, per https://github.com/cognitect/transit-format#special-characters, ^ is reserved for transit encoding.

SimonDanisch commented 5 years ago

I see, thank you! Is it possible to escape it? Or actually, shouldn't it already be escaped inside a string? I would have thought that caching only applies to whole values or to keys.

dchelimsky commented 5 years ago

Yes, the transit writer escapes it with '~', and the reader can then read the escaped string:

$ clj -Sdeps '{:deps {com.cognitect/transit-clj {:mvn/version "0.8.313"}}}'
Clojure 1.10.0
(require '[cognitect.transit :as transit])
(import [java.io ByteArrayInputStream ByteArrayOutputStream])

(def out (ByteArrayOutputStream. 4096))
nil
user=> java.io.ByteArrayOutputStream
user=> user=> user=> #'user/out
user=> (def writer (transit/writer out :json))
#'user/writer
user=> (transit/write writer "^")
nil
user=> (.toString out)
"[\"~#'\",\"~^\"]"

(def in (ByteArrayInputStream. (.toByteArray out)))
(def reader (transit/reader in :json))
#'user/in
user=> #'user/reader
user=> (prn (transit/read reader))
"^"
nil

That all make sense?

SimonDanisch commented 5 years ago

I see! Just to make sure I fix the right thing: I just need to escape [~, ^, `] whenever they occur in the first position of a string. So if I write transit, I'm fine as long as I pre-process my strings like that!

Thank you for your patience and answering this so quickly!

dchelimsky commented 5 years ago

You've got it right, however you only have to worry about that If you're hand-crafting the string. If you use the transit writer to create it, then you don't have to worry about this. Is there something preventing you from using the transit writer?

SimonDanisch commented 5 years ago

Yes, sadly! This is for the communication between a Julia and a Clojure process, and we thought we're fine by just using a modified JSON writer in Julia, since the Julia Transit.jl package is in a bad state and not maintained. Actually, I just checked, and from what I can tell, they also don't do any string escaping, so would have still run into the same bug :D

dchelimsky commented 5 years ago

@SimonDanisch I've nudged the transit.jl maintainers, but it's the first time in a while that it's gotten any attention so it might be a bit before it gets addressed. Keep your eyes on your PR (and that repo in general).