dakrone / cheshire

Clojure JSON and JSON SMILE (binary json format) encoding/decoding
https://github.com/dakrone/cheshire
MIT License

Seeking to preserve order of keys in map after decode #73

Closed dyba closed 9 years ago

dyba commented 9 years ago

I am retrieving a JSON template from a remote server, which I will populate and then send back to create a record on that server. When I decode the template in the body of the response, the resulting map reorders the keys in the original template. Unfortunately, the remote server will not accept my request to create a record, since I have submitted a JSON template with reordered keys.

I found a library that allows me to keep the ordered nature of the JSON template using the ordered-map function. With that find, I tried using the decode method to preserve the ordering of the keys in the JSON template by checking the root level field name and converting the associated value to an ordered map:

;; here (new-receipt) returns a response from the remote server containing
;; the JSON template in the body key
(decode
  (:body (new-receipt))
  false
  (fn [field-name]
    (if (= field-name "contributionReceipt")
      ordered-map
      {})))
;; => keys are not ordered

That didn't work, so I'm trying to think of the problem from a different angle. Perhaps I need to modify Cheshire's decode function so that I can pass an option that would let me apply a function (i.e. ordered-map) to values that are maps?

Does the community have any ideas on how I can tackle this problem?

jayp commented 9 years ago

I don't understand the problem.

> When I decode the template in the body of the response, the resulting map reorders the keys in the original template.

clojure.repl=> (decode "{\"foo\": \"bar\", \"foo2\": \"bar\"}")
{"foo" "bar", "foo2" "bar"}
clojure.repl=> (decode "{\"foo2\": \"bar\", \"foo\": \"bar\"}")
{"foo2" "bar", "foo" "bar"}

Ordering seems to be maintained. Can you give an example of re-ordering?

ztellman commented 9 years ago

That's because smaller maps use PersistentArrayMap, which is ordered. With over 8 entries, it'll spill over into PersistentHashMap, which is not. A custom map implementation can be used, but it will involve memory overhead.
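
A minimal sketch of that spill-over, using only clojure.core. The threshold is an implementation detail of the Clojure runtime rather than a documented guarantee, and the names make-map, small and large are made up for illustration:

```clojure
;; Build a map with n string keys, inserted in order foo0, foo1, ...
(defn make-map [n]
  (into {} (map (fn [i] [(str "foo" i) "bar"]) (range n))))

(def small (make-map 8)) ; still within the array-map size limit
(def large (make-map 9)) ; one entry past it

;; Print the concrete map classes and the key order of the small map.
(println (class small))
(println (class large))
(println (keys small))
```

On the Clojure versions I am aware of, the first class is the array-backed map and the second is the hash-backed one, so only the small map iterates in insertion order.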

jayp commented 9 years ago

Ahh, I see. Thanks for the explanation @ztellman.

clojure.repl=> (decode (str "{" (clojure.string/join ", " (map #(str "\"foo" % "\": \"bar\"") (range 8))) "}"))
{"foo0" "bar", "foo1" "bar", "foo2" "bar", "foo3" "bar", "foo4" "bar", "foo5" "bar", "foo6" "bar", "foo7" "bar"}
clojure.repl=> (decode (str "{" (clojure.string/join ", " (map #(str "\"foo" % "\": \"bar\"") (range 9))) "}"))
{"foo1" "bar", "foo4" "bar", "foo7" "bar", "foo8" "bar", "foo2" "bar", "foo6" "bar", "foo0" "bar", "foo3" "bar", "foo5" "bar"}

dyba commented 9 years ago

@ztellman :+1: Nice explanation! I knew Clojure reordered the keys as part of an optimization but now I understand conceptually what happens.

jayp commented 9 years ago

@dyba --

As a relatively new Clojure user, I felt the need to learn a bit more about this. Here is what I found:

It seems you are referring to ordered-map from [flatland.ordered.map "1.5.2"]. Firstly, just to be very explicit, array-coerce-fn needs to return an actual collection. Unless ordered-map in your code refers to a var holding a collection, it needs to be invoked as a function call, as below:

(fn [field-name]
  (if (= field-name "contributionReceipt")
    (ordered-map)
    {}))

Secondly, it seems Cheshire only supports a coercing function for vectors. Here is a patch that makes it work for maps too.

diff --git a/src/cheshire/parse.clj b/src/cheshire/parse.clj
index 312b119..3e528ca 100644
--- a/src/cheshire/parse.clj
+++ b/src/cheshire/parse.clj
@@ -14,9 +14,11 @@

 (definline parse-object [^JsonParser jp key-fn bd? array-coerce-fn]
   (let [jp (tag jp)]
-    `(do
+    `(let [object-field-name# (.getCurrentName ~jp)]
        (.nextToken ~jp)
-       (loop [mmap# (transient {})]
+       (loop [mmap# (transient (if ~array-coerce-fn
+                                   (~array-coerce-fn object-field-name#)
+                                   {}))]
          (if-not (identical? (.getCurrentToken ~jp)
                              JsonToken/END_OBJECT)
            (let [key-str# (.getText ~jp)

I am not submitting as a pull request because:

dakrone commented 9 years ago

@jayp Hmm.. I'm on the fence about this. The JSON website/spec specifically states that "An object is an unordered set of name/value pairs" (emphasis mine) and that arrays should be used when order is important.

Is there any chance the server that is expecting an ordered map could be fixed? That's probably the most correct way to handle this, though I understand that servers are not always under your control.

dyba commented 9 years ago

@dakrone I referenced the same line you quoted from the JSON spec to the third-party vendor who maintains the server. I've yet to hear back from them if they are able to change it.

jayp commented 9 years ago

Hi @dakrone - I appreciate software that allows for a more liberal interpretation of standards. For instance, web browsers attempt to follow the standard, but are forgiving. With that said, it's your call. No sleep lost if this issue were closed without any effect.

dyba commented 9 years ago

@jayp @dakrone I just got a response from the third-party. They are indeed using an XML generator under the hood to generate their JSON, which explains why they require any JSON templates in a request sent to them to be ordered.

I think the workaround would be to use their API and request the data as XML rather than JSON. Thank you both for taking the time to read my issue and provide your feedback. :+1:

px0 commented 4 years ago

FWIW I just ran into this as well. I really think this should be an option. Sometimes APIs are just wrong but we still have to deal with it.

px0 commented 4 years ago

If anyone else needs this, here is a fork that includes the map-coerce-fn as described by @jayp : https://github.com/px0/cheshire

ivos commented 4 years ago

Another use case: I want to test a REST API I am building. In the tests I want to verify the whole JSON response body by matching it as a String against the content of a JSON file with the expected response body. To prevent whitespace-related issues when comparing the Strings, I need to re-format both the expected and the actual JSON using the same formatter. In case of a mismatch I also want to be able to replace the template file content with the actual response, obviously without re-ordering the fields of large maps/objects.

Doing this in plain Java using Jackson works OK (using TreeNode as the intermediary data structure and .readTree and .writeTree methods).

Would be great to be able to use Cheshire for this on a Clojure project.

Also, I'd like to point out that although the JSON standard might not guarantee the order of map keys, all JSON processing libraries I know of do keep it, including JavaScript's own JSON.parse, and Java's Jackson, which is used by Cheshire.

The different order of the keys is NOT because of JSON, as the resolution might imply. The reason is the way Clojure manages map data structures, as already pointed out by @ztellman above, and that has nothing to do with JSON itself.

I would therefore dare to propose not only to open this feature to extensibility (although even this would be great), but to consider using ordered-map by default. This would make Cheshire consistent with all other JSON libraries out there.

I would prefer consistency over minimizing memory overhead, which has not been quantified anyway. I know these are just my personal priorities, but extensibility would allow for a more efficient data structure (like Clojure's {}), should the user so desire.

JulesGosnell commented 4 years ago

I need to load JSON docs and preserve the order of keys as well - seems to me that if you provide an API for overriding the implementation of "array" (defaulting to vector) you should do the same for the implementation of "object" (defaulting to hash-map).

I might want to load directly into a sorted-map, to save sorting later... I might want to load into custom record types.... I might want to preserve key order in spite of json spec... I might want to...

Jules

JulesGosnell commented 1 year ago

I was reviewing our project's need to keep its own copy of cheshire/parse.clj just to hack it to allow overriding with a custom map type for all JSON objects.

I thought I had given pretty good reasons above as to why this would be a useful and sensible addition to the API, but I see that two releases of Cheshire have been done this year and neither appears to include this feature :-( Perhaps we needed to shout louder to get it in?

I would really appreciate it if this feature could be considered for the next release.

many thanks

Jules

(definline parse-object [^JsonParser jp key-fn bd? array-coerce-fn]
  (let [jp (tag jp)]
    `(do
       (.nextToken ~jp)
       (loop [mmap# (transient (MY-CUSTOM-MAP))]
         (if-not (identical? (.getCurrentToken ~jp)
                             JsonToken/END_OBJECT)
           (let [key-str# (.getText ~jp)
                 _# (.nextToken ~jp)
                 key# (~key-fn key-str#)
                 mmap# (assoc! mmap# key#
                               (parse* ~jp ~key-fn ~bd? ~array-coerce-fn))]
             (.nextToken ~jp)
             (recur mmap#))
           (persistent! mmap#))))))

mtrimpe commented 1 year ago

Another use-case for which I needed ordered keys is consistently serializing JSON to allow for hash-based comparisons on the generated string.

Since consumers must be able to do these comparisons as well they need to work on the serialized string and can't be based on the internal Clojure data structure.

If there is a JSON schema for the JSON, then this can be achieved by always serializing properties in the order defined in the associated JSON schema.

That only works if the JSON schema can first be read in the provided order and the output can then be serialized against that order.
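
A rough sketch of the reordering step, assuming the property order has already been extracted from the schema into a plain vector. The function order-keys and the sample data are hypothetical and not part of Cheshire:

```clojure
;; Hypothetical helper: arrange the entries of m in the order given by
;; key-order. Keys of m that key-order does not mention are appended
;; afterwards in their existing iteration order.
(defn order-keys [key-order m]
  (let [known (filter #(contains? m %) key-order)
        extra (remove (set key-order) (keys m))]
    ;; array-map keeps the order in which the pairs are supplied,
    ;; regardless of how many entries there are.
    (apply array-map
           (mapcat (fn [k] [k (get m k)]) (concat known extra)))))

;; Sample data: ten keys, so the input map is hash-backed and unordered.
(def example-order ["j" "i" "h" "g" "f" "e" "d" "c" "b" "a"])
(def example-map (zipmap (map str "abcdefghij") (range 10)))
(def example-ordered (order-keys example-order example-map))

(println (keys example-ordered))
```

The reordering should be the last step before serializing: adding further keys to a large array map may convert it back into a hash map and lose the order again.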

Iddodo commented 8 months ago

How are you handling the absence of this feature nowadays? Is there a solution?

I needed to edit some JSON for a code review, and the lack of order in keys has made the diff comparison very much unreadable, so that might be another use-case (albeit admittedly quite the trivial one).
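
For the diff use case specifically, where a stable order may be enough even if it is not the original one, one possible workaround is to normalize every decoded map into a sorted map before writing it back out. This is only a sketch under that assumption; sort-keys is a made-up name, and it relies on all keys in a map being mutually comparable (all strings, for example):

```clojure
(require '[clojure.walk :as walk])

;; Hypothetical helper: recursively replace every map in a decoded
;; structure with a sorted-map, so keys always iterate alphabetically.
(defn sort-keys [data]
  (walk/postwalk
   (fn [x] (if (map? x) (into (sorted-map) x) x))
   data))

(def example-sorted
  (sort-keys {"c" 1 "a" {"z" [1 2] "y" 2} "b" 3}))

(println example-sorted)
```

This does not recover the order of the source file, so the first diff after adopting it can still be noisy; later diffs should stay stable.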

borkdude commented 8 months ago

A (hacky?) alternative solution is to use clj-yaml which supports ordered maps:

bb -e '(-> (clj-yaml.core/parse-string "{\"a\": 1, \"b\": 2, \"c\": 2, \"d\": 2, \"e\": 2, \"f\": 2, \"g\": 2, \"i\": 2}")
                  (assoc :a 2) (cheshire.core/generate-string))'
"{\"a\":2,\"b\":2,\"c\":2,\"d\":2,\"e\":2,\"f\":2,\"g\":2,\"i\":2}"

If you replace clj-yaml.core/parse-string with cheshire.core/parse-string the order will change, but when reading it as YAML, the order is preserved. FWIW.