dscarpetti / codax

An idiomatic transactional embedded database for clojure
Eclipse Public License 1.0

Add support for maps as keys (pathwise) #34

Open Frozenlock opened 4 months ago

Frozenlock commented 4 months ago

Clojurians are used to being able to use pretty much anything as a key, including maps.

Codax is almost there, but lacks support for maps.

I came up with a simple encoder/decoder that leverages Nippy:

(ns my-ns
  (:require [codax.core :as c]
            [taoensso.nippy :as nippy])
  (:import (java.util Base64)))

;; Simply encoding bytes to strings (String. bytes) causes the data to
;; be corrupted after a round trip in Codax.  Encoding in b64 solves
;; this.

(defn bytes-to-base64 [bytes]
  (let [encoder (Base64/getEncoder)]
    (.encodeToString encoder bytes)))

(defn base64-to-bytes [base64-string]
  (let [decoder (Base64/getDecoder)]
    (.decode decoder base64-string)))

(c/defpathtype [0x71
                clojure.lang.PersistentHashMap
                clojure.lang.PersistentArrayMap
                clojure.lang.PersistentTreeMap]
  (fn map-encoder [m]
    (bytes-to-base64 (nippy/freeze m)))

  (fn map-decoder [s]
    (nippy/thaw (base64-to-bytes s))))
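
The comment above about `(String. bytes)` can be reproduced without Codax at all. The following is a minimal sketch, assuming the corruption comes from decoding arbitrary bytes as text (the thread does not confirm the exact cause inside Codax); `raw`, `string-round-trip` and `base64-round-trip` are names made up for this illustration.

```clojure
(import '(java.util Arrays Base64))

;; Four arbitrary bytes. 0xFF and 0xFE can never appear in valid UTF-8,
;; much like the binary output of a serializer.
(def raw (byte-array (map unchecked-byte [0xFF 0xFE 0x00 0x41])))

(defn string-round-trip
  "bytes -> String -> bytes, treating the bytes as UTF-8 text."
  [^bytes bs]
  (.getBytes (String. bs "UTF-8") "UTF-8"))

(defn base64-round-trip
  "bytes -> Base64 String -> bytes."
  [^bytes bs]
  (let [^String s (.encodeToString (Base64/getEncoder) bs)]
    (.decode (Base64/getDecoder) s)))

;; Invalid byte sequences are replaced during decoding, so the bytes change.
(def string-ok? (Arrays/equals ^bytes raw ^bytes (string-round-trip raw)))
;=> false

;; Base64 maps every byte sequence to plain ASCII and back without loss.
(def base64-ok? (Arrays/equals ^bytes raw ^bytes (base64-round-trip raw)))
;=> true
```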

Nippy also automatically handles key ordering:

(let [encoding1 (:encoding (c/check-path-encoding {{:a 1, :b 2} "map1"}))
      encoding2 (:encoding (c/check-path-encoding {{:b 2, :a 1} "map1"}))]
  (= encoding1 encoding2))

;=> true

Could something like this be added to Codax? Could it leverage other types defined for pathwise?

Frozenlock commented 4 months ago

Looks like there's already some base64 utilities in nippy:

(c/defpathtype [0x71
                clojure.lang.PersistentHashMap
                clojure.lang.PersistentArrayMap
                clojure.lang.PersistentTreeMap]
  (fn map-encoder [m]
    (nippy/freeze-to-string m))

  (fn map-decoder [s]
    (nippy/thaw-from-string s)))

dscarpetti commented 4 months ago

It's a good idea. Since nippy/freeze-to-string base64 encodes the value it should be safe. It won't play nicely with the seeking functions (unpredictable ordering), but it shouldn't break anything and there really isn't an obvious total ordering between maps that anyone would be expecting anyway.

We should probably add support for clojure.lang.PersistentHashSet & clojure.lang.PersistentTreeSet under that hex-code as well, since the nippy encoder manages its own type tagging. (You should be able to supply the nippy functions directly to defpathtype without wrapping them in an additional fn)
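
Taken literally, the two suggestions in this comment (adding the set types and passing the nippy functions unwrapped) would look roughly like the sketch below. It is untested and needs Codax and Nippy on the classpath; `0x71` is simply the code used earlier in this thread, not an officially reserved one.

```clojure
(ns my-ns
  (:require [codax.core :as c]
            [taoensso.nippy :as nippy]))

;; Same registration as above, extended with the two set types.
;; The encoder and decoder are passed directly instead of being wrapped in fns.
(c/defpathtype [0x71
                clojure.lang.PersistentHashMap
                clojure.lang.PersistentArrayMap
                clojure.lang.PersistentTreeMap
                clojure.lang.PersistentHashSet
                clojure.lang.PersistentTreeSet]
  nippy/freeze-to-string
  nippy/thaw-from-string)
```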

If you want to make a PR I'd be happy to merge it.

Frozenlock commented 4 months ago

You should be able to supply the nippy functions directly to defpathtype without wrapping them in an additional fn

Ah, yes! Vestiges of the previous version :wink:


Maps as keys

Turns out it's more complex than I initially thought. Because those maps act as keys:

  1. They must always serialize to the same value, otherwise the paths are different and the values are 'lost'.
  2. All the various map types must serialize to the same value, to keep in sync with Clojure's behavior.
    (= (sorted-map :a 1) {:a 1}) 
    ;=> true
  3. Metadata must be stripped as the additional data would result in a different string.

I think I was able to solve all those issues with a surprisingly small amount of code.

There is one downside however... it requires an additional library.
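
To make the three requirements concrete, here is a minimal sketch that uses only clojure.core. It is not the code referred to above (that code relies on an additional library and is not shown in this thread); `canonical-key-string` is a hypothetical helper, and it assumes the map's keys are mutually comparable and that nested values need no canonicalization of their own.

```clojure
(defn canonical-key-string
  "Hypothetical helper: returns the same string for any two maps that
  are = to each other, regardless of concrete map type, insertion
  order, or metadata."
  [m]
  ;; Pouring the entries into a fresh sorted map fixes the entry order
  ;; and drops the metadata of m; pr-str then prints a stable form.
  (pr-str (into (sorted-map) m)))

(canonical-key-string (hash-map :b 2 :a 1))
;=> "{:a 1, :b 2}"

;; Different map types and extra metadata still yield one string.
(= (canonical-key-string (sorted-map :a 1 :b 2))
   (canonical-key-string (with-meta (array-map :b 2 :a 1) {:note "ignored"})))
;=> true
```

A real encoder would also have to handle keys that cannot be compared with each other and maps nested inside values, which this sketch ignores.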

In light of all of this, do you still think it's a good idea to try to incorporate this into Codax? Also, do you have a recommendation as to which hex code should be used? (How did you choose the other ones?)

dscarpetti commented 4 months ago

I appreciate you being so thorough. After giving it some thought on my end, I think by refactoring the pathwise treatment of vectors, it is possible to allow for map and set path keys pretty easily. The set-map-keys branch has a working example, but I am not sure I've fully considered all the implications. Please have a look when you can.

As for the hex code, I don't think I had any solid methodology for selecting hex codes. Really should have reserved some of them for extensions like this since it could, in theory, interfere with other existing custom types.

Frozenlock commented 4 months ago

I was wondering how you could sort heterogeneous data, but it looks like you're encoding it before the sort, which means you're always sorting strings. :+1:
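
The point about sorting encoded values can be illustrated with plain Clojure. In this sketch `toy-encode` is a stand-in (just `pr-str`), not Codax's actual path encoding, and `sortable?` is a hypothetical helper.

```clojure
;; A keyword, a number, a string and a vector cannot be compared directly.
(def mixed [:b 42 "apple" [1 2]])

(defn sortable?
  "Returns false if sorting coll throws a ClassCastException."
  [coll]
  (try
    (sort coll)
    true
    (catch ClassCastException _ false)))

;; Stand-in encoder for illustration only.
(def toy-encode pr-str)

(sortable? mixed)
;=> false

;; Once every element is a string, an ordering always exists.
(sort (map toy-encode mixed))
;=> ("\"apple\"" "42" ":b" "[1 2]")
```

The resulting order depends entirely on how the encoder lays out its strings, which is why an encoding such as Base64 gives a stable but not meaningful order.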

The only failing tests I have with your version are related to clojure.lang.PersistentTreeSet and clojure.lang.PersistentTreeMap (sorted-set and sorted-map), as they don't have a path encoder. But thinking more about it, I really dislike the silent conversion to their unsorted counterparts. It should probably be left to the user to do the conversion manually, if needed.

Really should have reserved some of them for extensions like this since it could, in theory, interfere with other existing custom types.

Should you take the opportunity to reserve a few hex codes? We can't do anything for users of previous versions, but at least users of version 1.4.X and upward could avoid using reserved codes.