arrdem / shelving

A toolkit for building data stores.
Eclipse Public License 1.0

Investigate using Nippy for real back-ends #14

Open arrdem opened 6 years ago

arrdem commented 6 years ago

https://github.com/ptaoussanis/nippy seems like an easy off-the-shelf faster serialization story, even for the trivial dev stores.

greglook commented 6 years ago

I would caution against Nippy given my past experience with it. Our managed dependencies literally have these two lines in them:

;; WARNING: do not update beyond 2.11.x (newer versions break all stored data)
[com.taoensso/nippy "2.11.1"]

You might be interested in looking at CBOR instead. 😉

martinklepsch commented 6 years ago

@greglook sorry for the aside but would you mind elaborating on those issues? Did you file an issue with Nippy I could look at? Thanks!

The CBOR link doesn’t work by the way, maybe it’s private?

greglook commented 6 years ago

Whoops, I typo'd the link - should be https://github.com/greglook/clj-cbor

The issues we have with Nippy aren't really bugs in Nippy so much as problems with some of its fundamental design choices:

=> (require '[taoensso.nippy :as nippy])
=> (require '[clj-time.core :as time])
=> (require '[clojure.java.io :as io])

=> (time/now)
#<org.joda.time.DateTime@569ccb35 2018-11-02T18:01:40.278Z>

=> (nippy/freeze (time/now))
#whidbey/bin "TlBZAAYAAAAWb3JnLmpvZGEudGltZS5EYXRlVGltZaztAAVzcgAWb3JnLmpvZGEudGltZS5EYXRlVGltZbg8eGRqW935AgAAeHIAH29yZy5qb2RhLnRpbWUuYmFzZS5CYXNlRGF0ZVRpbWX///nhT10uowIAAkoAB2lNaWxsaXNMAAtpQ2hyb25vbG9neXQAGkxvcmcvam9kYS90aW1lL0Nocm9ub2xvZ3k7eHAAAAFm1ZaLznNyACdvcmcuam9kYS50aW1lLmNocm9uby5JU09DaHJvbm9sb2d5JFN0dWKpyBFmcTdQJwMAAHhwc3IAH29yZy5qb2RhLnRpbWUuRGF0ZVRpbWVab25lJFN0dWKmLwGafDIa4wMAAHhwdwUAA1VUQ3h4"

=> (io/copy *1 (io/file "now.nippy"))

$ hexdump -C now.nippy
00000000  4e 50 59 00 06 00 00 00  16 6f 72 67 2e 6a 6f 64  |NPY......org.jod|
00000010  61 2e 74 69 6d 65 2e 44  61 74 65 54 69 6d 65 ac  |a.time.DateTime.|
00000020  ed 00 05 73 72 00 16 6f  72 67 2e 6a 6f 64 61 2e  |...sr..org.joda.|
00000030  74 69 6d 65 2e 44 61 74  65 54 69 6d 65 b8 3c 78  |time.DateTime.<x|
00000040  64 6a 5b dd f9 02 00 00  78 72 00 1f 6f 72 67 2e  |dj[.....xr..org.|
00000050  6a 6f 64 61 2e 74 69 6d  65 2e 62 61 73 65 2e 42  |joda.time.base.B|
00000060  61 73 65 44 61 74 65 54  69 6d 65 ff ff f9 e1 4f  |aseDateTime....O|
00000070  5d 2e a3 02 00 02 4a 00  07 69 4d 69 6c 6c 69 73  |].....J..iMillis|
00000080  4c 00 0b 69 43 68 72 6f  6e 6f 6c 6f 67 79 74 00  |L..iChronologyt.|
00000090  1a 4c 6f 72 67 2f 6a 6f  64 61 2f 74 69 6d 65 2f  |.Lorg/joda/time/|
000000a0  43 68 72 6f 6e 6f 6c 6f  67 79 3b 78 70 00 00 01  |Chronology;xp...|
000000b0  66 d5 96 8b ce 73 72 00  27 6f 72 67 2e 6a 6f 64  |f....sr.'org.jod|
000000c0  61 2e 74 69 6d 65 2e 63  68 72 6f 6e 6f 2e 49 53  |a.time.chrono.IS|
000000d0  4f 43 68 72 6f 6e 6f 6c  6f 67 79 24 53 74 75 62  |OChronology$Stub|
000000e0  a9 c8 11 66 71 37 50 27  03 00 00 78 70 73 72 00  |...fq7P'...xpsr.|
000000f0  1f 6f 72 67 2e 6a 6f 64  61 2e 74 69 6d 65 2e 44  |.org.joda.time.D|
00000100  61 74 65 54 69 6d 65 5a  6f 6e 65 24 53 74 75 62  |ateTimeZone$Stub|
00000110  a6 2f 01 9a 7c 32 1a e3  03 00 00 78 70 77 05 00  |./..|2.....xpw..|
00000120  03 55 54 43 78 78                                 |.UTCxx|

Nippy's advantages are that it's very fast and it "just works" ... until things blow up later. I think it'd be okay as a transit codec, but I would not use it for a storage codec again.
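
For anyone skimming the hexdump above: the bytes `ac ed 00 05` at offset 0x1f are the header of a standard Java serialization stream, and they are followed by the class name and an 8-byte serialVersionUID. That suggests Nippy fell back to Java's built-in serialization for the `DateTime`, so the stored bytes depend on the exact class layout of the library that produced them. Here is a minimal sketch that reproduces the same header with plain JVM serialization and no Nippy at all. The helper names `java-serialize`, `index-of-bytes` and `stream-magic` are made up for illustration, and `java.util.Date` stands in for the Joda type.

```clojure
(import '(java.io ByteArrayOutputStream ObjectOutputStream)
        '(java.util Date))

(defn java-serialize
  "Serializes obj with plain java.io serialization and returns the bytes.
  Hypothetical helper, not part of Nippy."
  ^bytes [obj]
  (let [baos (ByteArrayOutputStream.)]
    (with-open [oos (ObjectOutputStream. baos)]
      (.writeObject oos obj))
    (.toByteArray baos)))

(defn index-of-bytes
  "Returns the first index at which needle occurs in haystack, or nil."
  [^bytes haystack ^bytes needle]
  (let [h (vec haystack)
        n (vec needle)]
    (first (filter #(= n (subvec h % (+ % (count n))))
                   (range (inc (- (count h) (count n))))))))

;; Java serialization stream header: magic 0xACED followed by version 0x0005.
(def stream-magic
  (byte-array [(unchecked-byte 0xAC) (unchecked-byte 0xED) (byte 0x00) (byte 0x05)]))

(let [bs (java-serialize (Date.))]
  ;; Position of the stream header within the serialized bytes.
  (println (index-of-bytes bs stream-magic))
  ;; Whether the fully qualified class name is embedded in the payload.
  (println (some? (index-of-bytes bs (.getBytes "java.util.Date" "UTF-8")))))
```

Running `index-of-bytes` with `stream-magic` over a frozen payload is one rough way to spot values that went through this fallback before they end up in long-lived storage.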

martinklepsch commented 6 years ago

Interesting, thanks!

EDIT: Couldn't help myself but generate docs for clj-cbor on cljdoc. cljdoc currently uses nippy for storage serialization btw. I wasn't really aware of the problems you described but I also think they don't affect cljdoc at this stage. It's great to know though, so thanks again.

greglook commented 6 years ago

> Couldn't help myself but generate docs for clj-cbor on cljdoc.

Neat, thanks! I don't mean to bash on Nippy too hard - it is extremely efficient at representing the core set of types, and for many use-cases it's just fine. However, some of the idiosyncrasies above led to some unpleasant surprises after we'd been using it in production for a while.