com-lihaoyi / upickle

uPickle: a simple, fast, dependency-free JSON & Binary (MessagePack) serialization library for Scala
https://com-lihaoyi.github.io/upickle
MIT License

Way to ignore/omit Java style ser #511

Closed samthebest closed 10 months ago

samthebest commented 10 months ago

If we ever refactor our types, say move their package, or rename them, my understanding is this causes exceptions like this:

upickle.core.Abort: invalid tag for tagged object: my.package.MyType
    at upickle.AttributeTagged$$anon$4.visitValue(Api.scala:244)
    at upack.BaseMsgPackReader.parseMap(MsgPackReader.scala:124)
    at upack.BaseMsgPackReader.parse(MsgPackReader.scala:83)
    at upack.BaseMsgPackReader.parseMap(MsgPackReader.scala:124)
    at upack.BaseMsgPackReader.parse(MsgPackReader.scala:83)
    at upack.Readable$$anon$1.transform(Readable.scala:11)
    at upickle.Api.$anonfun$readBinary$1(Api.scala:30)
    at upickle.core.TraceVisitor$.withTrace(TraceVisitor.scala:18)
    at upickle.Api.readBinary(Api.scala:30)
    at upickle.Api.readBinary$(Api.scala:29)
    at upickle.default$.readBinary(Api.scala:133)

Can we switch off this behaviour? I.e. just make it deser/ser the data and ignore the location and name of the type?

lihaoyi commented 10 months ago

You can do this by passing in an explicit @key annotation, I believe.
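
For readers unfamiliar with the annotation, here is a minimal sketch of what that suggestion might look like. The Shape, Circle and Square types and the tag strings are hypothetical, invented for illustration; the assumption is that `@upickle.implicits.key` on a case class replaces the fully-qualified name in the `$type` tag, so the tag no longer changes when the class is moved or renamed.

```scala
import upickle.default.{ReadWriter, macroRW, read, write}
import upickle.implicits.key

object KeyExample {
  // Hypothetical sealed hierarchy, not taken from the issue.
  sealed trait Shape
  object Shape {
    implicit val rw: ReadWriter[Shape] = ReadWriter.merge(Circle.rw, Square.rw)
  }

  // The string passed to @key is used as the tag instead of the class's FQN.
  @key("circle") case class Circle(radius: Double) extends Shape
  object Circle { implicit val rw: ReadWriter[Circle] = macroRW }

  @key("square") case class Square(side: Double) extends Shape
  object Square { implicit val rw: ReadWriter[Square] = macroRW }

  // Serialize through the sealed trait so the tag is written.
  val json: String = write[Shape](Circle(1.5))

  // Inspect the tag that was actually written.
  val tag: String = ujson.read(json)("$type").str

  // Round-trip back through the sealed trait.
  val back: Shape = read[Shape](json)
}
```

The same readers and writers are used by writeBinary and readBinary, so the tag would be expected to behave the same way for MessagePack.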

samthebest commented 10 months ago

Thanks @lihaoyi !

This doesn't quite work for us because we would have to keep adding these annotations every time files are moved around. What we really want is that moving files around in a Scala project doesn't break serialisation or require extra steps to ensure things don't break.

Hence we'd ideally like to provide a param to upickle.default.readBinary that says "ignore FQN" and just cares about the structure.

I actually cloned the repo to see if I could just add this feature myself, but IntelliJ is having difficulty loading the build.sbt. Do you have any contributor documentation?

lihaoyi commented 10 months ago

So the FQN is used for distinguishing case classes in a sealed trait. We cannot in general rely on field names since they may be the same, and even the nonqualified name may be duplicated since the different case classes may be within separate packages or objects within the same file.

In specific cases, we may be able to make those assumptions. IIRC, for Scala 3 enums we use the short name, since those are guaranteed to be in the same namespace and cannot collide. You may be able to do something similar if you have other guarantees about your data that you can assume to be true.

You could e.g. make it buffer up the stuff it's reading into a ujson.IndexedValue, inspect all the field names, then instantiate the right type. This is what we do for "$type" tags that do not appear as the first field in a dictionary. It could work for your requirements as well, and you may be able to re-use some of the logic.
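
A rough sketch of the "buffer, inspect the field names, then instantiate" idea follows. It is not uPickle's internal implementation: it uses a plain ujson.Value rather than ujson.IndexedValue, and the Shape types and the readShape helper are hypothetical. It only works under the assumption that every case class in the hierarchy has a distinct set of field names, which is exactly the guarantee that does not hold in general.

```scala
object StructuralRead {
  // Hypothetical hierarchy whose members have distinct field sets.
  sealed trait Shape
  case class Circle(radius: Double) extends Shape
  case class Rect(width: Double, height: Double) extends Shape

  // Hypothetical helper: pick the case class from the field names alone,
  // ignoring whatever "$type" tag the data carries.
  def readShape(json: ujson.Value): Shape = {
    val fields = json.obj.keySet.toSet - "$type"
    if (fields == Set("radius")) Circle(json("radius").num)
    else if (fields == Set("width", "height"))
      Rect(json("width").num, json("height").num)
    else throw new IllegalArgumentException(s"unrecognised field set: $fields")
  }

  // Data tagged with a stale, made-up FQN still resolves to Circle.
  val moved: Shape =
    readShape(ujson.read("""{"$type":"old.pkg.Circle","radius":2.0}"""))
}
```

If two case classes ever share the same field names, a dispatch like this becomes ambiguous, which is why the tag exists in the first place.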

SBT is just used for the docsite. To build and test uPickle, try ./mill -i resolve __.test, and then ./mill -i <test-target> <fully-qualified-test-name> that you want (the test name is optional)

samthebest commented 10 months ago

@lihaoyi Thanks for the tips. One last question, do you have any documentation detailing the raw protocol? So we could write our own deser logic outside the upickle visitor pattern. If we knew how upickle laid out the bytes then we should easily be able to parse it ignoring the FQN.

We had a look at ujson.IndexedValue and couldn't see how to parse case classes without FQN.

lihaoyi commented 10 months ago

The protocol is basically JSON converted to MessagePack. The data types map more or less one to one. There's some msgpack-json conversion code somewhere in the test suite you can use if you want.

lihaoyi commented 10 months ago

For the purposes of eliding the type tag, you should not need to touch msgpack at all, since the changes will be on the case class side of things. You can iterate on it with JSON, and when you're done, using it for MsgPack should just work.

samthebest commented 10 months ago

@lihaoyi I think I follow, so I tried ujson.read(bytes), hoping to get some kind of arbitrary JSON structure that I could then iterate through, but I got this:

ujson.ParseException: expected json value got "ヌ" at index 0
    at ujson.ByteParser.die(ByteParser.scala:84)
    at ujson.ByteParser.parseTopLevel0(ByteParser.scala:337)
    at ujson.ByteParser.parseTopLevel(ByteParser.scala:307)
    at ujson.ByteParser.parse(ByteParser.scala:59)
    at ujson.ByteArrayParser$.transform(ByteArrayParser.scala:33)
    at ujson.ByteArrayParser$.transform(ByteArrayParser.scala:32)
    at ujson.Readable$fromTransformer.transform(Readable.scala:13)
    at ujson.package$.transform(package.scala:4)
    at ujson.package$.$anonfun$read$1(package.scala:10)
    at upickle.core.TraceVisitor$.withTrace(TraceVisitor.scala:18)
    at ujson.package$.read(package.scala:10)
...

lihaoyi commented 10 months ago

Oh, I didn't mean to use ujson to parse MessagePack binaries. You should use upickle to parse JSON, passing in JSON that doesn't have $type keys. Once you get that working, it should work for msgpack.

samthebest commented 10 months ago

@lihaoyi That worked!

If we do upickle.default.readBinary[Value](bytes) we get back a JSON structure we can traverse with "$type" keys!

Thanks for your timely responses!
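
For anyone landing here later, a minimal sketch of the approach that worked, assuming Value in the call above refers to ujson.Value. The bytes below are a stand-in built with writeBinary; the tag string is the one from the stack trace at the top of the issue, and the "radius" field is hypothetical.

```scala
import upickle.default.{readBinary, writeBinary}

object InspectBinary {
  // Stand-in for MessagePack bytes written before a refactor.
  // "radius" is a made-up field for illustration.
  val bytes: Array[Byte] = writeBinary[ujson.Value](
    ujson.Obj(
      "$type" -> ujson.Str("my.package.MyType"),
      "radius" -> ujson.Num(1.5)
    )
  )

  // Read the binary into a generic tree instead of a case class,
  // so no tag lookup is performed.
  val tree: ujson.Value = readBinary[ujson.Value](bytes)

  // The tag is just another string field that can be inspected or ignored.
  val tag: String = tree("$type").str
  val fieldNames: Set[String] = tree.obj.keySet.toSet - "$type"
}
```

From there the tree can be traversed by hand, or the "$type" entry can be rewritten or dropped before handing the data to whatever structural logic fits the types in question.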