Kotlin / kotlinx.serialization

Kotlin multiplatform / multi-format serialization
Apache License 2.0

Serializer to get the raw json value of a key? #1058

Open tmm1 opened 3 years ago

tmm1 commented 3 years ago

What is your use-case and why do you need this feature?

I am looking for a way to deserialize a specific field in my json into a raw byte array representing the value json sub-document.

Basically my json documents have large/complex json sub-trees that I would like to avoid parsing to save cpu/allocations. But I still need the value so I can re-create the original json if needed.

In golang, for example, this can be achieved with json.RawMessage: https://golang.org/pkg/encoding/json/#RawMessage

In gson, a type adapter can be used to regenerate json during parsing. This is not particularly cpu/gc efficient, but it works: https://github.com/google/gson/issues/1368

In moshi, there is work being done to be able to skip over the value and consume it into a raw value field: https://github.com/square/moshi/issues/675

Describe the solution you'd like

I'm not familiar enough with the kotlinx.serialization APIs to know if there is already a way to do this, or if it can be implemented within a custom serializer. Any pointers would be appreciated!

elizarov commented 3 years ago

Please check out JSON elements: https://github.com/Kotlin/kotlinx.serialization/blob/master/docs/json.md#json-elements. Does it do what you are looking for?

sandwwraith commented 3 years ago

If I understood correctly, you want to save some part of JSON to a String (RawJson) property so it would be parsed later? We do not currently support this concept. JsonElement is an untyped representation that does not map the JSON onto classes, although it still parses the input to check that your JSON is valid.
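A minimal sketch of the JsonElement approach mentioned here, assuming a hypothetical `Envelope` class with a `payload` key (neither name comes from the thread). The subtree is still parsed into `JsonElement` objects, so this keeps the value re-serializable without mapping it onto classes, but it does not defer parsing.

```kotlin
import kotlinx.serialization.Serializable
import kotlinx.serialization.decodeFromString
import kotlinx.serialization.json.Json
import kotlinx.serialization.json.JsonElement

// Hypothetical model: `payload` is kept as an untyped JSON tree.
@Serializable
data class Envelope(val id: Int, val payload: JsonElement)

// Decodes the document and returns the payload rendered back to compact JSON text.
fun payloadAsJson(input: String): String {
    val envelope = Json.decodeFromString<Envelope>(input)
    // JsonElement.toString() produces valid JSON for the subtree.
    return envelope.payload.toString()
}

fun main() {
    val input = """{"id":1,"payload":{"a":1,"b":[true,null]}}"""
    println(payloadAsJson(input)) // {"a":1,"b":[true,null]}
}
```

The re-rendered text is equivalent JSON, not necessarily the original bytes: whitespace and formatting from the input are not preserved.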

tmm1 commented 3 years ago

> If I understood correctly, you want to save some part of JSON to a String (RawJson) property so it would be parsed later?

Yes, exactly. I want to defer parsing for parts of the document.

JsonElement is not a good fit because it still parses and creates objects. So the cpu/memory benefits of lazy parsing are lost.

How does kotlinx.serialization handle ignoring unknown keys when deserializing into an object? Are the keys skipped during or after parsing? (I'm wondering what the cpu/allocation overhead is in cases where keys are ultimately ignored.)

sandwwraith commented 3 years ago

The unknown keys are skipped without parsing (tokenizing only). However, the skipped string is not saved anywhere, so supporting such a feature would require some additional work.
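For reference, a small sketch of the existing skip behaviour using the `ignoreUnknownKeys` setting. The `Header` class, the `skippingJson` name, and the key names are hypothetical. The value of the undeclared key is skipped and discarded, which is why it cannot be recovered afterwards.

```kotlin
import kotlinx.serialization.Serializable
import kotlinx.serialization.decodeFromString
import kotlinx.serialization.json.Json

// Hypothetical model that declares only the small field it needs.
@Serializable
data class Header(val id: Int)

// Json instance that skips keys with no matching property instead of failing.
val skippingJson = Json { ignoreUnknownKeys = true }

fun decodeHeader(input: String): Header = skippingJson.decodeFromString<Header>(input)

fun main() {
    // `bigSubtree` is not declared in Header, so its value is skipped and not stored.
    val input = """{"id":7,"bigSubtree":{"items":[1,2,3]}}"""
    println(decodeHeader(input)) // Header(id=7)
}
```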

qwwdfsad commented 3 years ago

The feature seems like a reasonable addition, though it still has some open questions.

> Are the keys skipped during or after parsing?

Could you please elaborate on your use-case here? Because "put all unknown keys in a separate String property with valid JSON string" and "Treat specifically marked property not as simple String, but as a valid JSON encoded in String" are completely different approaches.

> JsonElement is not a good fit because it still parses and creates objects. So the cpu/memory benefits of lazy parsing are lost.

I wonder if there are benchmarks (or maybe you have a relevant story to add?) showing that the performance boost is significant here. Even without allocating JsonElement objects, the parser still has to (1) parse the JSON and extract the relevant sub-object and (2) ensure that the whole sub-object is valid JSON. The second part is probably the slowest in the whole JSON decoding process, so I'm really interested in knowing how big the performance improvement is.

tmm1 commented 3 years ago

> "Treat specifically marked property not as simple String, but as a valid JSON encoded in String"

This is what I'm interested in and what is implemented by the other examples I provided.

I have one specific key in my json that contains a huge json subtree, with thousands of objects and several layers of nesting. I don't want to create these thousands of objects on every json parse because it leads to severe GC pressure on many Android devices.

qwwdfsad commented 3 years ago

Thanks for the clarification and your input!

It's not something we are going to do right now (at least until 1.1.0 version), but thanks to your feedback, I've left the possibility to add this functionality in a backwards-compatible way both for custom serializers and regular JSON usages. Let's see how it goes in Moshi and the demand on that.

Design idea: instead of using a @RawJson annotation, introduce an inline class RawString(value: String) with its own custom serializer, to provide better type safety and express the user's intent in the type.
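A rough sketch of what such a wrapper type could look like with the current public API. This is not an implementation of the proposed feature: the names `RawJson`, `RawJsonSerializer`, and `Document` are hypothetical, the STRING descriptor is only a stand-in, and the serializer still goes through `JsonElement`, so it shows the type-level design without saving any parsing work.

```kotlin
import kotlinx.serialization.KSerializer
import kotlinx.serialization.Serializable
import kotlinx.serialization.SerializationException
import kotlinx.serialization.decodeFromString
import kotlinx.serialization.encodeToString
import kotlinx.serialization.descriptors.PrimitiveKind
import kotlinx.serialization.descriptors.PrimitiveSerialDescriptor
import kotlinx.serialization.descriptors.SerialDescriptor
import kotlinx.serialization.encoding.Decoder
import kotlinx.serialization.encoding.Encoder
import kotlinx.serialization.json.Json
import kotlinx.serialization.json.JsonDecoder
import kotlinx.serialization.json.JsonEncoder

// Hypothetical wrapper: holds the JSON text of a sub-document.
@Serializable(with = RawJsonSerializer::class)
@JvmInline
value class RawJson(val value: String)

object RawJsonSerializer : KSerializer<RawJson> {
    // Assumption: STRING is a placeholder; the wrapped value can be any JSON shape.
    override val descriptor: SerialDescriptor =
        PrimitiveSerialDescriptor("RawJson", PrimitiveKind.STRING)

    override fun deserialize(decoder: Decoder): RawJson {
        val jsonDecoder = decoder as? JsonDecoder
            ?: throw SerializationException("RawJson supports only the JSON format")
        // Still builds a JsonElement tree before rendering it back to text.
        return RawJson(jsonDecoder.decodeJsonElement().toString())
    }

    override fun serialize(encoder: Encoder, value: RawJson) {
        val jsonEncoder = encoder as? JsonEncoder
            ?: throw SerializationException("RawJson supports only the JSON format")
        // Re-parse so the text is embedded as JSON rather than as a quoted string.
        jsonEncoder.encodeJsonElement(jsonEncoder.json.parseToJsonElement(value.value))
    }
}

// Hypothetical containing class.
@Serializable
data class Document(val id: Int, val body: RawJson)

fun main() {
    val doc = Json.decodeFromString<Document>("""{"id":1,"body":{"k":[1,2]}}""")
    println(doc.body.value)           // {"k":[1,2]}
    println(Json.encodeToString(doc)) // {"id":1,"body":{"k":[1,2]}}
}
```

The wrapper makes the intent visible in the type, which is the point of the design idea. Avoiding the intermediate tree would need support inside the JSON decoder itself.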

ankushg commented 2 years ago

@qwwdfsad If this functionality were something I'd be interested in contributing, do you have any pointers on where to start?

brendan-gero-humanetix commented 2 years ago

@qwwdfsad

I'd like to add that there's a slightly different use case that I have, which is preventing me from switching to kotlinx.serialization. I have a situation where I would like to store the sub-object, as a JSON string, in a database, but I also want to deserialise it to examine its contents. Without RawJson, this means deserialising the whole JSON object, and then reserialising the sub-object for storage. Similarly, when serving the data again (potentially as part of a collection), I'd need to deserialise the sub-object before serialising the full object for output. I might be a bit naive, not having delved into the specifics of how this all works, but to me this seems to be a bit redundant. The string is already there as a substring of the original input, or will become a substring of the output.

I've tried to achieve this through a custom serializer, a bit like this:

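import kotlinx.serialization.*
import kotlinx.serialization.builtins.serializer
import kotlinx.serialization.json.*

// Note: polymorphicSerialiser (used here as a Json instance) and BaseClass are not defined in this snippet.
object RawJsonSerializer : JsonTransformingSerializer<String>(String.serializer()) {

    // Incoming JSON: decode the sub-object to validate it, then pass it on as a JSON string.
    override fun transformDeserialize(element: JsonElement): JsonElement {
        if (element !is JsonObject) {
            throw SerializationException("Expected schedule object")
        }

        return JsonPrimitive(
            polymorphicSerialiser.encodeToString(polymorphicSerialiser.decodeFromJsonElement<BaseClass>(element))
        )
    }

    // Outgoing JSON: parse the stored string back into an object so it is not emitted as a quoted string.
    override fun transformSerialize(element: JsonElement): JsonElement {
        if (element !is JsonPrimitive || !element.isString) {
            throw SerializationException("Expected schedule string")
        }
        return JsonObject(polymorphicSerialiser.decodeFromString(element.content))
    }
}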

but I found that for large collections of data, this ended up slower than using Jackson with the JsonRawValue annotation. Is there a better way to achieve this?

chakflying commented 2 years ago

I'm following this guide to implement a fallback for deserializing enums. However, I would also like to log the raw value when deserialization fails. Is there any way to get this from the decoder?

sandwwraith commented 2 years ago

Unfortunately, we do not support retrieving raw values yet.

Also relevant: #1405
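For the narrower enum case, one possible workaround is a custom serializer that reads the value as a string and logs it before falling back. This is only a sketch under assumptions: `Status` and `StatusSerializer` are hypothetical names, `println` stands in for real logging, and it only works when the enum is encoded as a JSON string. It does not give access to the raw text of arbitrary values.

```kotlin
import kotlinx.serialization.KSerializer
import kotlinx.serialization.descriptors.PrimitiveKind
import kotlinx.serialization.descriptors.PrimitiveSerialDescriptor
import kotlinx.serialization.descriptors.SerialDescriptor
import kotlinx.serialization.encoding.Decoder
import kotlinx.serialization.encoding.Encoder
import kotlinx.serialization.json.Json

// Hypothetical enum with an explicit fallback constant.
enum class Status { ACTIVE, INACTIVE, UNKNOWN }

object StatusSerializer : KSerializer<Status> {
    override val descriptor: SerialDescriptor =
        PrimitiveSerialDescriptor("Status", PrimitiveKind.STRING)

    override fun serialize(encoder: Encoder, value: Status) =
        encoder.encodeString(value.name)

    override fun deserialize(decoder: Decoder): Status {
        // The string read here is the value to log when no constant matches.
        val raw = decoder.decodeString()
        return Status.values().firstOrNull { it.name == raw } ?: run {
            println("Unrecognized Status value: $raw")
            Status.UNKNOWN
        }
    }
}

fun main() {
    // The serializer is passed explicitly, so no annotation on the enum is required.
    println(Json.decodeFromString(StatusSerializer, "\"ACTIVE\"")) // ACTIVE
    println(Json.decodeFromString(StatusSerializer, "\"PAUSED\"")) // logs, then UNKNOWN
}
```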

iseki0 commented 1 year ago

> Thanks for the clarification and your input!
>
> It's not something we are going to do right now (at least until 1.1.0 version), but thanks to your feedback, I've left the possibility to add this functionality in a backwards-compatible way both for custom serializers and regular JSON usages. Let's see how it goes in Moshi and the demand on that.
>
> Design idea: instead of using a @RawJson annotation, introduce an inline class RawString(value: String) with its own custom serializer, to provide better type safety and express the user's intent in the type.

Is there currently any way to achieve this? The documentation says I must provide a correct descriptor, but in this case I don't know which descriptor is suitable. We use the JSON format in an unusual way, and deserializing the whole tree is impossible in my case because it uses too much memory. I have to hand-write a deserializer that accesses the JSON tokens and builds the structure in my own way. So I need kotlinx.serialization to just "skip" the relevant tokens and leave them to my own code.