Allow custom policy for adding the polymorphic discriminator

joffrey-bion commented 3 years ago

What is your use-case and why do you need this feature?

There has been several use cases mentioned in different issues that require to add the discriminator in more places, or avoid it in some places:

for serialization-only use cases (such as sending data to an external API), we may not need discriminators at all. They bloat the payload at best, and can actually fail the requests at worst (https://github.com/Kotlin/kotlinx.serialization/issues/464)
for interop with other deserialization libraries, possibly in more dynamic languages, we may want to have a discriminator for every type (https://github.com/Kotlin/kotlinx.serialization/issues/1128)
even when using only Kotlinx Serialization itself for both serialization and deserialization, there are times when the deserializing party has less information than the serializing party about the type in the JSON, and we may want to consistently include the discriminator for subtypes of polymorphic types that are serializable (https://github.com/Kotlin/kotlinx.serialization/issues/1194). An example of that is sending events over a "pipe" (websocket or other). We know the concrete type at serialization time, so the subtype's serializer can be used, but the deserializer at the other end expects any type of event, so it uses the parent type's deserializer. Currently the discriminator is never added by the child type's serializer, so the parent deserializer fails in this case. It is not always possible to control how kotlinx.serialization is called at the place we send the events, so using the parent type's serializer when serializing the value is not always an option (for instance, some frameworks like Spring look for the serializer to use based on the dynamic value's type, and thus never use the polymorphic parent's serializer).

Describe the solution you'd like

It seems these use cases could be supported by having a configuration parameter for the inclusion of the discriminator, such as discriminatorInclusionPolicy, with the following values:

NEVER: never include the discriminator
POLYMORPHIC_TYPES (current behaviour): include the discriminator only when serializing a type that's polymorphic itself (the "parent" type).
POLYMORPHIC_SUBTYPES: always include the discriminator for instances (subtypes) of polymorphic types that are serializable, regardless of whether they are used as their parent or as themselves (json.encodeToString<Parent>(SubType()) and json.encodeToString<SubType>(SubType()) would behave the same way).
ALWAYS, ALL_TYPES, or ALL_OBJECT_TYPES: always include the discriminator in types serialized as JSON objects (does not apply to primitive types)

Maybe the names could be refined, but I think you get the idea.

Another option could also be to provide a custom function that tells whether to include the discriminator or not, but then we would have to think about which pieces of information the function would need etc.

Any thoughts on this?

Whathecode commented 3 years ago

Thank you for this excellent summary and aggregating a set of diverse yet related issues!

I just wanted to note that the current two ways polymorphism is supported and implemented (normal and array polymorphism) might complicate this suggestion; I haven't given it much thought, though, and I expect it is certainly something the smart people behind this library could tackle. ;p

For example, a recent issue I ran into is that adding a class discriminator to JSON primitives currently only works when array polymorphism is used. I had an Enum which extended from an interface and wanted to serialize this polymorphically, for which I had to create a custom serializer.

This is yet another reason why I still prefer kotlinx.serialization's original proposal to use array polymorphism by default, but I understand this deviates from 'the standard', if that even exists.

P1NG2WIN commented 3 years ago

Any progress on it?

sandwwraith commented 3 years ago

As for now, there is sufficient demand and use-cases to provide an option to globally disable type discriminator (NEVER option) — mainly, to interact with external API that has validation or just to reduce the amount of data. However, to correctly design and implement this feature we need to decide whether to provide other options besides NEVER.

POLYMORPHIC_SUBTYPES, as in the proposal, is a questionable strategy because it makes rules harder to understand — besides static property type, we also need to check if the class is registered in the current module, etc, etc. Instead, we suggest providing a diagnostic when the user probably wants to use base polymorphic type instead of subtype: #1493

ALWAYS also seems vague and rather exotic use-case: you don't really want an external API to depend on your implementation details, such as class serial names. Good implementation with backward compatibility of such a pattern requires a manual providing a @SerialName for each serializable class of your program. Moreover, the current implementation of polymorphism does not allow to specify serial names and discriminators for primitives and enums; implementing ALWAYS strategy would be an inappropriate cheat-code to overcome this limitation — because, well, not all classes would have discriminators, despite ALWAYS state.

So for now we are leaning towards a design with only two states: DEFAULT (current behavior) and NEVER (do not include discriminator at all). If you have any additional suggestions or corrections, please add them

joffrey-bion commented 3 years ago

@sandwwraith Thanks for the summary of the state of this issue.

POLYMORPHIC_SUBTYPES, as in the proposal, is a questionable strategy because it makes rules harder to understand

I actually think the opposite is true sometimes. The current way of serializing surprises some people because the same class will not be serialized the same way depending on whether the serializer used was that of the parent or that of the child class.

The proposal here is to ensure both serializers yield the same result for a given instance. So IMO this rule (which ensures consistent serialization of a given instance) is actually easier to understand than the current default behaviour.

Instead, we suggest providing a diagnostic when the user probably wants to use base polymorphic type instead of subtype: #1493

I understand that for most use cases, it's just a failure of users to understand how the serializer is chosen (based on the static type of the variable at the call site). For these cases https://github.com/Kotlin/kotlinx.serialization/issues/1493 would be sufficient.

However, there are some use cases where the call site is not controlled, and for those it's not about fixing the call site, it's just that some frameworks (e.g. Spring) have a generic API surface on top of different serialization libraries, and they don't have any static type information at the moment they call the concrete serialization library, just the runtime type of the instances to serialize.

For example, anywhere in the code, you could have a call like this:

simpMessagingTemplate.convertAndSend("/some/destination", someObject)

This public API is generic for Spring, regardless of the serialization library used. Once deep down in Spring implementation, the runtime type of someObject is introspected and Kotlinx Serialization is called then, without static info.

A good suggestion by @pdvrieze to work around this issue on Spring side without changing their public API is to wrap the serialized object into an internal wrapper class known to Spring with additional static type information (e.g. the serializer), and unwrap this in the Kotlinx Serialization adapter.

besides static property type, we also need to check if the class is registered in the current module, etc, etc

I would have expected something at compiler-plugin level, because parent classes annotated @Serializable are known at compile time. So I thought we could maybe directly generate a serializer with discriminator in the child class itself. Is there dynamic registration of polymorphic types that could prevent this from being implemented this way?

sandwwraith commented 3 years ago

The point about Spring and other frameworks is a really good one, thanks. Even Ktor has the same problem. It's still possible to use convertAndSend<BaseClass>("/some/destination", someObject), but this is a very easy mistake to make, even with inspection (and I'm not sure if the inspection can tell at all whether the usage of subclass was intended or accidental).

parent classes annotated @Serializable are known at compile time

This is true indeed. However, there is still a 'dynamic registration of polymorphic types' — a SerializersModule, because we want to make sure that only classes that are intended to be polymorphic can be deserialized. (applicable to abstract classes, sealed classes work automatically) There's a bit of motivation about that in the KEEP.

we could maybe directly generate a serializer with a discriminator in the child class itself.

This is an implementation detail: the discriminator is added by the format, not by the serializer. So we need to keep a 'default' serializer for child and do not use PolymorphicSerializer for it because this will lead to a stack overflow

joffrey-bion commented 3 years ago

It's still possible to use convertAndSend<BaseClass>("/some/destination", someObject)

I'm not sure what you mean here. At least as far as this Spring function is concerned, this is not correct. This method doesn't have a type parameter. Spring entirely relies on the runtime type of the 2nd argument, once this "message payload" hits the actual message converter that will be used for serialization. This call site is long forgotten by then.

The only way such a call could be made this way is if we defined an extension function with a reified type parameter, and invent a wrapper class that wraps the payload instance + the serializer to use. And then modify Spring's code in their Kotlinx Serialization adapter to detect this wrapper class, unwrap it, and read the serializer from there. That being said, I still consider it only as a workaround, because of all the unnecessary allocations of wrapper objects. If these messages are handled by the million, it could be a non-negligible overhead.

this is a very easy mistake to make, even with inspection (and I'm not sure if the inspection can tell at all whether the usage of subclass was intended or accidental).

True, even building a reified version of such functions via the workaround mentioned above would actually not entirely solve the problem. Deserializing polymorphic instances on the other end of the communication should (IMO) not really depend on whether the program that did the serialization was using a polymorphic static type or a concrete subtype. I understand why this happens this way at the moment, but I feel like it would be nice to allow to customize this behaviour.

applicable to abstract classes, sealed classes work automatically

I see. I have only ever needed sealed classes for my data model, never abstract classes, that's why I never needed to register anything I guess. I'm mostly interested in making the sealed classes use case work out of the box, since this is the use case I see the most often in real life.

This is an implementation detail: the discriminator is added by the format, not by the serializer.

Isn't the PolymorphicSerializer deciding to add a discriminator independently of the format? My (admittedly limited) understanding of the situation is the following: when I have a sealed class Parent and a child class Child, the compiler plugin generates a regular serializer for the Child class, and a PolymorphicSerializer for the Parent which decides to include a discriminator + delegates to the child. The format then decides how to include the discriminator (e.g. JSON property "type" + fully qualified class name as value).

Based on this very limited and maybe incorrect understanding, the point I was making above was that the Parent could instead get a serializer that "just" delegates to the child's, and the child's serializer could be the one deciding the add a discriminator.

pdvrieze commented 3 years ago

This is an implementation detail: the discriminator is added by the format, not by the serializer. So we need to keep a 'default' serializer for child and do not use PolymorphicSerializer for it because this will lead to a stack overflow

Just to clarify for the others, polymorphic (de)serialization, the (outer) polymorphic serializer that has 2 elements (the type discriminator, and the data element). The data element is then (de)serialized using the type specific serializer. The format (json when not in array format, xml in transparent mode) can elide this extra structure and use a different approach instead (type discriminator, type specific tag name).

Basically it works as follows:

A polymorphic serializer tells the format that polymorphic serialization is used, and it provides the base type, type discriminator, child serializer and child data.
The format decides how to handle this information in a format specific way

As such, without having a polymorphic serializer a format does not know how a type is used, or even what the base type is (there could be multiple valid parent types for the same polymorphic leaf type).

For sealed serialization, there is only one serializer generated for child, this is the type specific (non-polymorphic) serializer. Then for any sealed parent (there could be multiple, hierarchic), this is always abstract and the descriptor contains the information of all the possible sub-serializers in the sealed hierarchy.

The child adding a discriminator on its own is hard (and certainly doesn't fit the current architecture). The problem is that the format is then unable to know that the (synthetic) property is actually a type discriminator. More significant however is that the serializer itself cannot know the context of it's own use (polymorphic or static).

It is perfectly possible for a format to add a discriminator for all structure types, it is also possible for it to create a set of typenames based upon the content of a SerializersModule (using the dumpTo function). This has performance drawbacks. As an alternative, it is also possible to "wrap" the serializers into remapping serializers that wrap all desired classes into a polymorphic/sealed wrapper. If you only care about the outer type, then it is even easier, just put it in a polymorphic (or sealed) parent when creating the serializer.

sandwwraith commented 3 years ago

While discussing POLYMORPHIC_SUBTYPES internally, we stumbled into the design problem of when this flag should be applicable. Problem described in initial ticket happens only on 'top-level' serialization: when you call call.response(data), data type is determined dynamically to a subtype, thus no class discriminator. However, data type can also contain other properties. Consider a serializable hierarchy:

@Serializable sealed class Base

@Serializable class Sub: Base()

@Serializable
data class Message(val b: Base, val s: Sub)

In this case, Message.b gets disriminator regardless of the settings. But we also have s: Sub property, which normally shouldn't get discrimnator. Whether it should get it or not with this new flag, it's an open question.

On one hand, POLYMORPHIC_SUBTYPES probably should affect all subclasses, not only top-level (encountered on serialization start) ones. On the other hand, it purely depends on the client on the other side: if we share such data class definition with Kotlin client on the other side, deserializer of Message wouldn't expect an additional 'type' property in s, resulting with an error about unknown field. Many JS frontends simply don't need this information, too.

Maybe we overthink things a bit here, and there is almost no real-world usages of sub class as a property, but this fact is yet to be validated. So we need to determine whether it is reasonable to provide type information only on top-level or in all properties of subtypes.

joffrey-bion commented 3 years ago

Oh, that is quite a valid concern.

Maybe we overthink things a bit here, and there is almost no real-world usages of sub class as a property, but this fact is yet to be validated.

I don't think it's overthinking. I find it quite reasonable to expect subtypes as property types, especially in the case of sealed hierarchies. For instance, you could imagine a sealed class User with subtypes Employee and Customer, and now you want to send an event that can only concern employees. You will likely have a property of type Employee in this event, because it's better to be specific if it can never be a Customer. I imagine there are many use cases like this, so it's worth thinking about that case IMO.

I believe that the initial case I made for POLYMORPHIC_SUBTYPES was in the context of sharing the Kotlin data model on both sides (because the discriminator is most likely only useful if you deserialize in Kotlin anyway). If both sides have the same model, then the main problem is that the producer of the message can't know if the receiver will have the same type information for the root type. Apart from that, I guess the current behaviour is sufficient for nested properties. With that in mind, I believe that adding the discriminator only on the serialization root should be sufficient.

If it were designed this way, maybe a better name might be POLYMORPHIC_TYPES_AND_SUBTYPES_AT_ROOT, but that's very verbose 😄 (maybe this configuration should be a sealed class itself, instead of an enum 😄)

I guess providing the option of always including the discriminator for these subtypes would be nice for consistency and understandability, but I wonder if there is an actual use case for this.

efenderbosch commented 3 years ago

I'd like to have the option to change the default strategy from using the FQCN (com.acme.foo.bar.MySerializableSubClass) to just the simple name (MySerializableSubClass). And an option to change the descriminator JSON attribute name w/ an annotation.

pdvrieze commented 3 years ago

@efenderbosch The discriminator is determined by the @SerialName annotation on the type/class. A strategy would have to take the form of some sort of policy that can be determined/plugged in by the user as any such policy is not globally valid.

efenderbosch commented 3 years ago

Right, a runtime configurable strategy is what I had in mind.

val myFormatter = Json {
    descriminatorStrategy = SimpleClassnameStrategy::class
}

ghost commented 2 years ago

Hey @sandwwraith, how about you just make JsonClassDiscriminator#discriminator nullable? This way we could do @JsonClassDiscriminator(discriminator = null) to disable the discriminator. I guess it's a solid solution that doesn't require years of discussion. Anyway, is there a workaround for now? This issue is a blocker for me.

emartynov commented 2 years ago

What if I have inconsistent BE and I need exactly to specify to have or not discriminator for complex nested classes?

emartynov commented 2 years ago

My use case.

I have a white label android application. It has a user profile that has different fields based on the specific label. I have common code that is calling the backend with endpoint and I wrote it as a function with parameter type BaseProfile. And in a specific label, the specific class LabelAProfile is passed to this function.

I annotated BaseProfile and LabelAProfile as @Serializable and I also wrote Json configuration like

                polymorphic(BaseProfile::class) {
                    subclass(LabelAProfile::class, LabelAProfile.serializer())
                }

LabelA back end doesn't need a discriminator and throws an error about the unknown type when it is included. So I need a way to hide it.

However, in the LabelAProfile I have also an insurance type that is

@Serializable
@JsonClassDiscriminator("insurance_type")
sealed class InsuranceInfo {
  @Serializable
  @SerialName("public")
  data class PublicInsurance(
     ...
  ) : InsuranceInfo()

  @Serializable
  @SerialName("self-paid")
  object SelfPaid: InsuranceInfo()
}

Here I actually need to pass a discriminator to the back end.

Dominaezzz commented 2 years ago

@emartynov For the BaseProfile case, you don't need polymorphic serialization, so you shouldn't use polymorphic builder. What you actually need is contextual serialisation. So context(BaseProfile::class, LabelAProfile.serializer()), you might need to do some casting.

werner77 commented 2 years ago

I would like to add to this discussion from an Open API compliance standpoint. See the polymorphism example here.

The problem with not always including the discriminator in the JSON is that it will no longer validate against the JSON schema since normally the discriminator property is marked as required in either the base class (included via allOf) or the sub classes so it cannot just be omitted.

I think it makes sense to at least be able to comply with the Open API spec, by providing an ALWAYS mode for class hierarchies.

MrThreepwood commented 2 years ago

I have a similar situation where I don't know the actual polymorphic hierarchy on serialization. The backend is serializing a NotAuthenticatedError, the client knows that this endpoint could potentially return the successful model or the NotAuthenticatedError and treats it as a sealed class, but the backend is throwing the exception before the controller is called and as such has no way to figure out the specific Implementation of NotAuthenticated it should serialize, so it just makes sure that the properties match the client's expectation, except that we need the class discriminator...

So I would also greatly appreciate the ability to always include the class discriminator.

I have no idea how to work around this issue at the moment. Maybe I can just add every type as a polymorphic serialization option for Any and serialize everything as Any? Not ideal, and I don't know that it works with Spring (which throws away any attempt to use kotlinx polymorphically).

kgbier commented 1 year ago

To add another valid usecase for some kind of "always discriminate" configuration, GeoJSON requires that all objects have a type field.

It would be very handy to leverage the in-built type discrimination to make encoding/decoding compliant to the spec.

To workaround this, I am adding an @EncodeDefault annotated type field and a matching @SerialName to each object, then using one json configuration to encode both a type and customType field, then using another configuration to decode only from the type field.

distinctdan commented 1 year ago

In my case, I'm trying to use a base class to define optional fields that can be sent to a logging API. Each section of the app may use a subset of these fields. For example, the HTTP module is likely to use the http logging fields, but those won't be used by any other module. It's beneficial to keep them all on a base interface to prevent modules from accidentally using reserved field names. I only need to serialize, not deserialize, so I don't care about polymorphism. However, this throws the error Class 'ModuleLog' is not registered for polymorphic serialization in the scope of 'DataDogBaseLog'.

It would be great if we could turn off polymorphism for certain use cases.

@Serializable
abstract class DataDogBaseLog {
    // Require a message.
    abstract val message: String

    // Apply a default timestampe here so that we don't have to include this in every subclass
    @Serializable(with = InstantSerializer::class)
    val date: Instant = Instant.now()

    // Use "Transient" to prevent this from being sent unless overridden.
    @Transient
    open val host: String = ""

    // other fields
}

In another module:

@Serializable
data class ModuleLog(
    // Reserved properties
    override val message: String,
    override val host: String,

    // Module-specific properties
    val prop1: String,
    val prop2: Int,
): DataDogBaseLog()

pdvrieze commented 1 year ago

@distinctdan The error message you get means that you have a variable of (static) type DataDogBaseLog, but are attempting to serialize a value of ModuleLog type. As the used serializer is statically derived the serializer chosen is that of the base class. Then, DataDogBaseLog is abstract so there is no "generated" serializer, instead the polymorphic serializer is used. The polymorphic serializer uses the serializersModule to determine the actual serializer to use.

If you want to make this work, it may work making DataDogBaseLog concrete, or provide a custom serializer. If you want to ignore polymorphism in serialization ModuleLog doesn't need to be @Serializable.

distinctdan commented 1 year ago

Thanks for the explanation @pdvrieze, that makes sense about the base being abstract. That's an interesting idea of making the child class ModuleLog not serializable and the base class concrete, but that still triggers the polymorphic error for me.

However, I have gotten inheritance to work by defining a custom polymorphic serializer that picks the base class. This produces the following behavior:

Type safety for child classes when they override properties.
Only properties overriden in child classes are serialized, no base class properties are serialized.

I haven't been able to get default values in the base class to be serialized, but this is good enough for me for now.

object DDBaseLogSerializer : JsonContentPolymorphicSerializer<DDBaseLog>(DDBaseLog::class) {
    override fun selectDeserializer(element: JsonElement) = DDBaseLog.serializer()
}

/**
 * Defines all the logging attributes that datadog supports. The open vals are marked as Transient
 * so that they won't be serialized unless overridden.
 */
@Serializable(with = DDBaseLogSerializer::class)
abstract class DDBaseLog {
    // Required props
    abstract val message: String
    abstract val date: Instant

    // Optional props
    open val host: String = ""
    open val service: String = ""
    // ... other props
}

@Serializable()
class ModuleLog(
    override val message: String,

    @Serializable(with = InstantSerializer::class)
    override val date: Instant = Instant.now(),

    val moduleProp1: String = "value",
): DDBaseLog()

sandwwraith commented 11 months ago

There's a prototype for this feature in #2450; feel free to review it. The current problem is that implementing POLYMORPHIC_AND_SUBTYPES requires changes in the descriptors and plugin, producing incorrect results when classes compiled with the previous serialization plugin are serialized in this mode. We're investigating what can be done here; however, other modes don't have this problem.

Ayfri commented 8 months ago

Any update ?

sandwwraith commented 8 months ago

@Ayfri See comments in the linked PR.

Ayfri commented 8 months ago

So if I understand correctly, it will be fixed by the next release as the pull request is already merged ?

sandwwraith commented 8 months ago

Mostly, yes. POLYMORPHIC_AND_SUBTYPES is still in development (there are some new concerns regarding sealed hierarchies with custom serializers), but other modes will be available in the next release.

Ayfri commented 8 months ago

Good to know ! Is there any milestone for the next release ?

dsvoronin commented 7 months ago

Good to know ! Is there any milestone for the next release ?

Seems it has been released in https://github.com/Kotlin/kotlinx.serialization/releases/tag/v1.6.3 🎉

mgroth0 commented 1 month ago

Is this still the thread to follow for POLYMORPHIC_AND_SUBTYPES, or has that been split into another? I am looking forward to that feature.

Kotlin / kotlinx.serialization

Allow custom policy for adding the polymorphic discriminator #1247