Open joffrey-bion opened 3 years ago
Thank you for this excellent summary and aggregating a set of diverse yet related issues!
I just wanted to note that the current two ways polymorphism is supported and implemented (normal and array polymorphism) might complicate this suggestion; I haven't given it much thought, though, and I expect it is certainly something the smart people behind this library could tackle. ;p
For example, a recent issue I ran into is that adding a class discriminator to JSON primitives currently only works when array polymorphism is used. I had an Enum which extended from an interface and wanted to serialize this polymorphically, for which I had to create a custom serializer.
This is yet another reason why I still prefer kotlinx.serialization's original proposal to use array polymorphism by default, but I understand this deviates from 'the standard', if that even exists.
Any progress on it?
As for now, there is sufficient demand and use-cases to provide an option to globally disable type discriminator (NEVER option) — mainly, to interact with external API that has validation or just to reduce the amount of data. However, to correctly design and implement this feature we need to decide whether to provide other options besides NEVER.
POLYMORPHIC_SUBTYPES
, as in the proposal, is a questionable strategy because it makes rules harder to understand — besides static property type, we also need to check if the class is registered in the current module, etc, etc. Instead, we suggest providing a diagnostic when the user probably wants to use base polymorphic type instead of subtype: #1493
ALWAYS
also seems vague and rather exotic use-case: you don't really want an external API to depend on your implementation details, such as class serial names. Good implementation with backward compatibility of such a pattern requires a manual providing a @SerialName
for each serializable class of your program. Moreover, the current implementation of polymorphism does not allow to specify serial names and discriminators for primitives and enums; implementing ALWAYS strategy would be an inappropriate cheat-code to overcome this limitation — because, well, not all classes would have discriminators, despite ALWAYS state.
So for now we are leaning towards a design with only two states: DEFAULT (current behavior) and NEVER (do not include discriminator at all). If you have any additional suggestions or corrections, please add them
@sandwwraith Thanks for the summary of the state of this issue.
POLYMORPHIC_SUBTYPES
, as in the proposal, is a questionable strategy because it makes rules harder to understand
I actually think the opposite is true sometimes. The current way of serializing surprises some people because the same class will not be serialized the same way depending on whether the serializer used was that of the parent or that of the child class.
The proposal here is to ensure both serializers yield the same result for a given instance. So IMO this rule (which ensures consistent serialization of a given instance) is actually easier to understand than the current default behaviour.
Instead, we suggest providing a diagnostic when the user probably wants to use base polymorphic type instead of subtype: #1493
I understand that for most use cases, it's just a failure of users to understand how the serializer is chosen (based on the static type of the variable at the call site). For these cases https://github.com/Kotlin/kotlinx.serialization/issues/1493 would be sufficient.
However, there are some use cases where the call site is not controlled, and for those it's not about fixing the call site, it's just that some frameworks (e.g. Spring) have a generic API surface on top of different serialization libraries, and they don't have any static type information at the moment they call the concrete serialization library, just the runtime type of the instances to serialize.
For example, anywhere in the code, you could have a call like this:
simpMessagingTemplate.convertAndSend("/some/destination", someObject)
This public API is generic for Spring, regardless of the serialization library used. Once deep down in Spring implementation, the runtime type of someObject
is introspected and Kotlinx Serialization is called then, without static info.
A good suggestion by @pdvrieze to work around this issue on Spring side without changing their public API is to wrap the serialized object into an internal wrapper class known to Spring with additional static type information (e.g. the serializer), and unwrap this in the Kotlinx Serialization adapter.
besides static property type, we also need to check if the class is registered in the current module, etc, etc
I would have expected something at compiler-plugin level, because parent classes annotated @Serializable
are known at compile time. So I thought we could maybe directly generate a serializer with discriminator in the child class itself. Is there dynamic registration of polymorphic types that could prevent this from being implemented this way?
The point about Spring and other frameworks is a really good one, thanks. Even Ktor has the same problem. It's still possible to use convertAndSend<BaseClass>("/some/destination", someObject)
, but this is a very easy mistake to make, even with inspection (and I'm not sure if the inspection can tell at all whether the usage of subclass was intended or accidental).
parent classes annotated @Serializable are known at compile time
This is true indeed. However, there is still a 'dynamic registration of polymorphic types' — a SerializersModule
, because we want to make sure that only classes that are intended to be polymorphic can be deserialized. (applicable to abstract classes, sealed classes work automatically) There's a bit of motivation about that in the KEEP.
we could maybe directly generate a serializer with a discriminator in the child class itself.
This is an implementation detail: the discriminator is added by the format, not by the serializer. So we need to keep a 'default' serializer for child and do not use PolymorphicSerializer for it because this will lead to a stack overflow
It's still possible to use
convertAndSend<BaseClass>("/some/destination", someObject)
I'm not sure what you mean here. At least as far as this Spring function is concerned, this is not correct. This method doesn't have a type parameter. Spring entirely relies on the runtime type of the 2nd argument, once this "message payload" hits the actual message converter that will be used for serialization. This call site is long forgotten by then.
The only way such a call could be made this way is if we defined an extension function with a reified type parameter, and invent a wrapper class that wraps the payload instance + the serializer to use. And then modify Spring's code in their Kotlinx Serialization adapter to detect this wrapper class, unwrap it, and read the serializer from there. That being said, I still consider it only as a workaround, because of all the unnecessary allocations of wrapper objects. If these messages are handled by the million, it could be a non-negligible overhead.
this is a very easy mistake to make, even with inspection (and I'm not sure if the inspection can tell at all whether the usage of subclass was intended or accidental).
True, even building a reified version of such functions via the workaround mentioned above would actually not entirely solve the problem. Deserializing polymorphic instances on the other end of the communication should (IMO) not really depend on whether the program that did the serialization was using a polymorphic static type or a concrete subtype. I understand why this happens this way at the moment, but I feel like it would be nice to allow to customize this behaviour.
applicable to abstract classes, sealed classes work automatically
I see. I have only ever needed sealed
classes for my data model, never abstract
classes, that's why I never needed to register anything I guess. I'm mostly interested in making the sealed classes use case work out of the box, since this is the use case I see the most often in real life.
This is an implementation detail: the discriminator is added by the format, not by the serializer.
Isn't the PolymorphicSerializer
deciding to add a discriminator independently of the format? My (admittedly limited) understanding of the situation is the following: when I have a sealed class Parent
and a child class Child
, the compiler plugin generates a regular serializer for the Child
class, and a PolymorphicSerializer
for the Parent
which decides to include a discriminator + delegates to the child. The format then decides how to include the discriminator (e.g. JSON property "type" + fully qualified class name as value).
Based on this very limited and maybe incorrect understanding, the point I was making above was that the Parent
could instead get a serializer that "just" delegates to the child's, and the child's serializer could be the one deciding the add a discriminator.
This is an implementation detail: the discriminator is added by the format, not by the serializer. So we need to keep a 'default' serializer for child and do not use PolymorphicSerializer for it because this will lead to a stack overflow
Just to clarify for the others, polymorphic (de)serialization, the (outer) polymorphic serializer that has 2 elements (the type discriminator, and the data element). The data element is then (de)serialized using the type specific serializer. The format (json when not in array format, xml in transparent mode) can elide this extra structure and use a different approach instead (type discriminator, type specific tag name).
Basically it works as follows:
As such, without having a polymorphic serializer a format does not know how a type is used, or even what the base type is (there could be multiple valid parent types for the same polymorphic leaf type).
For sealed serialization, there is only one serializer generated for child, this is the type specific (non-polymorphic) serializer. Then for any sealed parent (there could be multiple, hierarchic), this is always abstract and the descriptor contains the information of all the possible sub-serializers in the sealed hierarchy.
The child adding a discriminator on its own is hard (and certainly doesn't fit the current architecture). The problem is that the format is then unable to know that the (synthetic) property is actually a type discriminator. More significant however is that the serializer itself cannot know the context of it's own use (polymorphic or static).
It is perfectly possible for a format to add a discriminator for all structure types, it is also possible for it to create a set of typenames based upon the content of a SerializersModule (using the dumpTo function). This has performance drawbacks. As an alternative, it is also possible to "wrap" the serializers into remapping serializers that wrap all desired classes into a polymorphic/sealed wrapper. If you only care about the outer type, then it is even easier, just put it in a polymorphic (or sealed) parent when creating the serializer.
While discussing POLYMORPHIC_SUBTYPES
internally, we stumbled into the design problem of when this flag should be applicable. Problem described in initial ticket happens only on 'top-level' serialization: when you call call.response(data)
, data
type is determined dynamically to a subtype, thus no class discriminator. However, data
type can also contain other properties. Consider a serializable hierarchy:
@Serializable sealed class Base
@Serializable class Sub: Base()
@Serializable
data class Message(val b: Base, val s: Sub)
In this case, Message.b
gets disriminator regardless of the settings. But we also have s: Sub
property, which normally shouldn't get discrimnator. Whether it should get it or not with this new flag, it's an open question.
On one hand, POLYMORPHIC_SUBTYPES
probably should affect all subclasses, not only top-level (encountered on serialization start) ones.
On the other hand, it purely depends on the client on the other side: if we share such data class definition with Kotlin client on the other side, deserializer of Message
wouldn't expect an additional 'type' property in s
, resulting with an error about unknown field. Many JS frontends simply don't need this information, too.
Maybe we overthink things a bit here, and there is almost no real-world usages of sub class as a property, but this fact is yet to be validated. So we need to determine whether it is reasonable to provide type information only on top-level or in all properties of subtypes.
Oh, that is quite a valid concern.
Maybe we overthink things a bit here, and there is almost no real-world usages of sub class as a property, but this fact is yet to be validated.
I don't think it's overthinking. I find it quite reasonable to expect subtypes as property types, especially in the case of sealed hierarchies. For instance, you could imagine a sealed class User
with subtypes Employee
and Customer
, and now you want to send an event that can only concern employees. You will likely have a property of type Employee
in this event, because it's better to be specific if it can never be a Customer
. I imagine there are many use cases like this, so it's worth thinking about that case IMO.
I believe that the initial case I made for POLYMORPHIC_SUBTYPES
was in the context of sharing the Kotlin data model on both sides (because the discriminator is most likely only useful if you deserialize in Kotlin anyway). If both sides have the same model, then the main problem is that the producer of the message can't know if the receiver will have the same type information for the root type. Apart from that, I guess the current behaviour is sufficient for nested properties. With that in mind, I believe that adding the discriminator only on the serialization root should be sufficient.
If it were designed this way, maybe a better name might be POLYMORPHIC_TYPES_AND_SUBTYPES_AT_ROOT
, but that's very verbose 😄 (maybe this configuration should be a sealed class itself, instead of an enum 😄)
I guess providing the option of always including the discriminator for these subtypes would be nice for consistency and understandability, but I wonder if there is an actual use case for this.
I'd like to have the option to change the default strategy from using the FQCN (com.acme.foo.bar.MySerializableSubClass
) to just the simple name (MySerializableSubClass
). And an option to change the descriminator JSON attribute name w/ an annotation.
@efenderbosch The discriminator is determined by the @SerialName
annotation on the type/class. A strategy would have to take the form of some sort of policy that can be determined/plugged in by the user as any such policy is not globally valid.
Right, a runtime configurable strategy is what I had in mind.
val myFormatter = Json {
descriminatorStrategy = SimpleClassnameStrategy::class
}
Hey @sandwwraith, how about you just make JsonClassDiscriminator#discriminator
nullable? This way we could do @JsonClassDiscriminator(discriminator = null)
to disable the discriminator. I guess it's a solid solution that doesn't require years of discussion.
Anyway, is there a workaround for now? This issue is a blocker for me.
What if I have inconsistent BE and I need exactly to specify to have or not discriminator for complex nested classes?
My use case.
I have a white label android application. It has a user profile that has different fields based on the specific label.
I have common code that is calling the backend with endpoint and I wrote it as a function with parameter type BaseProfile
. And in a specific label, the specific class LabelAProfile
is passed to this function.
I annotated BaseProfile
and LabelAProfile
as @Serializable
and I also wrote Json configuration like
polymorphic(BaseProfile::class) {
subclass(LabelAProfile::class, LabelAProfile.serializer())
}
LabelA back end doesn't need a discriminator and throws an error about the unknown type when it is included. So I need a way to hide it.
However, in the LabelAProfile
I have also an insurance type that is
@Serializable
@JsonClassDiscriminator("insurance_type")
sealed class InsuranceInfo {
@Serializable
@SerialName("public")
data class PublicInsurance(
...
) : InsuranceInfo()
@Serializable
@SerialName("self-paid")
object SelfPaid: InsuranceInfo()
}
Here I actually need to pass a discriminator to the back end.
@emartynov For the BaseProfile
case, you don't need polymorphic serialization, so you shouldn't use polymorphic
builder. What you actually need is contextual serialisation. So context(BaseProfile::class, LabelAProfile.serializer())
, you might need to do some casting.
I would like to add to this discussion from an Open API compliance standpoint. See the polymorphism example here.
The problem with not always including the discriminator in the JSON is that it will no longer validate against the JSON schema since normally the discriminator property is marked as required in either the base class (included via allOf
) or the sub classes so it cannot just be omitted.
I think it makes sense to at least be able to comply with the Open API spec, by providing an ALWAYS mode for class hierarchies.
I have a similar situation where I don't know the actual polymorphic hierarchy on serialization. The backend is serializing a NotAuthenticatedError, the client knows that this endpoint could potentially return the successful model or the NotAuthenticatedError and treats it as a sealed class, but the backend is throwing the exception before the controller is called and as such has no way to figure out the specific Implementation of NotAuthenticated it should serialize, so it just makes sure that the properties match the client's expectation, except that we need the class discriminator...
So I would also greatly appreciate the ability to always include the class discriminator.
I have no idea how to work around this issue at the moment. Maybe I can just add every type as a polymorphic serialization option for Any and serialize everything as Any? Not ideal, and I don't know that it works with Spring (which throws away any attempt to use kotlinx polymorphically).
To add another valid usecase for some kind of "always discriminate" configuration, GeoJSON requires that all objects have a type
field.
It would be very handy to leverage the in-built type discrimination to make encoding/decoding compliant to the spec.
To workaround this, I am adding an @EncodeDefault
annotated type
field and a matching @SerialName
to each object, then using one json configuration to encode both a type
and customType
field, then using another configuration to decode only from the type
field.
In my case, I'm trying to use a base class to define optional fields that can be sent to a logging API. Each section of the app may use a subset of these fields. For example, the HTTP module is likely to use the http logging fields, but those won't be used by any other module. It's beneficial to keep them all on a base interface to prevent modules from accidentally using reserved field names. I only need to serialize, not deserialize, so I don't care about polymorphism. However, this throws the error Class 'ModuleLog' is not registered for polymorphic serialization in the scope of 'DataDogBaseLog'
.
It would be great if we could turn off polymorphism for certain use cases.
@Serializable
abstract class DataDogBaseLog {
// Require a message.
abstract val message: String
// Apply a default timestampe here so that we don't have to include this in every subclass
@Serializable(with = InstantSerializer::class)
val date: Instant = Instant.now()
// Use "Transient" to prevent this from being sent unless overridden.
@Transient
open val host: String = ""
// other fields
}
In another module:
@Serializable
data class ModuleLog(
// Reserved properties
override val message: String,
override val host: String,
// Module-specific properties
val prop1: String,
val prop2: Int,
): DataDogBaseLog()
@distinctdan The error message you get means that you have a variable of (static) type DataDogBaseLog
, but are attempting to serialize a value of ModuleLog
type. As the used serializer is statically derived the serializer chosen is that of the base class. Then, DataDogBaseLog
is abstract so there is no "generated" serializer, instead the polymorphic serializer is used. The polymorphic serializer uses the serializersModule to determine the actual serializer to use.
If you want to make this work, it may work making DataDogBaseLog
concrete, or provide a custom serializer. If you want to ignore polymorphism in serialization ModuleLog doesn't need to be @Serializable
.
Thanks for the explanation @pdvrieze, that makes sense about the base being abstract. That's an interesting idea of making the child class ModuleLog
not serializable and the base class concrete, but that still triggers the polymorphic error for me.
However, I have gotten inheritance to work by defining a custom polymorphic serializer that picks the base class. This produces the following behavior:
I haven't been able to get default values in the base class to be serialized, but this is good enough for me for now.
object DDBaseLogSerializer : JsonContentPolymorphicSerializer<DDBaseLog>(DDBaseLog::class) {
override fun selectDeserializer(element: JsonElement) = DDBaseLog.serializer()
}
/**
* Defines all the logging attributes that datadog supports. The open vals are marked as Transient
* so that they won't be serialized unless overridden.
*/
@Serializable(with = DDBaseLogSerializer::class)
abstract class DDBaseLog {
// Required props
abstract val message: String
abstract val date: Instant
// Optional props
open val host: String = ""
open val service: String = ""
// ... other props
}
@Serializable()
class ModuleLog(
override val message: String,
@Serializable(with = InstantSerializer::class)
override val date: Instant = Instant.now(),
val moduleProp1: String = "value",
): DDBaseLog()
There's a prototype for this feature in #2450; feel free to review it. The current problem is that implementing POLYMORPHIC_AND_SUBTYPES
requires changes in the descriptors and plugin, producing incorrect results when classes compiled with the previous serialization plugin are serialized in this mode. We're investigating what can be done here; however, other modes don't have this problem.
Any update ?
@Ayfri See comments in the linked PR.
So if I understand correctly, it will be fixed by the next release as the pull request is already merged ?
Mostly, yes. POLYMORPHIC_AND_SUBTYPES
is still in development (there are some new concerns regarding sealed hierarchies with custom serializers), but other modes will be available in the next release.
Good to know ! Is there any milestone for the next release ?
Good to know ! Is there any milestone for the next release ?
Seems it has been released in https://github.com/Kotlin/kotlinx.serialization/releases/tag/v1.6.3 🎉
Is this still the thread to follow for POLYMORPHIC_AND_SUBTYPES
, or has that been split into another? I am looking forward to that feature.
What is your use-case and why do you need this feature?
There has been several use cases mentioned in different issues that require to add the discriminator in more places, or avoid it in some places:
Describe the solution you'd like
It seems these use cases could be supported by having a configuration parameter for the inclusion of the discriminator, such as
discriminatorInclusionPolicy
, with the following values:NEVER
: never include the discriminatorPOLYMORPHIC_TYPES
(current behaviour): include the discriminator only when serializing a type that's polymorphic itself (the "parent" type).POLYMORPHIC_SUBTYPES
: always include the discriminator for instances (subtypes) of polymorphic types that are serializable, regardless of whether they are used as their parent or as themselves (json.encodeToString<Parent>(SubType())
andjson.encodeToString<SubType>(SubType())
would behave the same way).ALWAYS
,ALL_TYPES
, orALL_OBJECT_TYPES
: always include the discriminator in types serialized as JSON objects (does not apply to primitive types)Maybe the names could be refined, but I think you get the idea.
Another option could also be to provide a custom function that tells whether to include the discriminator or not, but then we would have to think about which pieces of information the function would need etc.
Any thoughts on this?