Kotlin / kotlinx.serialization

Kotlin multiplatform / multi-format serialization
Apache License 2.0
5.41k stars 620 forks source link

Make skipping null values default behavior for protobuf #397

Closed Egorand closed 4 years ago

Egorand commented 5 years ago

Based on #58.

My understanding is that since 0.10.0 there is support for skipping optional values, which can be enabled for JSON with Json(encodeDefaults = false), but it's not clear how to enable this behavior for ProtoBuf. In my opinion this should also be the default behavior for protobuf, since the only way to transfer null values is to omit them in the binary representation.

sandwwraith commented 5 years ago

Protobuf case can be more complicated than JSON case since JSON can distinguish between null and absence of value, and protobuf not. Moreover, it does not have a concept of null itself.

Imagine you deserialize data class with Protobuf and it has field val x: MyData? = MyData.DEFAULT_VALUE. You don't see x in the input stream. What does it mean? If the other side was using kotlinx.serialization too looks like you have to use the default value. If other side using other protobuf implementation, should you use null? Or construct MyData instance with zeroed fields, as the spec says?

There is also a problem with val x: Int = 42. Usually, protobuf implementations set missing values to 0. But it conflicts with our desire to skip 42 in output stream: If we omit it, other clients will receive incorrect data.

Seems like the proper solution is to skip default values only if they're nulls or 0s. We'll try to investigate what can we do here and what consequences it can bring.

Egorand commented 5 years ago

Unlike JSON, protobuf relies on explicitly defined schema, hence I believe some of the uncertainties you mention can be resolved by looking at the schema.

Let's take the following message:

message Employee {
  int32 id = 1;
  optional string name = 2;
  optional bool is_contractor = 3 [default = false];
}

With this schema, message recipients should always expect id value to be present in the message, while name and is_contractor can be omitted. When name is missing its value defaults to null, since no other default has been set. When is_contractor is missing it gets the value of false, which also implies that is_contractor can never be assigned a value of null. Hence, the following would be the correct way to represent Employee as a @Serializable data class:

data class Employee(
  @SerialId(1) val id: Int,
  @SerialId(2) @Optional val name: String? = null,
  @SerialId(3) @Optional val is_contractor: Boolean = false
)

Also, since protobuf always has schema, the code is usually generated, and the generator is responsible for ensuring that correct semantics are preserved for all fields. For instance, val x: MyData? = MyData.DEFAULT_VALUE would be incorrect, since x can either be a field with a default value, which would it make it non-nullable, or a simple optional field, meaning its default value should be null.

To sum up, the deserializer should always rely on the default value of a field if it's missing on the wire, and the code generator is responsible for properly translating proto messages into data classes, preserving the semantics of individual fields.

jamesachn commented 5 years ago

For instance, val x: MyData? = MyData.DEFAULT_VALUE would be incorrect, since x can either be a field with a default value, which would it make it non-nullable, or a simple optional field, meaning its default value should be null.

What about the case:

optional string name = 2 [default = "default_value"]

with

// Omitted @Optional since that is deprecated now for properties with default value.
@SerialId(2) val name: String = "default_value"

In this case when receiving an input stream without name in the payload, the expectation I think is to deserialize with name set to "default_value". I think this is fairly unambiguous as protobuf spec does not seem to enforce that sender and receiver even share the same default value:

Reference: https://developers.google.com/protocol-buffers/docs/proto#optional

When a message is parsed, if it does not contain an optional element, the corresponding field in the parsed object is set to the default value for that field.

It may induce questionable versioning story, but that's really dependent on the application usage of protobuf and is really application's own problem.

On the flip side (serialize for a class with optional field and default value), I think the enhancement behavior, if considered, can extend beyond just skipping if default value is null. If there is a default value, then the field is considered optional to kotlinx.serialization. Why restrict to just skipping null default values? As long as the object instance's optional field has a value that == the default value, it should be considered skippable. It is the receiver's job to decide how to deal with the lack of value for the optional field. If receiver is proto2, it will use its schema's default value. If receiver is proto3, it will use default value for that field type. If receiver is kotlinx.serialization, it will use the class definition's declared default value for that field. This seems consistent with how protobuf treats optional fields with respect to default values (i.e. if optional field is not in the payload, receiver fills in the blank according to its default-value behavior).