Closed Egorand closed 4 years ago
The Protobuf case can be more complicated than the JSON case, since JSON can distinguish between null and the absence of a value, and protobuf cannot. Moreover, protobuf has no concept of null at all.
Imagine you deserialize a data class with Protobuf and it has a field `val x: MyData? = MyData.DEFAULT_VALUE`. You don't see `x` in the input stream. What does that mean? If the other side was also using kotlinx.serialization, it looks like you have to use the default value. If the other side is using another protobuf implementation, should you use `null`? Or construct a `MyData` instance with zeroed fields, as the spec says?
There is also a problem with `val x: Int = 42`. Usually, protobuf implementations set missing values to 0. But that conflicts with our desire to skip 42 in the output stream: if we omit it, other clients will receive incorrect data.
It seems the proper solution is to skip default values only if they are `null`s or `0`s. We'll try to investigate what we can do here and what consequences it can bring.
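The `val x: Int = 42` hazard can be sketched without any serialization library. This is only an illustration under stated assumptions: the `Wire` map stands in for a parsed message (field number to value), and all function names are hypothetical rather than part of kotlinx.serialization.

```kotlin
// Hypothetical model of a parsed message: field number -> value present on the wire.
typealias Wire = Map<Int, Int>

// Sender: writes field 1, but drops it when it equals the Kotlin-declared default.
fun encodeSkippingDefault(x: Int, declaredDefault: Int = 42): Wire =
    if (x == declaredDefault) emptyMap() else mapOf(1 to x)

// Receiver that behaves like a typical protobuf client: a missing int reads as 0.
fun decodeWithZeroDefault(wire: Wire): Int = wire[1] ?: 0

// Receiver that shares the Kotlin class definition: a missing field reads as 42.
fun decodeWithDeclaredDefault(wire: Wire, declaredDefault: Int = 42): Int =
    wire[1] ?: declaredDefault
```

With these definitions, `decodeWithZeroDefault(encodeSkippingDefault(42))` yields 0 while `decodeWithDeclaredDefault(encodeSkippingDefault(42))` yields 42, so the two receivers disagree about the same bytes. Skipping only `null` or `0` defaults avoids that disagreement.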
Unlike JSON, protobuf relies on an explicitly defined schema, hence I believe some of the uncertainties you mention can be resolved by looking at the schema.
Let's take the following message:
    message Employee {
      int32 id = 1;
      optional string name = 2;
      optional bool is_contractor = 3 [default = false];
    }
With this schema, message recipients should always expect the `id` value to be present in the message, while `name` and `is_contractor` can be omitted. When `name` is missing, its value defaults to null, since no other default has been set. When `is_contractor` is missing, it gets the value false, which also implies that `is_contractor` can never be assigned a value of null. Hence, the following would be the correct way to represent `Employee` as a `@Serializable` data class:
    @Serializable
    data class Employee(
        @SerialId(1) val id: Int,
        @SerialId(2) @Optional val name: String? = null,
        @SerialId(3) @Optional val is_contractor: Boolean = false
    )
Also, since protobuf always has a schema, the code is usually generated, and the generator is responsible for ensuring that the correct semantics are preserved for all fields. For instance, `val x: MyData? = MyData.DEFAULT_VALUE` would be incorrect, since `x` can either be a field with a default value, which would make it non-nullable, or a simple optional field, meaning its default value should be null.
To sum up, the deserializer should always rely on the default value of a field if it's missing on the wire, and the code generator is responsible for properly translating proto messages into data classes, preserving the semantics of individual fields.
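A minimal sketch of that rule for the `Employee` example, assuming a parsed message is modelled as a plain map from field number to value. The `decodeEmployee` function and the map representation are hypothetical and only illustrate "missing on the wire means declared default"; they are not how kotlinx.serialization is implemented.

```kotlin
data class Employee(
    val id: Int,
    val name: String? = null,
    val isContractor: Boolean = false
)

// Hypothetical decoder: `fields` holds only the fields that were present on the wire.
fun decodeEmployee(fields: Map<Int, Any>): Employee {
    // Field 1 is expected in every message, so its absence is an error.
    val id = fields[1] as? Int ?: error("field 1 (id) is missing")
    // Build an instance from the declared defaults, then overlay what was sent.
    val defaults = Employee(id)
    return defaults.copy(
        name = fields[2] as String? ?: defaults.name,
        isContractor = fields[3] as Boolean? ?: defaults.isContractor
    )
}
```

Here `decodeEmployee(mapOf(1 to 7))` produces `Employee(7, null, false)`, while a message that also carries fields 2 and 3 overrides both defaults.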
> For instance, `val x: MyData? = MyData.DEFAULT_VALUE` would be incorrect, since `x` can either be a field with a default value, which would make it non-nullable, or a simple optional field, meaning its default value should be null.
What about the case:
    optional string name = 2 [default = "default_value"]
with
    // Omitted @Optional since that is deprecated now for properties with default value.
    @SerialId(2) val name: String = "default_value"
In this case, when receiving an input stream without `name` in the payload, I think the expectation is to deserialize with `name` set to `"default_value"`. I think this is fairly unambiguous, as the protobuf spec does not seem to require that sender and receiver even share the same default value:
Reference: https://developers.google.com/protocol-buffers/docs/proto#optional
When a message is parsed, if it does not contain an optional element, the corresponding field in the parsed object is set to the default value for that field.
This may lead to a questionable versioning story, but that really depends on how the application uses protobuf and is the application's own problem.
On the flip side (serializing a class with an optional field and a default value), I think the enhancement, if considered, can extend beyond just skipping when the default value is null. If there is a default value, then the field is considered optional by kotlinx.serialization. Why restrict this to just skipping null default values? As long as the object instance's optional field has a value that `==` the default value, it should be considered skippable. It is the receiver's job to decide how to deal with the lack of a value for the optional field. If the receiver is proto2, it will use its schema's default value. If the receiver is proto3, it will use the default value for that field type. If the receiver is kotlinx.serialization, it will use the class definition's declared default value for that field. This seems consistent with how protobuf treats optional fields with respect to default values (i.e. if an optional field is not in the payload, the receiver fills in the blank according to its own default-value behavior).
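The proposed "skip when the value `==` the default" rule could look roughly like this. It is a sketch under assumptions: the map again stands in for the encoded payload, and `putUnlessDefault`, `encodeFields`, and `receiveName` are hypothetical names, not kotlinx.serialization API.

```kotlin
// Hypothetical helper: write a field only when its value is non-null and
// differs from the declared default.
fun <T : Any> MutableMap<Int, Any>.putUnlessDefault(fieldNumber: Int, value: T?, default: T?) {
    if (value != null && value != default) this[fieldNumber] = value
}

// Sender: fields 2 and 3 are dropped whenever they equal their declared defaults.
fun encodeFields(name: String, isContractor: Boolean): Map<Int, Any> {
    val wire = mutableMapOf<Int, Any>()
    wire.putUnlessDefault(2, name, "default_value")
    wire.putUnlessDefault(3, isContractor, false)
    return wire
}

// Receiver: fills in a missing field 2 with whatever default it uses itself
// (schema default for proto2, "" for proto3, the declared default for a Kotlin class).
fun receiveName(wire: Map<Int, Any>, receiverDefault: String): String =
    wire[2] as String? ?: receiverDefault
```

With this sketch, `encodeFields("default_value", false)` is an empty payload, and each receiver reconstructs `name` from its own default. The value survives the round trip unchanged only when both sides agree on that default, which is the versioning concern raised earlier in the thread.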
Based on #58.
My understanding is that since 0.10.0 there is support for skipping optional values, which can be enabled for JSON with `Json(encodeDefaults = false)`, but it's not clear how to enable this behavior for `ProtoBuf`. In my opinion this should also be the default behavior for protobuf, since the only way to transfer null values is to omit them in the binary representation.