Open seanliu1 opened 5 years ago
Actually, we also need to double check how other languages do this option with respect to field presence. i.e. - if the field has presence, does the flag actually do something, or is the flag only honored when the field doesn't have presence?
One other thought: you might be able to get some generated code size savings by doubling each `Subtype` case, and having a version of each with and without a default value. I.e., if the default value is zero/empty string/empty bytes, use the case without a default and just make the interfacing code deal with it accordingly. Since zero is by far the most common case, it can shrink things a fair amount.
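To make the suggestion concrete, here is a minimal sketch of what "doubling" a case could look like. All names (`FieldKind`, `SizeDemoMessage`, `valueToEmit`) are hypothetical and are not taken from SwiftProtobuf; the point is only that the zero-default case stores no default value at all.

```swift
// Hypothetical sketch: none of these names come from SwiftProtobuf.
struct SizeDemoMessage {
    var count: Int32 = 0      // default is zero
    var retries: Int32 = 3    // explicit non-zero default
}

enum FieldKind<M> {
    // Common case: the default is zero, so no default value is stored.
    case singularInt32((M) -> Int32)
    // Rare case: a non-zero default has to be carried along.
    case singularInt32WithDefault((M) -> Int32, defaultValue: Int32)

    /// Returns the value to emit, or nil when the field holds its default.
    func valueToEmit(from message: M) -> Int32? {
        switch self {
        case .singularInt32(let get):
            let value = get(message)
            return value == 0 ? nil : value
        case .singularInt32WithDefault(let get, let defaultValue):
            let value = get(message)
            return value == defaultValue ? nil : value
        }
    }
}

let fields: [FieldKind<SizeDemoMessage>] = [
    .singularInt32({ $0.count }),
    .singularInt32WithDefault({ $0.retries }, defaultValue: 3),
]

var message = SizeDemoMessage()
// Both fields hold their defaults, so nothing would be emitted.
let unsetValues = fields.compactMap { $0.valueToEmit(from: message) }

message.count = 7
message.retries = 5
// Both fields now differ from their defaults.
let setValues = fields.compactMap { $0.valueToEmit(from: message) }
```

Whether this actually shrinks the generated code would have to be measured; the sketch only shows the shape of the idea.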
I did a bunch of performance experimentation over the weekend and here are some findings:
With these in mind, this is what I've pivoted to:

    internal protocol Field<M> {
        associatedtype M: Message
        func traverse<V: Visitor>(message: M, visitor: inout V) throws
    }

    fileprivate struct SingularInt32Field<M: Message>: Field {
        private let fieldNumber: Int
        private let getValue: (M) -> Int32

        func traverse<V: Visitor>(message: M, visitor: inout V) throws {
            try visitor.visitSingularInt32Field(value: getValue(message), fieldNumber: fieldNumber)
        }
    }

    public struct FieldNode<M: Message> {
        private let field: any Field<M>
        private let isDefault: (M) -> Bool

        // Skips the field entirely when it still holds its default value.
        internal func traverse<V: Visitor>(message: M, using visitor: inout V) throws {
            if !isDefault(message) {
                try field.traverse(message: message, visitor: &visitor)
            }
        }

        public static func singularInt32(_ getValue: @escaping (M) -> Int32, fieldNumber: Int, defaultValue: Int32 = 0) -> Self {
            Self(field: SingularInt32Field(fieldNumber: fieldNumber, getValue: getValue), isDefault: { getValue($0) == defaultValue })
        }
    }

    extension Message {
        public func traverse<V: Visitor>(visitor: inout V) throws {
            // `fieldNodes` is the static array emitted by the generator (see below).
            for node in Self.fieldNodes {
                try node.traverse(message: self, using: &visitor)
            }
        }
    }

Generated code:

    extension SomeProto {
        static let fieldNodes: [FieldNode<Self>] = [
            .singularInt32({ $0.someInt32 }, fieldNumber: 1),
        ]
    }

I've been measuring performance by generating a proto with one of every kind of field and encoding it to both binary and JSON formats, in a CLI I built using the release configuration. This isn't that scientific, but it gives us a ballpark estimate of the performance loss:
Method | Performance |
---|---|
Binary encode, all fields are unset | ~6x slower |
Binary encode, all fields are set | ~1.3x slower |
Json encode, all fields are unset | ~1.8x slower |
Json encode, all fields are set | no difference |
I'm still trying to optimize the "binary encode, all fields are unset" case and I'm open to suggestions. It's kind of hard to compete with the status quo here: today this is an inlined function that effectively no-ops, while my proposed implementation still needs to iterate over the array of nodes and dispatch some calls.
Are those initial numbers debug or release? And how does performance compare in the other? I.e., how much slower is debug, and how much slower is release?
Those measurements are taken in a release build.
Here are the same measurements in a debug build:
Method | Performance |
---|---|
Binary encode, all fields are unset | ~7x slower |
Binary encode, all fields are set | ~1.2x slower |
Json encode, all fields are unset | ~2.7x slower |
Json encode, all fields are set | no difference |
Update: I had a mistake in my logic that inverted the check to visit nested messages.
Here is a more accurate performance measurement:
Method | Performance |
---|---|
Binary encode, all fields are unset | ~1.5x - 2x slower |
Binary encode, all fields are set | ~1 - 1.2x slower |
Json encode, all fields are unset | ~1 - 1.3x slower |
Json encode, all fields are set | no difference |
Method | Performance |
---|---|
Binary encode, all fields are unset | ~3x slower |
Binary encode, all fields are set | ~1.2x slower |
Json encode, all fields are unset | ~2x slower |
Json encode, all fields are set | ~1.1x slower |
I think this is a reasonable point to pause runtime performance optimization, especially since, if we put name information into `Field`, we can probably make JSON encoding faster than the status quo.
I'm going to shift my focus to measuring the bundle size impact and seeing how much that can be optimized.
I did some testing last night on the impact of my approach on the size of the binary, and unfortunately this approach increases the binary size by about 10% rather than decreasing it. I think this is happening because `Field<M>` and all the `FieldItem<M>` types are being reified for every message type, resulting in lots of symbols. I'm going to try to see if a less safe but type-erased version of `Field` can work.
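For readers unfamiliar with the trade-off, a type-erased field could look roughly like the sketch below. This is an illustration under assumptions, not the code from the PR: `DemoVisitor`, `RecordingVisitor`, `ErasedInt32Field`, and `ErasedDemoMessage` are all hypothetical names, and whether this shape actually reduces binary size is not verified here.

```swift
// Hypothetical sketch; not the SwiftProtobuf implementation.
protocol DemoVisitor {
    mutating func visitSingularInt32Field(value: Int32, fieldNumber: Int)
}

// Records visited fields so the traversal can be inspected.
struct RecordingVisitor: DemoVisitor {
    var visited: [(fieldNumber: Int, value: Int32)] = []

    mutating func visitSingularInt32Field(value: Int32, fieldNumber: Int) {
        visited.append((fieldNumber: fieldNumber, value: value))
    }
}

// Non-generic: one concrete type is shared by every message type,
// instead of one specialization per message.
struct ErasedInt32Field {
    let fieldNumber: Int
    // The message is passed as `Any`; a wrong type is only caught at runtime.
    let getValue: (Any) -> Int32

    init<M>(fieldNumber: Int, _ get: @escaping (M) -> Int32) {
        self.fieldNumber = fieldNumber
        self.getValue = { anyMessage in
            // Force cast: this is the "less safe" part of the trade-off.
            get(anyMessage as! M)
        }
    }

    func traverse<V: DemoVisitor>(message: Any, visitor: inout V) {
        visitor.visitSingularInt32Field(value: getValue(message), fieldNumber: fieldNumber)
    }
}

struct ErasedDemoMessage {
    var someInt32: Int32 = 0
}

let field = ErasedInt32Field(fieldNumber: 1) { (m: ErasedDemoMessage) in m.someInt32 }
var visitor = RecordingVisitor()
field.traverse(message: ErasedDemoMessage(someInt32: 42), visitor: &visitor)
```

The generic parameter only appears in the initializer, so the stored type no longer depends on the message type; the cost is that passing the wrong message type traps at runtime instead of failing to compile.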
Do you have any advice for measuring bundle size impact? I've tried a few things now and I'm getting either inconsistent or unexpected results. For example, I experimented with dropping the `_ProtoNameProviding` conformance from message generation and saw no impact on my binary size, which doesn't make sense to me.
So far what I've been doing is creating a macOS CLI that depends on my fork of SwiftProtobuf and generating 100 messages with 50 fields each that I embed in the CLI. Then I archive that and look at the resulting binary size.
Here is a PR with what I've been working on https://github.com/apple/swift-protobuf/pull/1504
Actually, we also need to double check how other languages do this option with respect to field presence. i.e. - if the field has presence, does the flag actually do something, or is the flag only honored when the field doesn't have presence?
Based on my understanding of the spec, this flag should be ignored for fields with presence.
When generating JSON-encoded output from a protocol buffer, if a protobuf field has the default value and if the field doesn’t support field presence, it will be omitted from the output by default. An implementation may provide options to include fields with default values in the output.
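The rule as I read it can be written down as a small decision function. This is only a sketch of my interpretation; `shouldEmitInJSON` and its parameter names are hypothetical and are not part of SwiftProtobuf or any other implementation.

```swift
/// Hypothetical helper, not a SwiftProtobuf API.
/// Decides whether a field appears in JSON output under the reading above.
func shouldEmitInJSON(
    hasPresence: Bool,
    isSet: Bool,
    isDefaultValue: Bool,
    alwaysPrintDefaults: Bool
) -> Bool {
    if hasPresence {
        // Fields with presence are emitted only when explicitly set;
        // the flag is ignored for them.
        return isSet
    }
    // Fields without presence are emitted when they are non-default,
    // or when the flag asks for defaults to be printed.
    return !isDefaultValue || alwaysPrintDefaults
}

// A proto3 implicit-presence field holding its default value:
let omittedByDefault = shouldEmitInJSON(
    hasPresence: false, isSet: false, isDefaultValue: true, alwaysPrintDefaults: false)
let printedWithFlag = shouldEmitInJSON(
    hasPresence: false, isSet: false, isDefaultValue: true, alwaysPrintDefaults: true)

// An unset field with presence (for example proto3 `optional`): the flag has no effect.
let unsetOptionalWithFlag = shouldEmitInJSON(
    hasPresence: true, isSet: false, isDefaultValue: true, alwaysPrintDefaults: true)
// A field with presence explicitly set to its default value is still emitted.
let setOptionalToDefault = shouldEmitInJSON(
    hasPresence: true, isSet: true, isDefaultValue: true, alwaysPrintDefaults: false)
```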
These two implementations use `has presence` explicitly when checking the value of the flag:

- `FormatDefaultValues`
- `outputDefaultValues`

I believe these implementations are equivalent, but it's harder to read this out (proto3 optional is implemented as oneof):

- `alwaysOutputDefaultValueFields`
- `always_print_primitive_fields`
- `including_default_value_fields`

Developing with iOS 12, Swift 5, proto3. I am about to add an extension that supports outputting fields with their default values. I just want to check whether this is already implemented.
Based on proto doc, it looks like
I wonder whether the Swift version has an option to output fields with their default values. I found that the Python version has it:
MessageToJson(message, including_default_value_fields=False)
https://developers.google.com/protocol-buffers/docs/reference/python/google.protobuf.json_format-module