apple / swift-protobuf

Plugin and runtime library for using protobuf with Swift
Apache License 2.0

JSON Serialization with default value #861

Open seanliu1 opened 5 years ago

seanliu1 commented 5 years ago

Developing with iOS 12, Swift 5, and proto3. I am about to add an extension that outputs fields with their default values, and I want to check whether this is already implemented.

Based on proto doc, it looks like

JSON options
A proto3 JSON implementation may provide the following options:

Emit fields with default values: Fields with default values are omitted by default in proto3 JSON output. An implementation may provide an option to override this behavior and output fields with their default values.

I wonder whether the Swift version has an option to output fields with their default values. I found that the Python version has one: MessageToJson(message, including_default_value_fields=False)

https://developers.google.com/protocol-buffers/docs/reference/python/google.protobuf.json_format-module
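To make the requested behavior concrete, here is a minimal sketch of what such an option would do. It is a toy model and not SwiftProtobuf: `ToyMessage`, `ToyJSONOptions`, `emitDefaultValues`, and `toyJSON` are all hypothetical names invented for illustration, and the encoder does no string escaping.

```swift
// Toy model, NOT SwiftProtobuf: a message with two proto3-style scalar fields.
struct ToyMessage {
    var id: Int32 = 0
    var name: String = ""
}

// Hypothetical option mirroring the spec's "emit fields with default values".
struct ToyJSONOptions {
    var emitDefaultValues = false
}

// Encodes the toy message by hand. Fields holding their default value are
// skipped unless the option asks for them. No JSON string escaping is done.
func toyJSON(_ message: ToyMessage, options: ToyJSONOptions = ToyJSONOptions()) -> String {
    var parts: [String] = []
    if options.emitDefaultValues || message.id != 0 {
        parts.append("\"id\":\(message.id)")
    }
    if options.emitDefaultValues || !message.name.isEmpty {
        parts.append("\"name\":\"\(message.name)\"")
    }
    return "{" + parts.joined(separator: ",") + "}"
}
```

With the option off, an empty message encodes to an empty object; with it on, both fields appear with their zero values.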

thomasvl commented 11 months ago

Actually, we also need to double check how other languages do this option with respect to field presence. i.e. - if the field has presence, does the flag actually do something, or is the flag only honored when the field doesn't have presence?

thomasvl commented 11 months ago

One other thought - you might be able to get some generated code size savings by doubling the Subtype cases, having a version of each with and without a default value. i.e. - if the default value is zero/empty string/empty bytes, use the case without a default and just make the interfacing code deal with it accordingly. Since zero is by far the most common case, it can shrink things a fair amount.
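A rough sketch of that idea, under the assumption that the field kind is modeled as an enum: `ToySubtype` and its cases are hypothetical and only cover Int32. The point is that the common zero-default case carries no payload, so generated code only spells out a default when it is non-zero.

```swift
// Hypothetical descriptor for an Int32 field; the shape of the real
// Subtype enum is not shown in the thread, so this is an assumption.
enum ToySubtype: Equatable {
    case int32                    // default is 0, nothing stored
    case int32WithDefault(Int32)  // explicit non-zero default

    // Picks the payload-free case whenever the default is zero.
    static func makeInt32(defaultValue: Int32) -> ToySubtype {
        defaultValue == 0 ? .int32 : .int32WithDefault(defaultValue)
    }

    // The interfacing code handles both cases in one place.
    func isDefault(_ value: Int32) -> Bool {
        switch self {
        case .int32:
            return value == 0
        case .int32WithDefault(let defaultValue):
            return value == defaultValue
        }
    }
}
```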

mrabiciu commented 11 months ago

I did a bunch of performance experimentation over the weekend and here are some findings:

With these in mind this is what I've pivoted to:

internal protocol Field<M> {
    associatedtype M: Message
    func traverse<V: Visitor>(message: M, visitor: inout V) throws
}

fileprivate struct SingularInt32Field<M: Message>: Field {
    // Not `private`: that would make the memberwise initializer private too,
    // and FieldNode.singularInt32 needs to call it.
    let fieldNumber: Int
    let getValue: (M) -> Int32

    func traverse<V: Visitor>(message: M, visitor: inout V) throws {
        try visitor.visitSingularInt32Field(value: getValue(message), fieldNumber: fieldNumber)
    }
}

public struct FieldNode<M: Message> {
    private let field: any Field<M>
    private let isDefault: (M) -> Bool

    internal func traverse<V: Visitor>(message: M, using visitor: inout V) throws {
        if !isDefault(message) {
            try field.traverse(message: message, visitor: &visitor)
        }
    }

    public static func singularInt32(_ getValue: @escaping (M) -> Int32, fieldNumber: Int, defaultValue: Int32 = 0) -> Self {
        Self(field: SingularInt32Field(fieldNumber: fieldNumber, getValue: getValue), isDefault: { getValue($0) == defaultValue })
    }
}

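extension Message {
    public func traverse<V: Visitor>(visitor: inout V) throws {
        // Assumes every message type exposes a static `fieldNodes` array,
        // as in the generated code.
        for node in Self.fieldNodes {
            try node.traverse(message: self, using: &visitor)
        }
    }
}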

Generated code:

extension SomeProto {
    static let fieldNodes: [FieldNode<Self>] = [
        .singularInt32({ $0.someInt32 }, fieldNumber: 1),
    ]
}

Performance

I've been measuring performance by generating a proto with one of every kind of field and encoding it to both binary and JSON formats, in a CLI I built using the release configuration. This isn't very scientific, but it gives us a ballpark estimate of the performance loss.

| Method | Performance |
| --- | --- |
| Binary encode, all fields are unset | ~6x slower |
| Binary encode, all fields are set | ~1.3x slower |
| JSON encode, all fields are unset | ~1.8x slower |
| JSON encode, all fields are set | no difference |

I'm still trying to optimize the "binary encode, all fields are unset" case and I'm open to suggestions. It's kind of hard to compete with the status quo here, since the current implementation is an inlined function that effectively no-ops, while my proposed implementation still needs to iterate over the array of nodes and dispatch some calls.
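For anyone wanting to reproduce ratios like these, a minimal timing helper could look like the following. This is an illustrative sketch and not the harness used for the numbers above; `meanNanoseconds` and `slowdown` are hypothetical names. It only relies on `DispatchTime` from Dispatch.

```swift
import Dispatch

// Runs `body` `iterations` times and returns the mean wall-clock time
// per call in nanoseconds.
func meanNanoseconds(iterations: Int, _ body: () throws -> Void) rethrows -> Double {
    precondition(iterations > 0, "need at least one iteration")
    let start = DispatchTime.now().uptimeNanoseconds
    for _ in 0..<iterations {
        try body()
    }
    let end = DispatchTime.now().uptimeNanoseconds
    return Double(end - start) / Double(iterations)
}

// Ratio of candidate time to baseline time; 1.5 means "1.5x slower".
func slowdown(baseline: Double, candidate: Double) -> Double {
    candidate / baseline
}
```

In practice the closure would wrap a call such as `try message.serializedData()` built once against the current release and once against the fork. The encoded result should be consumed somewhere (for example by summing byte counts) so the optimizer cannot discard the work in a release build.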

thomasvl commented 11 months ago

Are those initial numbers from debug or release? And how does performance compare in the other? i.e. - how much slower is debug, and how much slower is release?

mrabiciu commented 11 months ago

Those measurements are taken in a release build.

Here are the same measurements in a debug build:

| Method | Performance |
| --- | --- |
| Binary encode, all fields are unset | ~7x slower |
| Binary encode, all fields are set | ~1.2x slower |
| JSON encode, all fields are unset | ~2.7x slower |
| JSON encode, all fields are set | no difference |

mrabiciu commented 11 months ago

Update: I had a mistake in my logic that inverted the check to visit nested messages

Here is a more accurate performance measurement:

Release

| Method | Performance |
| --- | --- |
| Binary encode, all fields are unset | ~1.5x - 2x slower |
| Binary encode, all fields are set | ~1 - 1.2x slower |
| JSON encode, all fields are unset | ~1 - 1.3x slower |
| JSON encode, all fields are set | no difference |

Debug

| Method | Performance |
| --- | --- |
| Binary encode, all fields are unset | ~3x slower |
| Binary encode, all fields are set | ~1.2x slower |
| JSON encode, all fields are unset | ~2x slower |
| JSON encode, all fields are set | ~1.1x slower |

I think this is a reasonable point to pause runtime performance optimization, especially since, if we put name information into Field, we can probably make JSON encoding faster than the status quo.

I'm going to shift my focus to measuring the bundle size impact and seeing how much that can be optimized.

mrabiciu commented 11 months ago

I did some testing last night on the impact of my approach on the size of the binary, and unfortunately this approach increases the binary size by about 10% rather than decreasing it. I think this is happening because Field<M> and all the FieldItem<M> types are being reified for every message type, resulting in lots of symbols. I'm going to try to see if a less safe but type-erased version of Field can work.
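One possible shape for a type-erased field is sketched below. It is a guess at the direction described, not the code from the fork: `ToyVisitor`, `ErasedInt32Field`, `SampleMessage`, and `RecordingVisitor` are hypothetical stand-ins so the example compiles without SwiftProtobuf. The node type itself is no longer generic over the message, at the cost of a forced cast at runtime. Whether this actually reduces binary size is not established here.

```swift
// Stand-in for SwiftProtobuf's Visitor, reduced to one method.
protocol ToyVisitor {
    mutating func visitSingularInt32Field(value: Int32, fieldNumber: Int)
}

// One concrete node type shared by all messages; only the stored closure
// knows the real message type.
struct ErasedInt32Field {
    let fieldNumber: Int
    let defaultValue: Int32
    private let getValue: (Any) -> Int32

    init<M>(fieldNumber: Int, defaultValue: Int32 = 0, _ get: @escaping (M) -> Int32) {
        self.fieldNumber = fieldNumber
        self.defaultValue = defaultValue
        // Less safe: traps if the node is handed a message of the wrong type.
        self.getValue = { get($0 as! M) }
    }

    // Visits the field only when it differs from its default value.
    func traverse<V: ToyVisitor>(message: Any, visitor: inout V) {
        let value = getValue(message)
        if value != defaultValue {
            visitor.visitSingularInt32Field(value: value, fieldNumber: fieldNumber)
        }
    }
}

// Toy message and visitor used to exercise the node.
struct SampleMessage {
    var someInt32: Int32 = 0
}

struct RecordingVisitor: ToyVisitor {
    var visited: [Int: Int32] = [:]
    mutating func visitSingularInt32Field(value: Int32, fieldNumber: Int) {
        visited[fieldNumber] = value
    }
}

let sampleNodes = [
    ErasedInt32Field(fieldNumber: 1) { (message: SampleMessage) in message.someInt32 }
]
```

The trade-off is that a mismatch between a node and the message it is applied to is caught only at runtime, which is why the generic version is the safer of the two.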

mrabiciu commented 11 months ago

Do you have any advice for measuring bundle size impact? I've tried a few things now and I'm getting either inconsistent or unexpected results. For example, I experimented with dropping the _ProtoNameProviding conformance from message generation and saw no impact on my binary size, which doesn't make sense to me.

So far what I've been doing is creating a macOS CLI that depends on my fork of SwiftProtobuf and generating 100 messages with 50 fields each that I embed in the CLI. Then I archive that and look at the resulting binary size.

mrabiciu commented 11 months ago

Here is a PR with what I've been working on https://github.com/apple/swift-protobuf/pull/1504

antongrbin commented 11 months ago

Actually, we also need to double check how other languages do this option with respect to field presence. i.e. - if the field has presence, does the flag actually do something, or is the flag only honored when the field doesn't have presence?

Based on my understanding of the spec, this flag should be ignored for fields with presence.

When generating JSON-encoded output from a protocol buffer, if a protobuf field has the default value and if the field doesn’t support field presence, it will be omitted from the output by default. An implementation may provide options to include fields with default values in the output.
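Read that way, the rule can be written as a small predicate. This is a sketch of the interpretation above and not an API of SwiftProtobuf or any other implementation; `shouldEmit` and its parameter names are hypothetical.

```swift
// Decides whether a field appears in JSON output under the reading that
// the "emit defaults" flag only applies to fields without presence.
func shouldEmit(hasPresence: Bool, isSet: Bool, isDefaultValue: Bool, emitDefaults: Bool) -> Bool {
    if hasPresence {
        // Fields with presence (messages, oneof members, proto3 optional)
        // are emitted exactly when set; the flag is ignored.
        return isSet
    }
    // Fields without presence are omitted at their default value unless
    // the flag asks for defaults to be included.
    return emitDefaults || !isDefaultValue
}
```

Under this reading, an unset proto3 optional field stays out of the output even with the flag on, while a plain Int32 field holding 0 is included only when the flag is on.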

These two implementations use has presence explicitly when checking the value of the flag:

I believe these implementations are equivalent, but it's harder to read this out (proto3 optional is implemented as oneof):