Open seanliu1 opened 5 years ago
The supported JSON encoding options: https://github.com/apple/swift-protobuf/blob/master/Sources/SwiftProtobuf/JSONEncodingOptions.swift
p.s. - adding this as an option would be tricky since JSON encoding is built on the visitor pattern which is where it is skipping default values.
@thomasvl thanks for your quick reply. Yes, I realized that default value is skipped in the generated code.
func traverse<V: SwiftProtobuf.Visitor>(visitor: inout V) throws {
if !self.cards.isEmpty {
try visitor.visitRepeatedMessageField(value: self.homeCards, fieldNumber: 1)
}
if self.test != false {
try visitor.visitSingularBoolField(value: self.test, fieldNumber: 2)
}
try unknownFields.traverse(visitor: &visitor)
}
Is there a way to do this without breaking the API between generated code and the library?
I see how old generated code could continue to work with an updated library, but I don't see how new generated code (with modified default handling) could work with an old library.
If we can't preserve this compatibility, we would probably want to increment the major version. It's been a while since 1.0, so this is probably okay. We should certainly consider whether there are other issues that might require API breakage that should be adopted along with this.
What I found is that current we do the check in codegen xx.pb.swift.
func traverse<V: SwiftProtobuf.Visitor>(visitor: inout V) throws {
if self.showID != 0 {
try visitor.visitSingularUInt32Field(value: self.showID, fieldNumber: 1)
}
try unknownFields.traverse(visitor: &visitor)
}
What if we add an option to JSONEncodingOptions, then
we change
mutating func visitSingularUInt32Field(value: UInt32, fieldNumber: Int) throws {
if value != 0 || options.includeDefaultParameters {
try startField(for: fieldNumber)
encoder.putUInt32(value: value)
}
}
Then API will remain same, and then we do check inside actual visitSingularUInt32Field or visitSingularUInt64Field or visitSingularStringField
This is a good idea. I'd need to think about how this works for proto2, though.
The visitor api shouldn't know about the JSONEncodingOptions, we'd need to make a some other (internal) "options" to expose what ever subset makes sense for all possible visitors.
I think JSONEncodingVisitor
currently owns JSONEncodingOptions
, and we are using JSONEncodingOptions.alwaysPrintEnumsAsInts
in the JSONEncodingVisitor
Implementation. If this is something we want to support across all visitors, you are right, it probably should some other internal options.
mutating func visitSingularEnumField<E: Enum>(value: E, fieldNumber: Int) throws {
try startField(for: fieldNumber)
if !options.alwaysPrintEnumsAsInts, let n = value.name {
encoder.appendQuoted(name: n)
} else {
encoder.putEnumInt(value: value.rawValue)
}
}
The call to JSONEncodingVisitor
is happening on the generic visitor interface and that call is blocked on the value not being the default, so all the JSON specific types are removed from where the change is needed.
The call to
JSONEncodingVisitor
is happening on the generic visitor interface and that call is blocked on the value not being the default, so all the JSON specific types are removed from where the change is needed.
Oh, I see. Yes, the original approach I proposed required all visitor to include default value check, which is not really flexible. Another way in my mind.
We add a function into Visitor protocol
protocol Visitor {
func shouldInlcludeDefault() ->Bool
....
}
We provide default implementations as other methods in Visitor protocol
extension Visitor {
func shouldInlcludeDefault() ->Bool {
return false
}
...
}
In JSONEncodingVisitor, we can implement it by checking shouldIncludeDefaultValue from options.
The last change will be codegen.
All traverse function will require to call
func traverse<V: SwiftProtobuf.Visitor>(visitor: inout V) throws {
if self.id != 0 || visitor.shouldIncludeDefaultValue() {
try visitor.visitSingularUInt32Field(value: self.showID, fieldNumber: 1)
}
}
I guess this can be also used by formatTextString.
@tbkka @thomasvl let me know what do you think.
@seanliu1's idea would have:
I'll have to think about whether this is enough compatibility to avoid the major version bump.
@tbkka see this comments. https://github.com/apple/swift-protobuf/issues/861#issuecomment-479680877
People wont notice any change, unless shouldIncludeDefaultValue is provided. So
Old generated code with new library: default values would always be omitted
New generated code with old library: default values would always be omitted
Likely should make a sweep of the options the other languages provide to TextFormat, JSON, and Binary to double checks if there are potentially any other things we might want to support that also would reach to this level of the Swift impl.
Performance wise, we could call this once within each generated visitor method, but it is still a lot of calls if the graph has a lot of sub messages, and the call can't be optimized since it is on a generic Visitor.
Let me know if I can help anything.
I managed to make some changes based on my initial idea.
https://github.com/seanliu1/swift-protobuf/pull/1
I also have the chance to look at
Python implementation https://github.com/protocolbuffers/protobuf/blob/master/python/google/protobuf/json_format.py#L247
Java implementation https://github.com/protocolbuffers/protobuf/blob/master/java/util/src/main/java/com/google/protobuf/util/JsonFormat.java#L922
C++ Implementation
https://github.com/protocolbuffers/protobuf/blob/5b232b8ecbce13286be09e703997e887ae0d464d/src/google/protobuf/util/json_util.cc#L100
boolJsonPrintOptions::always_print_primitive_fields
https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.util.json_util
Python and Java all use descriptor to loop through all fields, and insert default value to the dictionary or map. Python version does not support output Singular message fields and oneof fields default value.
Taking a quick look, the visitor seems to have special cased zero/false as the default. That's only true for a message declared in a file with proto3 syntax. A message in proto2 syntax can have any default the .proto file author chooses, so I'm not sure your flow would work out correctly when a proto2 syntax message is used as a field of a proto3 syntax message.
Thanks for taking a look. I forgot proto2 can have any default value. I can think of three workarounds
mutating func visitSingularEnumField<E: Enum>(value: E, default E, fieldNumber: Int) throws
func traverse<V: SwiftProtobuf.Visitor>(visitor: inout V, includeDefault:Bool == false) throws
Only JsonEncoderVisitor will call
try traverse(visitor: &visitor, includeDefault: options.includeDefault)
Other callsites will remain same.
change private let options: JSONEncodingOptions
to internal let options: JSONEncodingOptions
in JSONEncodingVisitor
, traverse can directly access it.
As I mentioned before, add a new protocol to Visitor. func shouldIncludeDefault()->Bool
Default implementation is return false, JsonEncodingVisitor provides its own implementation.Here is a PR for it https://github.com/seanliu1/swift-protobuf/pull/2
I also looked at other options provided by Java, Python, C++
JSON
Looks like all languages are on the same page, Swift just does not have includeDefault
TextFormat
All support printUnknownFields
Python provides lots of options for formatting, for example as one line (no new line character), use angle brackets instead of curly braces for nesting See more But I do not think those changes need to touch this level, usually it can be added only for TextFormatVisitor
C++ and Java can do PrintFieldValueToString given field. C++ will also print the default value,
Binary Looks pretty similar across those languages.
Would love some feedback for this https://github.com/seanliu1/swift-protobuf/pull/2 Thanks
Left some quick comments, the repeated calls to the visitor also have the potential to slow things down when most fields are the defaults. It might make sense to fetch it once and reuse it in the conditionals (you could also check the flag first since when it is on, wether the value is set or not, you are calling the visit method).
But this approach goes back to @tbkka comments before, there are behavior issues with old generated code against the new runtime (and new generated code against the old). So the question of a version change still has to be sorted out. And if there is a version change, then making this a method on visitor vs. doing something more generic like passing options for it into the generated visited code might make things easier to add others in the future (although the same interaction problems would likely exist).
@thomasvl Thanks, I addressed your comment on repeated calls to the visitor, currently it will only call once at beginning of traverse. Since it involves API change and version management, I will leave the PR open until version bump is sorted out.
I think I might have figured out part of how to solve the versioning issue. We could generate two different traverse methods, one that provides the old behavior for use with the old libraries, and one that supports the new behavior. This might look like:
// Called by old libraries. Never visits default values.
func traverse<V: SwiftProtobuf.Visitor>(visitor: inout V) throws {
traverse(visitor: &visitor, visitDefault: false)
}
// Called by new libraries. The caller specifies whether default values are visited.
func traverse<V: SwiftProtobuf.Visitor>(visitor: inout V, visitDefault: Bool) throws {
if self.id != 0 || visitDefault {
try visitor.visitSingularUInt32Field(value: self.showID, fieldNumber: 1)
}
}
The expected behavior: If you're using an old library or old generated code, defaults are not visited. Defaults can only be controlled if you have a new library and new generated code. The above solves three of the four cases:
traverse(visitor:)
in new generated code. The result never visits default values, but that's the expected behavior when using the old library.traverse(visitor:visitDefault:)
in new generated code. That allows the new library code to select whether defaults are visited.The remaining case is for the new library used with old generated code. This requires a way for the new library to call the old traverse(visitor:)
method when the new traverse(visitor:visitDefault:)
method isn't available. Is there a way to solve this with generics trickery or maybe adding a new protocol?
Hm, so we'd have to generate a travers shim in ever message class, which on all new library uses is wasted code (and if declared public, won't dead strip).
For your last case, could the updated library provide an extension to Message with a default impl of the new traverse method that calls the old? Or would that always get used for any generic operation on Messages directly?
... shim in every message class, which on all new library uses is wasted code ...
At this moment, I'm happy if we can find any solution that works. I agree it would be nice to avoid this overhead. Certainly, that extra shim could be omitted the next time we change the major version.
... could the updated library provide an extension to Message with a default impl of the new traverse method that calls the old? Or would that always get used for any generic operation on Messages directly?
Good idea! I think that would do it. I think generic uses and protocol-typed variables will work correctly as long as we declare both traverse
methods on the Message
protocol.
So I think here's the full set of changes that would allow us to add this feature without a major version change. When we do change the major version someday, we can remove the compatibility shims at that time.
Caution: I haven't actually tested this, so there may be a problem with this approach that I've overlooked.
Change traverse
method signature to traverse(visitor:visitDefault:)
. This requires changing the generated code, adding the new method to Message.swift
, updating callers in the library, and adding the new option to JSONEncodingOptions
. Once this is done, new generated code with the new library should have the new capability.
Add shim to generated code so that new generated code can be called by old library. This just requires adding the following to the generated output:
// Called by old libraries. Never visits default values.
func traverse<V: SwiftProtobuf.Visitor>(visitor: inout V) throws {
try traverse(visitor: &visitor, visitDefault: false)
}
Add a Message extension to Message.swift
so that the new library can use old generated code. Make sure that the protocol Message
declares both versions of traverse
.
/// Shim to support old generated code.
///
/// If this library is used with old generated code that lacks the new `traverse(visitor:visitDefault:)` method, we want to just use the old `traverse` method. This will ignore the `visitDefault` parameter, but we cannot do better with old generated code.
extension Message {
func traverse<V: SwiftProtobuf.Visitor>(visitor: inout V, visitDefault: Bool) throws {
try traverse(visitor: visitor)
}
Couple random thoughts on this from this weekend -
oneof
s going to be handled? It doesn't make sense to visit all of the fields as you shouldn't serialize all of them (that's an error during parsing if I remember correctly). And setting a oneof
to the explicit default still captures it and serialize it, so this likely should be ignored by oneof
s?For oneof
: What does C++ do?
For proto2, I agree that visitDefault
should never have an effect. The easiest answer might be to just continue generating the old traverse
method for proto2.
Hello there! Any update on this issue?
Is there any update in this, it would be interesting to have this option the JSONEncodingOptions
We're currently working towards a 2.0 release, so now would be the time take one something like this since it could be done with a breaking change to the generated code.
That's great, thanks for the info. Do you have any estimated date when 2.0 will be released?
Is there any update on this issue?
No specific update on exactly when 2.0 will be; and to this specific issue, no one seems to have taken up making a PR for this feature request.
Any update on this for 2.0? Thanks
Same as last check, no one has proposed a PR for this.
Thanks @thomasvl
I've seen @seanliu1 achieved something in here: https://github.com/seanliu1/swift-protobuf/pull/2
Is there any documentation of how to do the setup to get default values in the JSON?
Thanks
I've seen @seanliu1 achieved something in here: seanliu1#2
Is there any documentation of how to do the setup to get default values in the JSON?
The default values would be on the FieldDescriptor
. I haven't looked at that change, but if the have something working, it likely would just need to be update to the current main
here, i.e. - it should be basically what is needed.
Are we okay with making breaking changes to the Message
type in a pr since we're working towards a 2.0 version? If so I can work on something that looks like traverse<V: SwiftProtobuf.Visitor>(visitor: inout V, options: [VisitorOptions])
Actually after playing with this a bit I think an options
argument to the traverse
method would fall into the same backwards compatibility pitfalls that we encountered in this thread. In order to future proof this I propose we define a type-erased VisitorNode
type
public struct VisitorNode {
private let _visit: (inout Visitor, VisitorOptions) throws -> Void
internal func visit(visitor: inout Visitor, options: VisitorOptions) throws {
try _visit(&visitor, options)
}
// Factories
public static func singularFloat(_ value: Float, fieldNumber: Int, defaultValue: Float) -> Self {
.init { visitor, options in
if options.visitDefaults || value != defaultValue {
try visitor.visitSingularFloatField(value: value, fieldNumber: fieldNumber)
}
}
}
public static func singularEnum<E: SwiftProtobuf.Enum>(value: E, fieldNumber: Int, defaultValue: E) -> Self {
.init { visitor, options in
if options.visitDefaults || value != defaultValue {
try visitor.visitSingularEnumField(value: value, fieldNumber: fieldNumber)
}
}
}
public static func singularMessage<M: Message & Equatable>(value: M, fieldNumber: Int, defaultValue: M) -> Self {
.init { visitor, options in
if options.visitDefaults || value != defaultValue {
try visitor.visitSingularMessageField(value: value, fieldNumber: fieldNumber)
}
}
}
// etc ...
}
And have the Message
protocol provide visitor nodes instead of providing the traverse
implementation.
extension SomeProto {
var visitorNodes: [VisitorNode] {
[
.singularEnum(value: someEnum, fieldNumber: 0 defaultValue: .unspecified),
.singularFloat(value: someFloat, fieldNumber: 1, defaultValue: 0),
.singularMessage(value: someMessage, fieldNumber: 2, defaultValue: .init())
]
}
}
This allows us to have a common shared traverse implementation
func traverse<V: Visitor>(inout visitor: V, options: VisitorOptions) throws {
for node in visitorNodes {
try node.visit(visitor: &visitor, options: options)
}
}
this allows the library to have more flexibility to change the internals of the visitor pattern regardless of which version of the library was used to generate the type conforming to Message
I'm still pretty new to this project so I'm not sure if there are potential pitfalls with this approach. I'm not sure if what I'm proposing would trigger issues like this one
Are we okay with making breaking changes to the
Message
type in a pr since we're working towards a 2.0 version? If so I can work on something that looks liketraverse<V: SwiftProtobuf.Visitor>(visitor: inout V, options: [VisitorOptions])
Yup, 2.0 is allowed breaking changes.
As to your other questions - I'll defer to the Apple folks on the general api changes. @tbkka @FranzBusch @Lukasa
I do worry a little about how much the compiler will be able to inline compared to the current code and it also seems like you concern about stack size might be justified.
Having said that, I do sorta like the idea of only having to generate something more Array like that might hopefully be less actual code, reducing the overall size of the generated code.
Sounds good I'll put together a PR this week and we can discuss more there, thanks for the quick reply!
func traverse<V: Visitor>(inout visitor: V, options: VisitorOptions) throws {
for node in visitorNodes {
try node.visit(visitor: &visitor, options: options)
}
}
I developed the current API after performance-testing a lot of different alternatives. But I must confess, I never tried something with quite this shape, so I'm very curious to see how well it would work. Like Thomas, I'd be willing to accept a modest performance loss if this could significantly reduce code size. Certainly worth the experiment!
I suspect the first performance challenge will be to ensure the array of nodes is statically allocated just once for each message type.
I suspect the first performance challenge will be to ensure the array of nodes is statically allocated just once for each message type.
I'll have to think about this a bit. On the surface level static allocation doesn't really make sense since each node is capturing the value of a given field. Maybe if nodes capture key paths this could work π€
I'd be willing to accept a modest performance loss if this could significantly reduce code size.
Do you have any recommendations for ways I can capture the relative performance of my solution? Are there some good sample protos I could use or some benchmark test cases I could run?
On the surface level static allocation doesn't really make sense since each node is capturing the value of a given field. Maybe if nodes capture key paths this could work π€
Oh. I missed that. Hmmm.... That means this array is getting built every time you try to walk the structure? That's going to be hard for binary serialization, in particular, since we actually walk the structure several times (to compute size information that's used to allocate the necessary buffers). And building this array will require a bunch of allocations: a couple for the array itself, and likely one more for each captured existential.
Key paths would be worth experimenting with, I think.
Do you have any recommendations for ways I can capture the relative performance of my solution? Are there some good sample protos I could use or some benchmark test cases I could run?
We have a Performance suite that Tony Allevato wrote and which I've relied on a lot in the past. It's a collection of shell scripts that can generate standard protos and then compiles and runs benchmarks for both C++ and Swift. Most importantly, it makes some very pretty graphs of the results. π Pretty handy, though the shell scripting for compiling the C++ benchmarks keeps breaking as the protobuf folks keep churning their build system.
(Note: A few years back, Thomas suggested we switch the binary encoder to serialize from back-to-front in the buffer; that would only require a single walk to size the entire output, then a walk to actually write the data into the buffer. I keep meaning to set aside time to try implementing this approach, which would also address a more serious performance cliff we currently have in the binary encoder.)
I played around with keypaths and I think they should work! I've got a rough prototype have been able to make this work for every field type.
Here is the general shape I've got so far
struct FieldNode<M: Message> {
private let fieldNumber: Int
private let subtype: Subtype
private enum Subtype {
case singularFloat(keypath: KeyPath<M, Float>, defaultValue: Float)
// ...
}
func traverse<V: Visitor>(message: M, using visitor: inout V, options: VisitorOptions) throws {
switch subtype {
case .singularFloat(let keypath, let defaultValue):
let value = message[keyPath: keypath]
if options.contains(.visitDefaults) || value != defaultValue {
try visitor.visitSingularFloatField(value: value, fieldNumber: fieldNumber)
}
// ...
}
}
// MARK: - Factories
public static func singularFloat(_ keyPath: KeyPath<M, Float>, fieldNumber: Int, defaultValue: Float) -> Self {
Self(fieldNumber: fieldNumber, subtype: .singularFloat(keypath: keyPath, defaultValue: defaultValue))
}
// ...
}
extension FooType: Message {
static let fieldNodes: [FieldNode<Self>] = [
.singularFloat(\.someFloat, fieldNumber: 1, defaultValue: 0)
]
}
Message/Enum/Map fields are a little trickier but I do have something working for those too.
Accessing the fields through a keypath doesn't seem to have a significant performance impact compared to accessing fields directly. I think swift switch statements over enums are O(1) but I need to confirm this as that could be a performance bottleneck.
If we push this concept a little further we could actually use this to get rid of _protobuf_nameMap
and potentially save even more space by consolidating everything into fieldNodes
by including the naming information in the field node with an API like this.
static let fieldNodes: [Int: FieldNode<Self>] = [
1: .singularFloat(\.someFloat, defaultValue: 0, name: .same("some_float"))
]
We'd probably need to use an ordered dictionary though since I assume its important that visitor visit fields in order and we wouldn't want to do any sorting at runtime.
This is very promising! We'd need to see some measurements (performance and code size) to see how to best take advantage of this, of course. For example, if it's a big code size win but also a significant performance regression, then we might need to give people the option of generating either old-style (fast) or new-style (compact) output. But we'd have to see some actual numbers to inform any such planning. Of course, we all hope thatπ€ it's a code size win without being a performance regression -- that would be truly wonderful!
This might also give us a way to drop the generated Equatable conformances, which is another source of code bloat; walking this list of field info would let us compare two messages with a general iterator instead of bespoke code.
You might not need a dictionary to replace _protobuf_nameMap
, actually. That name map is a dictionary today because the current traversal provides a field number that needs to get translated. If the encoding is being driven by an array of field nodes, then those nodes would give both field number and name, so you would not need a translation per se. And arrays are more compact and faster to traverse than dictionaries. Hmmm.... There is the question of decoding, though. Hmmm...
Yes exactly, you'd still need to the ability to do a lookup for decoding purposes from what I can tell.
Next step for me is to get this code out of my custom plugin I've written and into a fork of swift-protobuf
and get started with some performance testing. I'm not looking forward to having to re-generate all the pb.swift
files in this repo π
I'll report back my findings when I have some π, thank you for your feedback so far!
One catch for visit is the intermixing of extension ranges. Not sure if you want markers for that in the list of fields, or if we capture the info in another list and then walk the two in parallel to mix the data? (unknown fields just get put at the end, not in order).
As far are regeneration goes, once the generator is updated, the Makefile
has targets to do that.
Developing using iOS12, Swift 5, proto3 . I am about to add an extension which can support to output fields with their default values. I just want to check whether it is already implemented.
Based on proto doc, it looks like
I wonder does swift version has option to output fileds with their default values. I found python version has it
MessageToJson(message, including_default_value_fields=False)
https://developers.google.com/protocol-buffers/docs/reference/python/google.protobuf.json_format-module