Open allevato opened 8 years ago
If I understand correctly, you're suggesting that we first implement Issue #9, and then:
It's an interesting idea. I'm curious what the real savings are in code size.
Looking more closely, I'm surprised to see that those number/name maps are computed instance properties instead of static properties. That could certainly be a source of performance issues, especially if it's re-creating those arrays every time the property is accessed.
My general thinking is, there's never a scenario where you need the union of all the information in the same pass, which is what #9 will solve. You're either serializing JSON or text, but never both simultaneously. And when you're not doing JSON or text serialization, you don't need the names at all; in other words, the core `ProtobufMessage` protocol doesn't have to contain any of that information, because extensions can add conformances for specific visitations that provide that information instead.
So, I could imagine something like a `ProtobufJSONSerializable` protocol that declares static properties for the number→name and name→number mappings. An extension in a separate file would add conformance to that protocol and define `static let` properties with the specific mapping for that message. Then, to wrap it all up, we have a JSON visitor that, through some generics, is constrained to message types `M: ProtobufMessage where M: ProtobufJSONSerializable`, meaning that it can internally access `M`'s JSON mappings directly instead of passing things around. In other words, we're binding the context needed by the visitor statically at compile time instead of passing it around at runtime.
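A minimal sketch of that shape, assuming hypothetical names throughout (`ProtobufJSONSerializable`, `jsonFieldNames`, `JSONEncodingVisitor`, and the protocol requirements are illustrative, not the library's actual API):

```swift
// Core protocol carries no name information at all.
protocol ProtobufMessage {
    var isInitialized: Bool { get }
}

// JSON support lives in a separate protocol; the generated extension
// (in a separate file) provides the static name maps.
protocol ProtobufJSONSerializable {
    static var jsonFieldNames: [Int: String] { get }   // number → name
    static var jsonFieldNumbers: [String: Int] { get } // name → number
}

struct Foo: ProtobufMessage {
    var isInitialized: Bool { true }
    var bar: Int = 0
}

// Hypothetically generated in foo.pb+json.swift:
extension Foo: ProtobufJSONSerializable {
    static let jsonFieldNames: [Int: String] = [1: "bar"]
    static let jsonFieldNumbers: [String: Int] = ["bar": 1]
}

// The visitor's generic constraint binds the name maps at compile time,
// so no per-instance context needs to be threaded through at runtime.
struct JSONEncodingVisitor<M: ProtobufMessage & ProtobufJSONSerializable> {
    func name(forField number: Int) -> String? {
        M.jsonFieldNames[number]
    }
}

let visitor = JSONEncodingVisitor<Foo>()
print(visitor.name(forField: 1) ?? "nil") // prints "bar"
```

A client that never references `ProtobufJSONSerializable` or the visitor would simply not compile in the `foo.pb+json.swift` file, and the name maps would never enter the binary.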
I'll have to dig into the code some more to see if all the pieces fit together, but something like this should both speed things up and make it possible for users to slice out only what they need.
This would affect the API between the generated code and the library, so I've labelled it accordingly. I'm looking at it now as part of rethinking the structure of the core Message protocols.
> You're either serializing JSON or text, but never both simultaneously.
Yes, but you may need both sets of names simultaneously: JSON deserialization accepts both proto and JSON names; Java has an option to the JSON serializer to force it to use proto names instead of JSON names.
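To make that concrete, a sketch of why the name→number map needs both keys during JSON decoding (the names `my_field`/`myField` and the map shape are illustrative assumptions, not the generated code):

```swift
// proto3 JSON decoding must accept both the original proto field name
// and the camelCase JSON name, so both map to the same field number.
let fieldNumbers: [String: Int] = [
    "my_field": 1,  // original proto name
    "myField": 1,   // JSON (camelCase) name
]

func fieldNumber(forJSONKey key: String) -> Int? {
    fieldNumbers[key]
}

print(fieldNumber(forJSONKey: "my_field") ?? -1) // prints "1"
print(fieldNumber(forJSONKey: "myField") ?? -1)  // prints "1"
```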
Here's an interesting issue that arises when you try to omit JSON support:
```proto
// File foo.proto
message Foo {
  Bar bar = 1;
}

// File bar.proto
message Bar {
  // ...
}
```
If we generate Swift for both of these, then omit the JSON support for `Bar`, what happens when someone tries to JSON serialize (or deserialize) `Foo`?
In particular, there are two ways to implement this feature:

1. Keep the generic JSON decode machinery for `Bar`, but no key would ever be found, so every field would be skipped and ignored. This could be confusing -- people would get an empty `Bar()` object where they might expect a populated one.
2. Omit JSON support entirely (e.g., via a separate `JSONMessage` conformance; the protocol would define all JSON support). This might make it easier for the compiler to strip all the JSON code, but `Foo` would have to use run-time type checks to see if `Bar` had JSON support before trying to decode it. Depending on the check, this could fail the entire decode or just omit the `bar` field entirely.

I'm tempted by the idea of having a JSON decode fail if any sub-object didn't support JSON decoding.
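A sketch of what that run-time check might look like, assuming hypothetical protocol names (`JSONMessage`, `decodeSubmessage`, and the error case are illustrative, not the library's API):

```swift
enum JSONDecodingError: Error {
    case unsupportedMessageType(String)
}

protocol ProtobufMessage { init() }

// JSON decoding is a separate, optional conformance.
protocol JSONMessage: ProtobufMessage {
    init(json: String) throws
}

struct Bar: ProtobufMessage {}  // generated without JSON support

func decodeSubmessage<M: ProtobufMessage>(_ type: M.Type, json: String) throws -> M {
    // Conditionally cast the metatype; fail the decode if the
    // submessage type never gained JSON support.
    guard let jsonType = type as? JSONMessage.Type else {
        // Alternatively, `return M()` here would silently omit the field.
        throw JSONDecodingError.unsupportedMessageType("\(type)")
    }
    return try jsonType.init(json: json) as! M
}

do {
    _ = try decodeSubmessage(Bar.self, json: "{}")
} catch {
    print("decode failed: \(error)")
}
```

The `guard` is where the two behaviors diverge: throwing fails the entire decode, while returning a default instance just drops the field.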
My thinking was that this kind of situation would fail to compile—it could be considered an error to try to use protos that don't include JSON serialization support with those that do. This would imply that anyone producing a framework/shared component that includes user-surfaced protos would have to support JSON/text serialization, of course.
However, I'm not seeing a way to make that work cleanly: essentially, Swift generics aren't powerful enough yet to express what I laid out above, and we'd have to rely on runtime checks like you described.
Given those limitations, I think our best option would be to just have JSON encode/decode throw an error if any of the submessages don't support it. I'll start tinkering and see what I can come up with.
To follow up: we should do this (pull into an extension) for the `ProtobufDebugDescription` logic as well, not just JSON and text. We can wrap that extension in `#if DEBUG` instead of putting it in a separate file, so that it can be easily removed from release builds.
The benefit here is that if a user doesn't need text or JSON support and only uses `debugDescription` in debug builds, then their release builds will be completely free of field names, which prevents data models from leaking in addition to the space savings.
For `ProtobufDebugDescription`, we can do the same kind of runtime check and just print field numbers instead of names if the conversion is requested in a release build.
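A minimal sketch of the `#if DEBUG` wrapping, with field numbers as the release fallback (the type and output format are illustrative assumptions):

```swift
struct Foo {
    var bar: Int = 42
}

// Hypothetically appended to the generated file rather than split out;
// the field-name string "bar" only exists in DEBUG binaries.
extension Foo: CustomDebugStringConvertible {
    var debugDescription: String {
        #if DEBUG
        return "Foo(bar: \(bar))"  // field names, debug builds only
        #else
        return "Foo(1: \(bar))"    // field numbers only in release builds
        #endif
    }
}

print(String(reflecting: Foo()))
```

Either branch is still useful for logging, but only the debug branch embeds name strings in the binary.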
We're not attempting this for the 1.0 release, but we're keeping it open as something to consider, and make a final decision on, in the future.
I think all of the groundwork is in place now: the text encoders and decoders all test for `ProtoNameProviding`, so it should DTRT if that conformance is not present in some future version of the generator.
Might want to look at this as part of a 2.0 release, so bumping it up.
I guess we also need to decide whether this is a generator option or always done.
JSON/text support is currently provided directly in the same generated message as binary support. In some use cases, a client may not need text/JSON support, so it simply adds bloat:
It should be possible to slice JSON/text serialization support out of the standard generated message and move it into extensions, perhaps even in separate files (e.g., `foo.pb.swift`, `foo.pb+json.swift`, `foo.pb+text.swift`; names bike-sheddable).
This would let clients decide which support they need, without adding complexity to the generator with extra switches or options. A user who only needs binary support just takes the .pb.swift files; someone who needs JSON support would also include the .pb+json.swift files, and so forth.
Even in an application that only uses binary encoding for storage/transmission, text format can be useful for debugging. Factoring it out into separate files has the benefit of letting users only include it in their debug/testing builds, and leave production builds slim.
Since the amount of generated code to support these additional formats is not insignificant, this would prevent users from shipping unnecessary bloat (dead stripping can be unpredictable).
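The proposed file split could look roughly like this; the file names come from the suggestion above, while the method name and JSON shape are illustrative assumptions, not real generated code:

```swift
// foo.pb.swift — binary support only; this is all a binary-only
// client would compile.
struct Foo {
    var bar: Int32 = 0
    // binary serialize/parse methods would be generated here
}

// foo.pb+json.swift — included only by clients that need JSON.
extension Foo {
    func serializeJSON() -> String {
        "{\"bar\": \(bar)}"  // illustrative, not real generated output
    }
}

var f = Foo()
f.bar = 7
print(f.serializeJSON()) // prints {"bar": 7}
```

Debug-only targets could likewise include `foo.pb+text.swift` in testing configurations and drop it from release builds, without relying on the dead stripper.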