Look at how to factor out JSON/text support

allevato commented 8 years ago

JSON/text support is currently provided directly in the same generated message as binary support. In some use cases, a client may not need text/JSON support, so it simply adds bloat:

A user wants to use protobuf to represent their application's data model on disk and only needs binary coding.
A user is speaking directly to a service that communicates in binary PB format on the wire for transmission efficiency.

It should be possible to slice JSON/text serialization support out of the standard generated message and move them into extensions, perhaps even in separate files (e.g., foo.pb.swift, foo.pb+json.swift, foo.pb+text.swift; names bike-sheddable).

This would let clients decide which support they need, without adding complexity to the generator with extra switches or options. A user who only needs binary support just takes the .pb.swift files; someone who needs JSON support would also include the .pb+json.swift files, and so forth.

Even in an application that only uses binary encoding for storage/transmission, text format can be useful for debugging. Factoring it out into separate files has the benefit of letting users only include it in their debug/testing builds, and leave production builds slim.

Since the amount of generated code to support these additional formats is not insignificant, this would prevent users from shipping unnecessary bloat (dead stripping can be unpredictable).

tbkka commented 8 years ago

If I understand correctly, you're suggesting that we first implement Issue #9, and then:

Have the library provide default (empty) fieldNumber → name and name → fieldNumber maps
Generate extensions in separate files that override those maps with real information

It's an interesting idea. I'm curious what the real savings are in code size.

allevato commented 8 years ago

Looking more closely, I'm surprised to see that those number/name maps are computed instance properties instead of static properties. That could certainly be a source of performance issues, especially if it's re-creating those arrays every time the property is accessed.

My general thinking is, there's never a scenario where you need the union of all the information in the same pass, which is what #9 will solve. You're either serializing JSON or text, but never both simultaneously. And when you're not doing JSON or text serialization, you don't need the names at all—in other words, the core ProtobufMessage protocol doesn't have to contain any of that information at all, because the extensions can add conformances for specific visitations that provide that information instead.

So, I could imagine something like a ProtobufJSONSerializable protocol that declares static properties for the number→name and name→number mappings. An extension in a separate file would add conformance to that protocol and define static let properties with the specific mapping for that message. Then, to wrap it all up we have a JSON visitor that, through some generics, is constrained to message types M: ProtobufMessage where M: ProtobufJSONSerializable, meaning that it can internally access M's JSON mappings directly instead of passing things around. In other words, we're binding the context needed by the visitor statically at compile-time instead of passing it around at runtime.

I'll have to dig into the code some more to see if all the pieces fit together, but something like this should both speed things and make it possible for users to slice out only what they need.

tbkka commented 7 years ago

This would affect the API between the generated code and the library, so I've labelled it accordingly. I'm looking at it now as part of rethinking the structure of the core Message protocols.

tbkka commented 7 years ago

You're either serializing JSON or text, but never both simultaneously.

Yes, but you may need both sets of names simultaneously: JSON deserialization accepts both proto and JSON names; Java has an option to the JSON serializer to force it to use proto names instead of JSON names.

tbkka commented 7 years ago

Here's an interesting issue that arises when you try to omit JSON support:

// File foo.proto
message Foo {
     Bar bar = 1
}
// File bar.proto
message Bar {
     ....
}

If we generate Swift for both of these, then omit the JSON support for Bar, what happens when someone tries to JSON serialize (or deserialize) Foo?

In particular, there are two ways to implement this feature:

One way is to just separate the field name maps, leaving a default empty map. With that approach, you could still try to JSON deserialize a Bar, but no key would ever be found, so every field would be skipped and ignored. This could be confusing -- people would get an empty Bar() object where they might expect a populated one.
The other way would omit all JSON support methods (the extension would add a JSONMessage conformance; the protocol would define all JSON support). This might make it easier for the compiler to strip all the JSON code, but Foo would have to use run-time type checks to see if Bar had JSON support before trying to decode it. Depending on the check, this could fail the entire decode or just omit the bar field entirely.

I'm tempted by the idea of having a JSON decode fail if any sub-object didn't support JSON decoding.

allevato commented 7 years ago

My thinking was that this kind of situation would fail to compile—it could be considered an error to try to use protos that don't include JSON serialization support with those that do. This would imply that anyone producing a framework/shared component that includes user-surfaced protos would have to support JSON/text serialization, of course.

However, I'm not seeing a way to make that work cleanly: essentially, Swift generics aren't powerful enough yet to express what I laid out above, and we'd have to rely on runtime checks like you described.

Given those limitations, I think our best option would be to just have JSON encode/decode throw an error if any of the submessages don't support it. I'll start tinkering and see what I can come up with.

allevato commented 7 years ago

To follow up: We should do this (pull into an extension) for the ProtobufDebugDescription logic as well, not just JSON and text. We can wrap that extension in #if DEBUG instead putting it in a separate file so that it can be easily removed from release builds.

The benefit here is that if a user doesn't need text or JSON support and only uses debugDescription for debug builds, then their release builds will be completely free of field names, which prevents data models from leaking in addition to the space saving benefits.

For ProtobufDebugDescription, we can do the same kind of runtime check and just print field numbers instead of names if the conversion is requested in a release build.

thomasvl commented 7 years ago

We're not trying for this for the 1.0 Release, but we're keeping it open as something to hopefully consider in the future to make a final decision on.

tbkka commented 7 years ago

I think all of the groundwork is in place now: The text encoders and decoders all test for ProtoNameProviding now, so it should DTRT if that conformance is not present in some future version of the generator.

thomasvl commented 2 years ago

Might want to look at this as part of a 2.0 release, so bumping it up.

I guess we might also need to decided if this is a generator option or always done.

apple / swift-protobuf

Look at how to factor out JSON/text support #18