apple / swift-protobuf

Plugin and runtime library for using protobuf with Swift
Apache License 2.0
4.57k stars 451 forks source link

Enum.init(protoName:) #326

Open NachoSoto opened 7 years ago

NachoSoto commented 7 years ago

I was relying on this constructor, and it's now gone (see https://github.com/apple/swift-protobuf/pull/320/files#r103062915). Was there a reason for that? Would it be possible to bring it back?

Thanks!

allevato commented 7 years ago

Can you describe in more detail why you to be able to translate strings to enum cases outside of the already provided text-format/JSON encoding/decoding processes?

NachoSoto commented 7 years ago

Yup. In my particular example, I have this enum:

enum NotificationCategory {
    PUSH_NOTIFICATION_CATEGORY_1 = 0;
    PUSH_NOTIFICATION_CATEGORY_2 = 1;
   // ... etc
}

Then, from my backend, I'm sending notifications to Apple's APNS with that category ("PUSH_NOTIFICATION_CATEGORY_1", etc). In the client I can then easily get the category from the push notification body, and deserialize it that way.

I could of course send the rawValues instead, and change my app .plist to numbers, but that would obscure the meaning of these categories, so it would be a lot better if I could continue sending these raw Strings.

allevato commented 7 years ago

My point of view is going to come as that of an API purist. 😄

So, in our efforts to slim down the public API to what makes sense for Swift users, we reduced the enum API to only the cases and their raw values. This also serves another purpose—as a longer term goal we've been hoping to refactor the way generated fields/enum values expose their names so that we could provide an option to remove those strings completely from the code for users who only needed binary encoding/decoding (#18). There are big wins there for the mobile use cases with respect to binary size.

The way that other languages would grant you access to this metadata is through descriptors (more or less reflection for protos). For example, the Objective-C type that's generated for a proto is just a standard C enum, so the name mapping isn't available there; the runtime provides GPBEnumDescriptor instead which has the API for that. But we intentionally chose not to emit descriptors because of the bloat they would add to clients†.

So... all this philosophical stuff because of enum strings. 😄 My thinking is, if we want to support this, the right way to do it would be to emit those descriptors—perhaps as a command-line option so users have to opt-in to the bloat—and people who need reflective capabilities can use those. This isn't on the 1.0 milestone track, and we haven't discussed it too much—we certainly haven't designed it for Swift yet.

@thomasvl, you and I have had a lot of conversations about descriptors and you have more experience in this area. What are your thoughts?

(† Of course, it would be interesting to see how good swiftc and the linker do here. If we generated descriptors internal and you only referenced one of them, would the rest get dead-stripped in the final executable?)

thomasvl commented 7 years ago

First up, Descriptors in the C++ api, and are just the proto objects, they actually are something built out of the proto data so they can be more of a graph of objects to show relationships. The ObjC were along those lines. This means they aren't cheap, they have the data cost as well as a runtime hit to build them up when access.

Descriptors also have a major down side to some customers, the bundle everything into the binary needed to know how to decodes the Apps wire formats. Our visitor pattern likely comes close to including the same data, but it isn't as easy to data mine as waking static data of a binary to find descriptor sets set.

To the original question, I'm still missing something. So you've got a push notification in Apple's format, and you want to transform it into a proto message? I'm assuming there is more than just an single enum value here. So it sounds like you likely have to do custom code anyways, so while the enum api might have been convent, it sounds like it was more chance than a real complete solution. So if custom code is already there, doing something to manage your name mapping doesn't seem like that big of an issue. Certainly comparing to potentially adding a public api to every enum to bloats the code for all users?

tbkka commented 7 years ago

This means they [descriptors] aren't cheap, they have the data cost as well as a runtime hit to build them up when access.

I've heard some complaints about the run-time data size of using descriptors, especially from iOS developers who have to deal with sometimes severe limits on data space (iOS can can demand-page code, but does not swap data space, so data space is actually much more valuable than code).

@NachoSoto: We have pieces of what you need internally -- the JSON and Text encoding/decoding engines do need this text information. The tricky part is working out the right API. In particular, we're trying to ensure the right behavior if the Enum does not support ProtoNameProviding.

NachoSoto commented 7 years ago

@NachoSoto: We have pieces of what you need internally -- the JSON and Text encoding/decoding engines do need this text information

Right, that's what my thinking was: it shouldn't add any extra size, right? IMO this is just a matter of picking the right API, like you said. I do understand your comments about discouraging using this data, and it's really not a big deal for me to switch this use case to using Ints instead. But I imagine I won't be the only one that will need this in Swift, since it is available in other languages.

NachoSoto commented 7 years ago

Any update on this?

thomasvl commented 7 years ago

When you say it is available in other languages, can you point to examples? For C++, I think you've have to use full on TextFormat and can't do it on a per enum case without going to the Descriptor. Exposing something as complete as the Descriptor is what we are really holding off from for a 1.0 release.

allevato commented 7 years ago

While I can appreciate your need for something like this to improve the legibility of some other parts of your code base where you refer to enums outside of a protobuf encoding context, I'm still inclined to hold my initial reasoning here:

But, as @thomasvl mentioned, we're holding off on descriptors for 1.0 because they add bloat that many people don't need (and they're an all-or-nothing proposition), and I would very much like to not scatter bits and pieces of descriptor-like functionality as public API throughout the library to satisfy individual use cases that pop up. If we want to support these transformations, we should do it the right way.

In the meantime, I would recommend providing your own string-to-enum transformation or using the numeric values.

tbkka commented 7 years ago

@NachoSoto: If I understand correctly, you're basically asking for a readable text serialization mechanism so you can put textual enum values into some other serialization mechanism (plist).

You might consider using the protobuf "Text Format" serialization for this instead of just the bare enum name. If you have a message with a single field containing the enum, like this:

enum NotificationCategory {
    PUSH_NOTIFICATION_CATEGORY_1 = 0;
    PUSH_NOTIFICATION_CATEGORY_2 = 1;
   // ... etc
}

message Notification {
   NotificationCategory category = 1;
}

Then if you construct a Notification message and serialize it using textFormatString() you will get this:

category: PUSH_NOTIFICATION_CATEGORY_1

You could then store this entire string (including the field name) in your push notification and deserialize it at the other end using init(textFormatString:). The format used by textFormatString() is really intended for debugging use, but it is a stable, robust, and performant format that has been used for many other purposes.

As a bonus, this would allow you to expand the Notification message here in the future with additional information as necessary.

NachoSoto commented 7 years ago

Thanks for the responses!

It's not just this particular feature, it's also a bunch of places in our web that need to continue supporting these string enum values for backwards compatibility, because we haven't migrated JS to use Protobuf, and we won't any time soon.

When you say it is available in other languages, can you point to examples?

For example in Java Enum#valueOf, or the fact that the JSON serialization uses this format, so being able to deserialize it allows maintaining compatibility with other systems that are using the same data representation but not necessarily Protobuf.

In the meantime, I would recommend providing your own string-to-enum transformation or using the numeric values.

I'll do that in the meantime 👍

But, as @thomasvl mentioned, we're holding off on descriptors for 1.0 because they add bloat that many people don't need.

Any idea what the timeframe for that would be after 1.0?

NachoSoto commented 7 years ago

In case anyone is curious, this is the approach I went with for now:

ProtobufEnum

import SwiftProtobuf

// See https://github.com/apple/swift-protobuf/issues/326 for why this is necessary
public protocol ProtobufEnum: SwiftProtobuf.Enum {
    init?(identifier: String)

    var identifier: String { get }

    static var allValues: Set<Self> { get }
}

Test coverage using Quick

import Quick
import Nimble
import SwiftProtobuf

class ProtobufEnumBehavior<T: ProtobufEnum>: Behavior<Set<T>> {
     override class func spec(_ allValues: @escaping () -> Set<T>) {
        for value in allValues() {
            it("Serializes identifier for \(value)") {
                expect(T(identifier: value.identifier)) == value
            }
        }
    }
}