apple / swift-protobuf

Plugin and runtime library for using protobuf with Swift
Apache License 2.0

Pitch: Strong typedefs on fields #619

Open zwaldowski opened 7 years ago

zwaldowski commented 7 years ago

I have a protobuf service endpoint backed by several legacy services (that I am not in control of) that use "stringly-typed" enums as fields. Once transpiled into Swift, these proto fields are misleading and type-unsafe, since they appear as just String and unduly encourage doing String things with them. Furthermore, we need to do something meaningful with these "unknown" cases, so using standard proto enums is untenable. While it would be ideal to change the service to format types as messages instead, I'm really only trying to gain a quality-of-life improvement for the Swift side.

Recent versions of Swift support annotating Objective-C typedefs as "newtypes" or "open enums":

typedef NSString *MyStringEnum NS_EXTENSIBLE_STRING_ENUM;

extern NSString *const MyStringEnumFoo;

These get bridged in as single-field structs:

struct MyStringEnum: RawRepresentable {
    let rawValue: String
    init(_ rawValue: String) {
        self.rawValue = rawValue
    }

    static var myCase: MyStringEnum { get }
}

This encourages using a namespace for declaring constants, and, indeed, even without Objective-C involved this pattern is reasonably common, such as with OptionSet.
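
For readers who have not seen the pattern without Objective-C, here is a minimal pure-Swift sketch of it. The type name `PaymentMethod` and its constants are hypothetical and only illustrate the shape; nothing here comes from SwiftProtobuf.

```swift
// Hypothetical open "string enum"; the type and constant names are illustrative only.
struct PaymentMethod: RawRepresentable, Hashable {
    let rawValue: String
    init(rawValue: String) { self.rawValue = rawValue }
}

extension PaymentMethod {
    // Known values live in the type's namespace as constants.
    static let card = PaymentMethod(rawValue: "card")
    static let invoice = PaymentMethod(rawValue: "invoice")
}

// A value the client has never heard of is still representable.
let fromServer = PaymentMethod(rawValue: "crypto")
let isKnown = [PaymentMethod.card, .invoice].contains(fromServer)
```

Unlike a closed enum, constructing the value never fails, so an unrecognized string is carried along rather than dropped.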

I propose adding a custom option to fields:

extend google.protobuf.FieldOptions {
  optional bool swift_newtype = xxxxx [default=false];
}

message MyMessage {
    string specialValue = 1 [swift_newtype=true];
}

That is then used to generate a wrapper type:

extension MyMessage {
    struct Field: SwiftProtobuf.<##NotYetUnderstoodProtocol##>, _CustomJSONCodable {
        var value: String
    }
}

The type would implement some subset of Message to encode and decode its field without nesting, as this type is not a message by proto standard. The JSON value would also decode and encode as its single value without nesting, saving space in the payload.
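
To make "encode as its single value without nesting" concrete, here is a rough sketch using Foundation's Codable as a stand-in. This is an assumption for illustration only: it is not SwiftProtobuf's JSON machinery, and `SpecialValue` and `MyMessageSketch` are hypothetical names.

```swift
import Foundation

// Hypothetical wrapper; this uses Foundation's Codable, not SwiftProtobuf's generated code.
struct SpecialValue: RawRepresentable, Hashable, Codable {
    var rawValue: String
    init(rawValue: String) { self.rawValue = rawValue }

    // Decode from a bare JSON string rather than a nested object.
    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        rawValue = try container.decode(String.self)
    }

    // Encode as a bare JSON string.
    func encode(to encoder: Encoder) throws {
        var container = encoder.singleValueContainer()
        try container.encode(rawValue)
    }
}

// Hand-written stand-in for the message that holds the field.
struct MyMessageSketch: Codable, Equatable {
    var specialValue: SpecialValue
}

let original = MyMessageSketch(specialValue: SpecialValue(rawValue: "legacy-code"))
let data = try JSONEncoder().encode(original)
let json = String(decoding: data, as: UTF8.self)
let decoded = try JSONDecoder().decode(MyMessageSketch.self, from: data)
```

In this sketch the field appears in the payload as a plain string, not as an object with a nested `rawValue` key, and decoding the payload yields the original value.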

In practice, this would be equivalent to creating a named duplicate of the StringValue et al. WKTs.

Future Direction

Alternatives Considered

tbkka commented 7 years ago

This is certainly worth discussing. A lot of folks seem to have this exact problem with the other end working with strings instead of enums, and it would be nice to have a richer/more robust solution.

I suspect we can simplify your idea a lot. Being able to encode/decode the field value does not really require implementing a subset of Message. It just requires the field type be able to convert itself to/from a supported value. (Which is basically what RawRepresentable does.)

For the interim, I'll point out that you can achieve some of this today simply by extending the generated struct. Starting from a simple string-typed field:

message MyMessage {
  string specialValue = 1;
}

You can then extend the generated type with converting accessors:

extension MyMessage {
   var specialValueAsEnum: EnumType {
      get { ... read string, return Enum ... }
      set { ... translate enum to string, write string ... }
   }
}
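
A filled-in version of the accessor idea might look like the sketch below. `MyMessage` here is a hand-written stand-in for the generated struct, and `SpecialValue` with its constants is hypothetical. The sketch uses a RawRepresentable struct instead of a closed enum so that unknown strings survive the conversion.

```swift
// Hand-written stand-in for the generated message; real generated code has more members.
struct MyMessage {
    var specialValue: String = ""
}

// Hypothetical open enum type; the name and constants are illustrative.
struct SpecialValue: RawRepresentable, Hashable {
    let rawValue: String
    init(rawValue: String) { self.rawValue = rawValue }
    static let standard = SpecialValue(rawValue: "standard")
    static let express = SpecialValue(rawValue: "express")
}

extension MyMessage {
    // Typed view over the underlying string field.
    var specialValueAsEnum: SpecialValue {
        get { SpecialValue(rawValue: specialValue) }
        set { specialValue = newValue.rawValue }
    }
}

var message = MyMessage()
message.specialValueAsEnum = .express
```

Writing through the typed accessor updates the string field, and any string already in the field reads back as a `SpecialValue`. The raw `specialValue` property remains accessible alongside it.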

zwaldowski commented 7 years ago

I suspect we can simplify your idea a lot. Being able to encode/decode the field value does not really require implementing a subset of Message. It just requires the field type be able to convert itself to/from a supported value. (Which is basically what RawRepresentable does.)

Yeah, I wasn't familiar enough with Message to say, but that was the exact subset I meant, in the same way we have overloads that are generic over E: Enum.

Re: converting accessors, as my team is in a phase of incremental migration, preventing you from accessing the variable the "wrong way" is pretty important. But it is a good point.

thomasvl commented 7 years ago

Do you have an example of how you initially do the markup for these string-enums in the first place?

And how do you handle this for other languages? I can't say I've seen something that I think is immediately like this, so it would be interesting to bring it to the main protobuf group also, since it could impact other languages. Google's older JSON APIs (apiary) also used strings for the enum values, so there was some overlap in the problem space there, but for gRPC, I believe they mostly used the binary encoding or the newer proto3 syntax JSON encoding, which does provide a spec.

I'll also add, the biggest downside with anything string based is always dealing with unknown values. While servers are easy to update to add new values, client code tends to be somewhat under user control for updates, so having to support a string based enum can be extra difficult there.

zwaldowski commented 7 years ago

Do you have an example of how you initially do the markup for these string-enums in the first place?

Sorry, could you rephrase that?

And how do you handle this for other languages? I can't say I've seen something that I think is immediately like this… <##snip##>

Haskell calls this newtype, which I'm sort of stealing the name of. Any language with zero-cost abstractions (Rust, Swift, C++, etc) could support the idea of a pseudo-message struct at a similar fidelity. Duck-typed languages are unlikely to gain much benefit from it, so I wouldn't recommend consuming the attribute there.

I'll also add, the biggest downside with anything string based is always dealing with unknown values.

I'm extremely aware of this limitation and wish I controlled all the services in my stack to eradicate this need. 😃

It is, however, why my proposal here backs off a little on the literal "string enum", even though it's the easiest example to center the discussion on. Mirroring your "[servers/someone else] can easily add new values" situation, which I see fairly often, these "enums" are not actually enumerated, i.e., they are not a closed set. My client side still wants the safety of something it can switch over for the values it knows about, without evolving both client and server in lock-step.
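
As a rough illustration of that client-side switch, assuming a hypothetical `ShippingSpeed` newtype (none of these names come from a real service):

```swift
// Hypothetical open set of values; the server may add more at any time.
struct ShippingSpeed: RawRepresentable, Hashable {
    let rawValue: String
    init(rawValue: String) { self.rawValue = rawValue }
    static let standard = ShippingSpeed(rawValue: "standard")
    static let express = ShippingSpeed(rawValue: "express")
}

func describe(_ speed: ShippingSpeed) -> String {
    switch speed {
    case .standard: return "Arrives in 5 to 7 days"
    case .express: return "Arrives in 2 days"
    // The compiler requires a default because the set is open,
    // so values added by the server later land here.
    default: return "Other (\(speed.rawValue))"
    }
}
```

The known constants match by equality, and a value the client was never built against falls through to the default branch with its raw string intact.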

thomasvl commented 7 years ago

Do you have an example of how you initially do the markup for these string-enums in the first place?

Sorry, could you rephrase that?

Was wondering if you have an example .proto file that you are using now to declare this since I don't know any way to do it currently.

Or do you just use a .proto file declaring the field as a string, and do something external to that for the specific "values" the field can be?

zwaldowski commented 7 years ago

Ah, OK. Gotcha. Yes, we currently define them elsewhere. I haven't thought through a way I'd like to define that yet.