apple / swift-protobuf

Plugin and runtime library for using protobuf with Swift
Apache License 2.0
4.55k stars 443 forks source link

Act on Messages generically ("Reflection" support in some other languages) #1029

Open gmilos opened 4 years ago

gmilos commented 4 years ago

swift-protobuf doesn't support descriptors currently (https://github.com/apple/swift-protobuf/blob/9df381f72ff22062080d434e9c2f68e71ee44298/Protos/google/protobuf/descriptor.proto). Specifically, it's not possible to query the descriptors for file/message/... at runtime. This prevents number of reflection use cases, in particular support for gRPC service reflection https://github.com/grpc/grpc-swift/issues/768 (which is a valuable tool for interrogating gRPC services without prior schema knowledge).

Support for descriptor would require extra binary space in order to keep metadata required to build such a descriptor. C++ currently handles this by definining two different message types: Message vs MessageLite (https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.message#Message.GetDescriptor, https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.message_lite#MessageLite) where MessageLite doesn't have the embedded descriptor. This is controlled at gen time, with option optimize_for = LITE_RUNTIME;.

The extra binary space issue is already something we have to deal with for JSON/text serialisation. See https://github.com/apple/swift-protobuf/issues/18 for discussions of an approach where we could conditionally include the functionality and conformance to a protocol.

tbkka commented 4 years ago

If the API here returned a decoded descriptor, then we could iterate over time with different ways of storing the data. For example, we could store it in TextFormat or a base64-encoded binary or ...

thomasvl commented 4 years ago

As @tbkka mentioned, descriptors aren't all that interesting. What you need is what C++ (and other languages) add beyond that to then generically work on message. i.e. - given a descriptor parse data/query fields/serialize. Also, given generic Message, have the library use the descriptor data to then also act on it. Both of these sorta access are some times referred to in the other libraries as "reflection" based access.

The first case, could be done with just a Descriptor because the API could use some specific structure as an impl details. The second is much more difficult as it requires being able to bridge a descriptor that also includes information about how to access the fields of a real instance.

As far as generating lite/full, there is movement with Google's protobuf team to move somewhat away from that sorta generation mode, as it causes problems - what is one dependency just needs lite, but something else needs full and they overlap in the messages they need this for, how does a build system generate things only once and correctly. As you mention the text data for Swift, we've debated moving this into separate files, so it builds to model it as additional modules that are optionally depended on to resolve the build issues. ps - the text data would likely go away if we had descriptors since the data would be in the descriptor.

Anyway, changing the summary, since the issue isn't really exposing the Descriptor (that's easy, but as mentioned, you can't do anything in Swift with it); it is more about providing the reflection type apis to act on messages generically.

tbkka commented 4 years ago

If we had some means to get the embedded descriptors (even as just an "experimental" option of some sort), I think it would be a lot easier to get people to help build out the APIs to use them.

thomasvl commented 4 years ago

If we had some means to get the embedded descriptors (even as just an "experimental" option of some sort), I think it would be a lot easier to get people to help build out the APIs to use them.

Not sure I follow. As you mention, it isn't clear what data is useful. If we don't intend to build api that uses that format, what's the point of exposing it?

protoc already has an option to make a DescriptorSet out of the proto files passed to it, so someone could make a binary with that, and then use Google_Protobuf_FileDescriptorSet since we export it to parse it. But sorta like on the plugin, without helper to link it all together, it isn't that useful, which goes back to the point about API to work with it.

gmilos commented 4 years ago

The concrete use case for me is gRPC reflection (i.e. the following issue: https://github.com/grpc/grpc-swift/issues/768). gRPC reflection allows the service API to be interrogated and invocations to be made via a CLI (https://github.com/grpc/grpc/blob/master/doc/command_line_tool.md).

FileDescriptorResponse is used to return descriptor of the types used in the defined service: https://github.com/grpc/grpc-proto/blob/dd2dca318eb197b96b60f22297871fb1ed862800/grpc/reflection/v1/reflection.proto#L110-L115.

In order for the Swift gRPC codegen to offer reflection support, there has to be a standarised support for getting descriptors of protobuf types. While it's theoretically possible to lift the feature to gRPC, we're likely eventually going to bring descriptor support to swift-protobuf, so I'd much prefer this option to start with. That way other use cases such as constructing self describing messages (https://developers.google.com/protocol-buffers/docs/techniques#self-description) would also benefit.

I'd argue it makes sense to decouple the ability to query descriptors from message reflection (a complex area of its own, since no standard approach to generic reflection exists in the language, yet).

thomasvl commented 4 years ago

From the looks of https://developers.google.com/protocol-buffers/docs/techniques#self-description, they'd actually want the transitive set, which is a lot more than just have a message be able to return the descriptor for itself. This would require logic to go from a messages descriptor, up to the file's, and also collect all the transitive descriptors used as imports to provide all the support types.

All that seems to point more in the direction of the grpc generation feature at the moment. Scoping the support where needed would also mean if there was a bug, it could be fix via one project instead of synchronized changes between two projects.

tbkka commented 4 years ago

@gmilos -- Does gRPC require DynamicMessage support (the ability to encode/decode/store/manipulate) messages without having generated code? Although it's not trivial, I don't think this is especially hard to implement. As a starting point, I would suggest trying to represent a DynamicMessage as:

Getting the MessageDescriptor will be interesting. Depending on the use case, someone might have a collection of MessageDescriptor objects already available, or they might have a collection of FileDescriptors (in which you can look up the MessageDescriptor) or they might have a FileDescriptorSet, or any combination of these.

thomasvl commented 4 years ago

I thought @gmilos case was that they want grpc-swift to be able to return descriptors so the other tool could then query them and act on the data? i.e. - no reflection/generic message support, just a way to return descriptors from within grpc-swift. That's why I was thinking since it will take support within that library, it makes more sense to it to do all the work itself so even if SwiftProtobuf had the descriptors (and/or reflection support) it likely wouldn't directly match what was needed for what they are trying to support.

gmilos commented 4 years ago

grpc-swift would only need to return the descriptor information, specifically https://github.com/grpc/grpc-proto/blob/dd2dca318eb197b96b60f22297871fb1ed862800/grpc/reflection/v1/reflection.proto#L94 in response to either one of the following requests https://github.com/grpc/grpc-proto/blob/dd2dca318eb197b96b60f22297871fb1ed862800/grpc/reflection/v1/reflection.proto#L45-L55.

@thomasvl, I see your point about possible API discrepancy, but if SwiftProtobuf had the ability to return descriptors, we'd try our hardest not to have to duplicate the message type descriptor metadata (since grpc-swift is generally oblivious about serialisation, in extreme case it might not even be protobuf). Of course we don't yet have descriptors in SwiftProtobuf so that's mostly moot point.

thomasvl commented 4 years ago

grpc-swift would only need to return the descriptor information, specifically https://github.com/grpc/grpc-proto/blob/dd2dca318eb197b96b60f22297871fb1ed862800/grpc/reflection/v1/reflection.proto#L94

The FileDescriptorResponse message appears to be a fully serialized FileDescriptorSet - https://github.com/grpc/grpc-proto/blob/dd2dca318eb197b96b60f22297871fb1ed862800/grpc/reflection/v1/reflection.proto#L110-L115

If we expose Message -> DescriptorProto, it wouldn't help. You'd need the FileDescriptorProto. If we expose Message ->FileDescriptorProto, that still doesn't get your the ordered list that is aFileDescriptorSet. Since the Runtime code has no need for apis for this, you'd be talking about building all this into the generator code:

  1. Collect the FileDescriptorProto
  2. Loop walks the imports
    1. Look up the file
    2. Collect the FileDescriptorProto
    3. Recurse to ensure all transitive imports are visited/collected (just once)

So if we're talking all this at generation time, that's where it seems like grpc-swift could do exactly this right now, it already generates off the exact same data that SwiftProtobuf does (FileDescriptorSet/FileDescriptor). So why put the indirection in place now when SwiftProtobuf would be doing this specifically to the needs of grpc-swift?

Yes, if/when SwiftProtobuf has some of the reflection apis, we might be able to expose the same information, but it feels odd to build all this out for the single use case of grpc-swift when grpc-swift generation is on the exact same apis that SwiftProtobuf would be trying to generate this data from.

tbkka commented 4 years ago

Possibly relevant: SE-0271 adds support for binary resources in Swift packages. Writing out the FileDescriptorSet as a separate binary resource would make it easy for people to selectively include/omit that data as needed.

I also wonder if it would make sense to spin out the Descriptor support APIs as a separate library that people could incorporate if they had a use for those capabilities.

thomasvl commented 4 years ago

I think we'll eventually do it, but even the Descriptor classes we have work fine for code generators, but without a way to bridge that to parsing/acting on messages/etc. I'm still struggling for what one would do with them. So use cases are likely important before we commit to that API in a larger space.

jtbandes commented 1 year ago

I'm interested in this too. Is there any current best practice or known workaround for generating a Descriptor or FileDescriptorSet from a type conforming to Message at runtime?