dsnet closed this issue 4 years ago.
Interesting proposal. I faced a similar challenge when designing the public API for a dynamic message implementation.
If this goes anywhere, I'll be curious to see how similar they are in terms of shape/surface area. And I'll be excited to see how it can simplify that dynamic message's implementation (particularly around extensions, which I think is the weakest part of the current generated code).
About adding methods to proto Message interface, we have done exactly that a few times in both C++ and Java. For example, we added a ByteSizeLong method to C++ MessageLite interface not very long ago: https://github.com/google/protobuf/blob/master/src/google/protobuf/message_lite.h#L339
It's a pure virtual method so anyone who implements their own MessageLite subclasses will be broken.
Protobuf team's stance on this is that nobody except protobuf itself should subclass/implement these message interfaces. It's called out explicitly in our compatibility notice in Java: https://github.com/google/protobuf/tree/master/java#compatibility-notice
> Protobuf message interfaces/classes are designed to be subclassed by protobuf generated code only. Do not subclass these message interfaces/classes yourself. We may add new methods to the message interfaces/classes which will break your own subclasses.
Go doesn't have the concept of a "default method", so this is going to be a breaking change unfortunately. The transition to get the world to the new API will be a little tricky and will probably have to occur in several phases.
There's an open proposal for default methods golang/go#23185, but it's not looking promising as the concept doesn't fit well into Go where interface satisfaction is implicit rather than explicit.
Here are some documents for the proposed plans to improve Go protobufs. I'd appreciate any feedback from the community regarding the design and migration plan:
/cc @neild @jhump @awalterschulze @tamird @meling
Couldn't the same approach as the database/sql package be taken? There, various functions perform type assertions on smaller interfaces, rather than having a giant V1/V2 interface where all the functions are defined in one big bundle.
Edit: This already seems the case in the new proposal, apologies for skimming over it too quickly
Both documents are really ambitious, but in principle sound like a great idea. Good work.
I am a bit concerned about the amount of work required to make this move on the part of gogoprotobuf. I will need some help. The last large move (currently on the dev branch) was a big job, taking a lot of my personal time, which I am trying to focus on my studies. I am not working for a company that uses protobufs. But I want to stress that I do think these changes are necessary. So I will do my best to support these efforts.
Comments on protoreflect: it looks good, but I am a bit concerned that the following cases might not have been taken into account. So maybe here are some tests for the protoreflect API:

- `[]MyMessage`, without a pointer
- a `time.Time` field, or `*time.Time`, `[]time.Time`, `[]*time.Time`
I think getting this protoreflect library right is essential to having only a single proto library, which is the goal I am trying my best to support.
Thank you @awalterschulze for your feedback.
> Both documents are really ambitious
I admit that the designs are indeed ambitious. Ideally, it would be nice if something like it occurred 8 years ago when Go protobufs were first implemented. However, that was not the case and we are struggling with the consequences today. As much work as this transition is, it will lay the foundation for a brighter future. Another 8 years from now, I hope we look back and see this moment as the point when Go protobuf support became much better.
> I will need some help.
I'll help 👋.
Regarding your protoreflect concerns, I was consciously thinking about how gogo/protobuf fit into it all as I was designing the reflection API.
How do we handle slices of message values (not pointers)?
This case is actually the reason why `protoreflect.KnownFields.Set` has this documented restriction:

> It is unspecified whether Set performs a pointer copy or copy of the pointed at value for composite types (e.g., Messages, Vectors, Maps).

Thus, for `[]T` where T is a value type, the reflection implementation only needs to copy the pointed-at value. The `Mutable` methods are also implementable since every element in a Go slice is addressable, so the reflection implementation can obtain an address for an element and return it.
How do we handle message values (not pointers)?
This situation is different from the above, since proto semantics do not distinguish between a null and an empty message within a repeated list of messages. However, for a standalone message, proto semantics do preserve whether a message is null or empty (even in proto3).
Message values occur when the `nullable` option is used. In this situation, proto semantics are being violated. Hence the documented warning:

> Warning about nullable: according to the Protocol Buffer specification, you should be able to tell whether a field is set or unset. With the option nullable=false this feature is lost, since your non-nullable fields will always be set.
An implementation of the reflection API will need to make a best effort at providing the illusion that the implementation is proto compliant. However, this abstraction can leak: `protoreflect.KnownFields.List` on a new instance of a message will return a non-empty list, since some non-nullable sub-messages are always considered set.

How do we handle custom types for bytes?
Similar to the documented restriction on `protoreflect.KnownFields.Set`, I'll update the document such that it is unspecified whether `protoreflect.ValueOf([]byte(...))` or `protoreflect.Value.Bytes` results in a slice that is aliased or copied.
How do we handle a `Timestamp` or `Duration` proto represented as a Go `time.Time` or `time.Duration`?
An implementation of the reflection API will need to create internal wrappers over `time.Time` and `time.Duration` that present the illusion that they are just messages with a `seconds` or `nanos` field. One possible implementation is here: https://play.golang.org/p/IXvjCK_Y_Hc
However, these wrappers are leaky abstractions:

- `Timestamp.SetNanos` with an nsec outside the range [0, 999999999] changes the value of `Seconds` and returns a `Nanos` that does not have the same value as the original nsec. However, note that the documentation on `Timestamp.nanos` says that nanos must always be within the range [0, 999999999], but this is a documentation restriction, not a semantic restriction.
- `Duration.SetNanos` with an nsec outside the range [-999999999, +999999999] changes the value of `Seconds` and returns a `Nanos` that does not have the same value as the original nsec. However, note that the documentation on `Duration.nanos` says that nanos must always be within the range [-999999999, +999999999], but this is a documentation restriction, not a semantic restriction.
- `Duration.SetSeconds` with a sec value that has a different sign than `Nanos` results in `Nanos` being negated, and calling `Duration.SetNanos` with an nsec value that has a different sign than `Seconds` results in `Seconds` being negated. In practice, this is not an issue because `SetSeconds` and `SetNanos` are almost always set together (with the correct signs) without an intermediate get on `Seconds` or `Nanos`.
- The `google.protobuf.Duration` type is documented as being able to represent ±10000 years, which is beyond the range of Go's `time.Duration` type, which can only represent ±290 years. Thus, calling `SetSeconds` with a sec value beyond Go's limit results in a silent truncation.
- `Duration` has similar non-nullable issues as non-nullable messages unless a `*time.Duration` was used. Note that `time.Time` does not have this issue since the zero value of `time.Time` can represent being null. However, when the `Mutable` method is called to retrieve a `time.Time`, the timestamp must be initialized as `Unix(0, 0)`, which is the zero value of the proto `Timestamp` message.

In practice, I don't expect the abstraction leakages mentioned above to be much of a problem. If anything, the overflow of `time.Duration` may be the most likely candidate, and the wrapper could help alleviate that by setting the `time.Duration` to `time.Duration(math.MaxInt64)` to help signal that failure scenario.
I agree on the necessity, and if you are willing to make contributions to the gogoprotobuf repo to bring the two implementations closer, to the point of eliminating the need for two proto libraries, then let's do this :)
On `customtype`: this was introduced first and the more sustainable extension `casttype` was only introduced later. `customtype` only assumes that you have implemented some methods, like Marshal, Unmarshal, Size, etc. It does not necessarily have to be an alias, as in the case of `casttype`. I hope we can still support this?
If we can hide the timestamp/duration wrappers from the user, then it's fine, but I would hate for them to have to import something outside of the standard library to have a time.Time field. It is very important that the generated fields are the ones users want.
And yes, `nullable=false` and `time.Duration` are subtle leaky abstractions, but again it is what users want. If they were going to write a function to copy between the structs they want to use and the generated ones, then they were going to run into these subtleties anyway. gogoprotobuf prefers to handle these in a single sensible way over having users write error-prone and slow copy functions.
Although these comments sound negative, these are really the only problems I can see. I think we can get there and I appreciate all your efforts here, including taking gogoprotobuf into account.
For anyone subscribed. Work is actively being done to address this issue. See the "api-v2" branch on this GitHub repo.
The in-development new package includes a definition of `proto.Message` which satisfies this feature request:

```go
type Message interface {
	ProtoReflect() protoreflect.Message
}
```
The new `Message` type provides a behaviorally complete view of a message. Functions such as serialization (wire, text, and json), comparison, cloning, and so forth may all be implemented entirely in terms of this interface without any assumptions about the underlying type.
That's the good part. The downside is that we now have two types describing a message: the original `Message` and the new one. Existing packages which include the `proto.Message` type in exported functions cannot be changed to use the new type without a breaking API change.
We have two options at this point:

1. The new package defines a different `Message` type than the old one, generated types implement both, and users will need to pick between the two. We can provide convenience functions that convert between the two types.
2. We define the new `Message` type to be identical to the old one. We will provide a function which takes a `proto.Message` and returns a `protoreflect.Message`.

The first option gives us the best new API at the expense of making it more difficult for users to upgrade. The second option is an inferior API, but greatly simplifies the upgrade story.
Inside Google, the use of `proto.Message` in public APIs is pervasive. However, we have the ability to refactor the Google-internal codebase; the more interesting question is how common `proto.Message` is in external public APIs. If it is little-used, then the cleaner API becomes more appealing. If it is common, then the most compatible one does.
Either way, we're getting to the point where we need to make a decision. Thoughts from non-Google users would be particularly useful here.
@neild, consider this a vote for the first case, as it allows users to upgrade their dependencies incrementally (meaning 3rd party library dependencies that also depend on the proto runtime). If all of the newly generated messages are still compatible with the old packages/ABI, then they need only re-generate all of their protos to be able to take advantage of the new package's features.
But I wonder: if these convenience functions mentioned are given an old generated message (e.g. does not have any add'l methods to be easily convertible to the new interface), how would that be handled? Would there need to be some non-trivial translation layer? (I know this could be made to work, by interpreting the message's descriptor bytes and generating the reflection structures at runtime from that. But I'm unsure if that's planned or not.)
> Would there need to be some non-trivial translation layer?

This non-trivial translation layer already exists, since we must support the situation where a newly generated message has a dependency on an old generated message that does not support the v2 API.
@neild - May I suggest making 2 new posts or 2 new issues for the sole purpose of collecting 👍 and 👎?
This way (without searching github for references to proto.Message) you can get a feeling of what people are interested in.
I personally would vote for a cleaner interface, but I also foresee that it might be required to implement both approaches anyway, since sometimes we have to "link" with existing code that is out of our control.
@jhump by "the first case", you mean `protoV1.Message` and `protoV2.Message` are not the same type?
Let me lay out the options (as I see them) in a bit more detail. (Edit: Doing this as two separate posts as suggested by tandr@ to collect thumbs up/down on each.)

**Option 1: Two `Message` types**

This is the current state of the new package. The old package has:
```go
type Message interface {
	Reset()
	String() string
	ProtoMessage()
}
```
And the new package has:

```go
type Message interface {
	ProtoReflect() protoreflect.Message
}
```
Generated message structs satisfy both interfaces (for now, at least): They implement all four required methods. You can pass a newly generated message to either package.
Older generated messages don't have a `ProtoReflect` method, so you need to pass them to an adapter function of some form. This will look something like:
```go
import (
	"google.golang.org/protobuf/proto"         // New package
	protoV1 "github.com/golang/protobuf/proto" // Old package
)

// This function works with a concrete message that hasn't been regenerated to
// support the new Message interface.
func A() {
	// ...
	m := &somelegacypb.Message{}
	proto.Unmarshal(b, proto.FromV1(m))
}

// This function accepts the old Message type.
func B(m protoV1.Message) {
	// Get a reflective view of this V1 message.
	p := proto.FromV1(m).ProtoReflect()
	// ...
}
```
The `FromV1` function will look something like:
```go
func FromV1(m protoV1.Message) Message {
	if m2, ok := m.(Message); ok {
		// Underlying type implements the new Message interface.
		return m2
	}
	return nonTrivialTranslationLayerStuff(m)
}
```
Pros:

- The new `proto.Message` interface is complete over the required behavior.

Cons:

- Two `Message` types during the transitional period (which could be long).
- No gradual upgrade path for packages which use `proto.Message` in their exported API.

**Option 2: A single `Message` type**

We change the definition of `Message` in the new package to be identical to the old one.
```go
type Message interface {
	Reset()
	String() string
	ProtoMessage()
}
```
This removes the need for conversion functions as used above; an existing type which implements `proto.Message` can be passed directly to the new package.
You now need to use a function to get a reflective view of a message:
```go
func B(m proto.Message) {
	// Get a reflective view of this message.
	p := proto.Reflect(m)
	// ...
}
```
The `proto.Reflect` function will look something like:
```go
func Reflect(m proto.Message) protoreflect.Message {
	if p, ok := m.(interface {
		ProtoReflect() protoreflect.Message
	}); ok {
		return p.ProtoReflect()
	}
	return nonTrivialTranslationLayerStuff(m)
}
```
Pros:

- Greatly simplifies the upgrade story.

Cons:

- The `Message` type is not behaviorally complete and will never become so.
- Generated messages must still include the `Reset` and `ProtoMessage` methods, leading to additional binary bloat.
- Some existing types implement `proto.Message`, do not implement the reflection APIs, and are not understood by the non-trivial translation layer.

@neild, @dsnet, if option 1 wins (two separate packages/APIs), does the existing `api-v2` branch -- in its current form -- represent an almost-stable API for the new stuff? I've only been following along with development occasionally and haven't really tried using it yet. So I was curious to start porting a repo to the new stuff, to actually play with the new APIs and such. But I'm wondering, if I choose to tackle that now, how likely is the v2 API to change materially out from under me?
We've been migrating a number of targets inside Google to use protobuf reflection and through the experience, it has helped us refine what the API should be. CL/175458 is an example of API changes informed by real usage. Within Google, we have the ability to also migrate all users since we use a monorepo. Unfortunately, we can't fix external users.
> But I'm wondering, if I choose to tackle that now, how likely is the v2 API to change materially out from under me?
Unfortunately, likely. Fortunately, most of the changes are fairly mechanical changes. If it helps, we can track all breaking changes in one place with instructions on how to change.
Representing an external company with heavy protobuf usage, I'd prefer to pay the one-time conversion cost to gain a superior interface for the long haul. As we (mostly) also use a monorepo, I don't foresee the conversion work required to be very hard.
I am mostly exposed to gRPC and open source use of protobuf, and in my experience the use of the `proto.Message` interface is rare. I favour option 1.
I prefer option 2, fairly strongly.
Lack of a gradual upgrade path is too painful. That's the sort of thing that slows teams way down and prevents any upgrade from occurring at all.
Gradual upgrades are especially important for the github.com/golang/protobuf project to care about because `protoc-gen-go` couples generated code to a snapshot of the runtime library. Widespread use of RPC systems built on generated code means that this coupling spans multiple teams: you're importing a generated client of one service, and providing a client to other services. Non-gradual upgrades force lockstep updates across an organization.
Statements like "when use of the old package dies out, we have a cleaner and healthier ecosystem" concern me. Usage of the old package will never completely die out. We'll always have references to the old package.
@spenczar, I think there may be a misunderstanding of what option 1 and 2 entail.
The lack of a gradual upgrade path applies to APIs that use `proto.Message` in their exported API. Other uses of protobuf should be fine with option 1.
The real "lack of gradual upgrade path" means that packages that currently use `proto.Message` in their exported API must do one of two things to support the new reflection functionality:

1. Make a breaking API change, using the new `Message` interface instead of the old one. (And, of course, possibly reimplement some of that API to properly use the new interface/take advantage of new features.)
2. Add new entry points that use the new `Message` interface (in addition to maintaining old entry points that use the old interface).

But old code can still link against and work just fine with the new protobuf runtime, and old generated code will still work.
@neild, did I describe that accurately?
While one should always have a way to smoothly transition from one way to another, I think such a transition has to have a definite limit in terms of support lifetime, otherwise, you end up supporting the old way forever, even when it makes your life miserable because everything has to work both ways now, and that becomes Just The Way Things Are.
Some people will never change until it breaks, no matter how trivial the change might be. And breaking things is not Always The Wrong Choice. :woman_shrugging:
I’ll point here to https://github.com/grpc/grpc-go/issues/711 which was a somewhat similar situation, where a choice in code generation would break people. It sat in “cannot change or we break people”, up to “we have a migration strategy” to “once go1.8 is end-of-life”. It took two and a half years to make what was on the surface a relatively simple change.
But this is a problem that is never going to be seen inside of Google, because protobufs are compiled fresh every build. The entire notion of checking in generated code is to me still kind of crazy. It’s pretty much just like checking in a binary blob.
@jhump Hm, I'm not following. Here's how I understood things:
Under option 1, the `(github.com/golang/protobuf/proto).Message` interface is changed. That means old generated code no longer implements the `(github.com/golang/protobuf/proto).Message` interface.

Is that correct? I might have this wrong, particularly if this is an interface in a new `github.com/golang/protobuf/proto/v2` package.

If old generated code no longer implements the `proto.Message` interface, then the struct values can't be passed into any APIs that accept `proto.Message`, so either the code needs to be regenerated, or those APIs need to change, or the code needs to be linked against an old version of github.com/golang/protobuf/proto. Is that correct?
If this is the case, I think we hit the lockstep problem.
@puellanivis, if that was in reply to my last comment, I wasn't suggesting the old APIs be supported forever. But they must be supported during some sort of transition period. For the second option I mentioned (having a package that provides API for both old and new interface), the idea is that the functions for the old interface would eventually be deprecated/removed.
As far as checking in generated code: not everyone has blaze :) It is idiomatic Go that one be able to `go get ...` to download and install a Go package or program, and the `go` tool does not attempt any other build steps (such as `go generate ...`). So checking in the generated code gives users the best experience. Otherwise, `go get ...` will fail to compile and users must then take extra steps to run `make` or `go generate ...` or whatever before retrying `go install ...`.
@spenczar:
> I might have this wrong, particularly if this is an interface in a new `github.com/golang/protobuf/proto/v2` package.

Yes, it is a new interface. I think the suggestion is that the import path for the v2 package will be `"google.golang.org/protobuf/proto"`. But, in the in-development `api-v2` branch, it's similar to as you wrote: `"github.com/golang/protobuf/v2/proto"`.
@jhump Thanks. I agree, then, that my worry about lockstep upgrades does not apply; option 1 looks better to me too.
@jhump I’m well aware of the difficulties that lead to people checking in the generated code. Such is the world we live in. But I had completely composed my message before your comment came in and it was more of a generalized reply to @spenczar and expressing my preference against Option 2.
@jhump
> @neild, did I describe that accurately?
Yes, that's correct. Let's say today you have a package with an exported API like this:
```go
package prototwiddle

import "github.com/golang/protobuf/proto"

// Twiddle fiddles with a message.
func Twiddle(m proto.Message) {}
```
If we redefine `Message` in the new package, then you can switch to the new proto API without changing your exported API with something like:
```go
import (
	protoV1 "github.com/golang/protobuf/proto"
	"google.golang.org/protobuf/proto"
)

func Twiddle(m protoV1.Message) {
	twiddleV2(proto.FromV1(m))
}

func twiddleV2(m proto.Message) {
	// ...
}
```
You could make `twiddleV2` an exported function to directly support either definition of `Message`, but you can't change the signature of `Twiddle` without making a breaking API change (and presumably releasing a new major version).
@spenczar
> I might have this wrong, particularly if this is an interface in a new github.com/golang/protobuf/proto/v2 package.

Yes, the question is what the interface in the new package should be. (The final package name is going to be `google.golang.org/protobuf/proto`, FWIW.) The definition of `"github.com/golang/protobuf/proto".Message` can't change.
For any adventurous people who are actually using v2, I've mentioned above that the API is not fully stable yet. If you want to be notified of any breaking changes, subscribe to #867.
The `google.golang.org/protobuf` module has been released, with a new definition of the `Message` interface that treats protobuf reflection as a first-class feature.
Congratulations on an excellent piece of work.
Filing on behalf of @alandonovan.
> We can't add methods to `proto.Message` without breaking backwards compatibility. One approach we can take is to define `proto.MessageV2` that is a much more semantically complete interface that provides a form of "protobuf reflection". In `Marshal`, `Unmarshal`, `Merge`, `Clone`, and so on, we can type assert if it implements `proto.MessageV2` and use that interface to implement generic versions of those functions. If `proto.Message` doesn't satisfy `proto.MessageV2`, then `Merge` can just fail (it already does on most third-party implementations of `proto.Message`).