containerd / ttrpc

GRPC for low-memory environments
Apache License 2.0
536 stars 78 forks source link

Drop dependencies not compatible with gogo/protobuf #67

Closed fgiudici closed 2 years ago

fgiudici commented 4 years ago

This si meant to address issue https://github.com/containerd/ttrpc/issues/62. The idea is to do something similar to what was done in PR https://github.com/containerd/ttrpc/pull/54 (but then reverted). Here anyway we don't copy the grpc/codes, which do not depend on protobuffer generated code. We just import googleapis/rpc/status from gogo/googleapis and copy the google grpc/status code to just force the usage of the struct Status from gogo/googleapis/rpc/status.

The code that relies on golang protobuf generated code is very limited: it is just the Status struct, used to report errors. Why not use the Status struct generated by gogo protobuf from the same proto file than?

AkihiroSuda commented 4 years ago

CI failing: The command "../project/script/validate/fileheader ../project/" exited with 1.

thaJeztah commented 4 years ago

@crosbymichael @stevvooe @AkihiroSuda ptal

estesp commented 4 years ago

Someone needs to answer whether this is no longer true--the same change was explicitly reverted for this declared reason:

This isn't going to work as expected because the errdefs package that is used throughout shims and others switches on the grpc codes packages.

  • from @crosbymichael in #57
crosbymichael commented 4 years ago

Thanks for the change. i'm refreshing my knowledge on that the problem and and what needs to be done. looking at it today

crosbymichael commented 4 years ago

building is one thing to resolve the issue, but do we know if v2 protobuf is backwards compat(wire format) with what we have today? I wouldn't want to make any changes that breaks existing shims or force external users to have to recompile the world.

fgiudici commented 4 years ago

Hi Michael,

building is one thing to resolve the issue, but do we know if v2 protobuf is backwards compat(wire format) with what we have today? I wouldn't want to make any changes that breaks existing shims or force external users to have to recompile the world.

Yeah, I was worried about this too, at least at the beginning. But then I felt quite comfortable pushing this PR for two main reasons: 1) Looking into the code the panic happens in gogo/protobuf while parsing the fields of the internal structure to fill them with the raw bytes from the received message (see gogo/protobuf/proto/table_unmarshal.go#296). There, the t of t.Field(i) is of type reflect.Type (lines # 278 and # 68) and is used to parse the internal fields names found in the Struct coming down from the ttrpc client code (as we can see in the trace of issue #62). Being that the "new" Status structure from googleapis (which we import in ttrpc) it finds an unexpected field (the status one) as mentioned in https://github.com/containerd/ttrpc/issues/62#issuecomment-686768932. What I think is wrong is that we use gogo/protobuf APIs passing down a struct (the Status struct) which comes from googleapis compiled with the (newer) google protobuf. Wouldn't be more safe to use gogo APIs with the Status struct produced by compiling the googleapis protocol with the gogo protobuf ? Note that there is no other code generated via protobuf in ttrpc AFAICS. We import just the google rpc error codes: just an enum of error codes and a binding to their strings. If tomorrow google adds extra error codes there, we will use them. But maybe at this point we could import the list of errors, as the implementation compatibility is broken.

2) What google changed is the go protocol buffer implementation (https://github.com/golang/protobuf). AFAICS, the buffer protocol specification (and so wire format among the others) has not changed. There are multiple bindings/implementations for many languages, changing the wire format would be... a bit criminal IMHO, isn't it? ;-)

One minor note: the code we import can be reduced further. I just wanted an exact copy of the status.proto generated code, but there are a couple of functions that we can drop if needed.

q3k commented 3 years ago

We are currently investigating using this PR's patchset within containerd/runsc to be able to build against google.golang.org/protobuf.

This PR introduces a regression, as gRPC statuses set by services using the unforked status library would get swallowed and downcasted into internal statuses with code UNKNOWN. This is because the gRPC status library code that has been forked does interface checks expecting the forked type, thereby never catching the unforked types.

In addition, the client Call() needs to be changed to return the original status type instead of the forked type, so that clients using status.FromError from the unforked status library can retrieve the status type correctly.

Our fix is on https://github.com/monogon-dev/ttrpc/tree/allow-original-status-codes . However, I feel that porting ttrpc over to Google's protobuf instead of gogo proto is a much better tix than working around these quirks.

q3k commented 3 years ago

(these issues seem to also be the reason why #54 was reverted in #57)

tonistiigi commented 3 years ago

@fgiudici What's the status of this? How can we get this moving or are changes needed based on @q3k comments? Assuming the wire format remains compatible I don't see major issues with getting this moving.

tonistiigi commented 3 years ago

Actually, I think I misunderstood the intent of this PR. As gogo is not maintained and has compatibility issues I'm in favor of removing it completely from all containerd repos so that containerd packages can be imported by other projects and everyone is not stuck with 2 year old proto library.

However, I feel that porting ttrpc over to Google's protobuf instead of gogo proto is a much better tix than working around these quirks.

👍

fgiudici commented 3 years ago

For what is worth, I also agree that relying on a no more maintained project (gogo protobuf) will for sure cause pain sooner or later, also if we fix the current issue. The problem is that current Google protobuf implementation is no more compatible with the gogo one. Ouch. So, moving to Google protobuf really looks the safest and most correct approach.

We have to consider BTW that the gogo protobuf was chosen over the Google one to have something much more lightweight to be used in the ttrpc (and containerd shimv2). Probably the switch to the Google protobuf should also account the impact on the resources and performances on the ttrpc/shimv2. I don't expect this to affect the overall usage of the end projects using ttrpc honestly (and maybe in the end could even be not noticeable at all) but considering/measuring the impact of the switch is probably part of the switch task.

One word about this patch: the only incompatibility I was able to find is "just" related to error reporting: in ttrpc we import that Status structure from the Google's protobuf project and use that around with gogo protobuf (for details see here). The patch here just imports the implementation of that Status structure generated from the gogo protobuf from the same source proto file used by the Google protobuf implementation. This would not be anyway a future proof path. Not at all. Gogo protobuf is no more maintained and is no more compatible with the Google protobuf implementation. If we don't deal with this now, we will have to anyway sooner or later (my gut say sooner... ;-) ).

kzys commented 2 years ago

We would drop gogo/protobuf instead of dropping the packages that are incompatible with gogo/protobuf.