TheThingsNetwork / lorawan-stack

The Things Stack, an Open Source LoRaWAN Network Server
https://www.thethingsindustries.com/stack/
Apache License 2.0

Migrate away from gogo/protobuf #2798

Closed rvolosatovs closed 1 year ago

rvolosatovs commented 4 years ago

Summary

https://github.com/gogo/protobuf is (currently) not maintained any more, see https://github.com/gogo/protobuf/issues/691

Why do we need this?

It's our dependency, and it is incompatible with the new golang/protobuf version, which more and more packages depend on. Hence we need to replace the golang/protobuf version, which means depending on outdated versions of our direct dependencies and potentially even breaking packages this way.

What is already there? What do you see now?

gogo/protobuf dependency

What is missing? What do you want to see?

Figure this out

How do you propose to implement this?

Figure out whether a new maintainer will appear, or whether there is a different plugin with feature parity. Or just use vanilla protobuf?

How do you propose to test this?

tests

Can you do this yourself and submit a Pull Request?

yes

adriansmares commented 2 years ago

Can we already branch off that particular branch, or should we wait until it is reviewed and merged?

htdvisser commented 2 years ago

I did plan to fixup and rebase that branch when protoc-gen-go-flags is finished enough to make the CLI work, so let's be careful with that.

But if we coordinate who works on the branch on which day, then that should be okay too.

htdvisser commented 2 years ago

Here's an overview of the files that still need gogoproto.customtype removed:

34:    bytes join_eui = 1 [(gogoproto.nullable) = false, (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64"];
36:    bytes dev_eui = 2 [(gogoproto.nullable) = false, (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64"];
70:  bytes target_net_id = 13 [(gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.NetID"];
84:  bytes join_eui = 1 [(gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64"];
88:  bytes join_eui = 1 [(gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64"];
95:  bytes home_net_id = 2 [(gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.NetID"];
96:  bytes home_ns_id = 3 [(gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64"];
186:    bytes gateway_eui = 1 [(gogoproto.nullable) = false, (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64"];
398:    bytes dev_addr = 5 [(gogoproto.nullable) = false, (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.DevAddr"];
399:    bytes net_id = 6 [(gogoproto.nullable) = false, (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.NetID"];
576:    (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.NetID",
682:  bytes dev_addr = 1 [(gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.DevAddr"];
716:  bytes join_eui = 1 [(gogoproto.nullable) = false, (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64"];
717:  bytes dev_eui = 2 [(gogoproto.nullable) = false, (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64"];
219:  bytes eui = 1 [(gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64"];
44:    (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64",
52:    (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64",
60:    (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.DevAddr",
74:    (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64",
144:  bytes net_id = 1 [(gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.NetID"];
32:    (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.AES128Key",
164:  bytes dev_addr = 1 [(gogoproto.nullable) = false, (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.DevAddr"];
179:  bytes join_eui = 1 [(gogoproto.nullable) = false, (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64"];
180:  bytes dev_eui = 2 [(gogoproto.nullable) = false, (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64"];
181:  bytes dev_nonce = 3 [(gogoproto.nullable) = false, (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.DevNonce"];
203:  bytes net_id = 2 [(gogoproto.nullable) = false, (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.NetID"];
204:  bytes join_eui = 3 [(gogoproto.nullable) = false, (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64"];
205:  bytes dev_eui = 4 [(gogoproto.nullable) = false, (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64"];
211:  bytes join_nonce = 2 [(gogoproto.nullable) = false, (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.JoinNonce"];
212:  bytes net_id = 3 [(gogoproto.nullable) = false, (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.NetID"];
213:  bytes dev_addr = 4 [(gogoproto.nullable) = false, (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.DevAddr"];
344:  bytes dev_addr = 1 [(gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.DevAddr"];
348:  bytes pending_dev_addr = 4 [(gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.DevAddr"];
127:  bytes forwarder_net_id = 2 [(gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.NetID", (gogoproto.nullable) = false];
133:  bytes forwarder_gateway_eui = 9 [(gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64"];
137:  bytes home_network_net_id = 5 [(gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.NetID", (gogoproto.nullable) = false];
36:  bytes dev_addr = 1 [(gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.DevAddr"];
55:    bytes eui = 2 [(gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64"];
113:        (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.DevAddrPrefix",
130:        (gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64Prefix",
52:  repeated bytes dev_addr_prefixes = 12 [(gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.DevAddrPrefix", (gogoproto.nullable) = false];
54:  repeated bytes join_eui_prefixes = 13 [(gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64Prefix", (gogoproto.nullable) = false];
110:  bytes join_eui = 1 [(gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64", (gogoproto.nullable) = false];
111:  bytes dev_eui = 2 [(gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64", (gogoproto.nullable) = false];
115:  bytes eui = 1 [(gogoproto.customtype) = "go.thethings.network/lorawan-stack/v3/pkg/types.EUI64", (gogoproto.nullable) = false];

Copied instructions from https://github.com/TheThingsNetwork/lorawan-stack/pull/5439:

  1. Preparation: make tools/bin/mage

  2. Find a field with gogoproto.customtype in our .proto files

  3. Remove the gogoproto.nullable option and replace the gogoproto.customtype option with:

(validate.rules).bytes = { len: 4, ignore_empty: true },
(thethings.json.field) = {
  marshaler_func: "go.thethings.network/lorawan-stack/v3/pkg/types.MarshalHEXBytes",
  unmarshaler_func: "go.thethings.network/lorawan-stack/v3/pkg/types.Unmarshal4Bytes"
}
  4. If the message has a (thethings.flags.message) option, also add flag options:
(thethings.flags.field) = {
  set_flag_new_func: "go.thethings.network/lorawan-stack/v3/cmd/ttn-lw-cli/customflags.New4BytesFlag",
  set_flag_getter_func: "go.thethings.network/lorawan-stack/v3/cmd/ttn-lw-cli/customflags.GetExactBytes"
}
  5. Use the expected length in:

    • len in the validate.rules option
    • Unmarshal4Bytes in the thethings.json.field option
    • New4BytesFlag in the thethings.flags.field option
  6. Run tools/bin/mage proto:all

  7. Go to the .pb.go file corresponding to the thing you're editing, and

    • remove the custom-written getter for the field
    • use gopls to find all references to the field, and fix the code using the .Bytes() method, or using types.MustDevAddr(xxx), together with .OrZero() if it expects a value type instead of a pointer type. It's also possible that you need to remove some of the .Bytes() or types.MustDevAddr(xxx) calls that were added in previous commits.
  8. Use a separate commit for each field, otherwise it may get messy. In my experience it was also easier to make the changes in TheThingsIndustries/lorawan-stack first and then backport them, instead of the other way around.
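
To illustrate the call-site changes described in the steps above, here is a minimal, self-contained Go sketch. Everything in it is a local stand-in written for illustration: DevAddr, MustDevAddr, OrZero, validateLen, marshalHex and Message are hypothetical, and only loosely mirror the names mentioned in the instructions (types.MustDevAddr, .Bytes(), .OrZero(), the len / ignore_empty rule and the hex marshaler). The nil-on-empty and uppercase-hex behaviour are assumptions, not a statement about the real pkg/types package.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// DevAddr is a local stand-in for a fixed-size 4-byte value type.
type DevAddr [4]byte

// Bytes returns the address as a slice, the form a plain `bytes` proto field holds.
func (a DevAddr) Bytes() []byte { return a[:] }

// OrZero dereferences the pointer and falls back to the zero value when it is nil.
func (a *DevAddr) OrZero() DevAddr {
	if a == nil {
		return DevAddr{}
	}
	return *a
}

// MustDevAddr converts a slice to a *DevAddr.
// Assumption: empty input maps to nil and a wrong length panics.
func MustDevAddr(b []byte) *DevAddr {
	if len(b) == 0 {
		return nil
	}
	if len(b) != len(DevAddr{}) {
		panic(fmt.Sprintf("invalid DevAddr length %d", len(b)))
	}
	var a DevAddr
	copy(a[:], b)
	return &a
}

// validateLen mirrors the intent of `bytes = { len: n, ignore_empty: true }`:
// an empty value is accepted, anything else must be exactly n bytes long.
func validateLen(b []byte, n int) error {
	if len(b) == 0 || len(b) == n {
		return nil
	}
	return fmt.Errorf("value must be %d bytes, got %d", n, len(b))
}

// marshalHex renders bytes as uppercase hex (assumed output style).
func marshalHex(b []byte) string { return strings.ToUpper(hex.EncodeToString(b)) }

// Message is a hypothetical generated message after the customtype is removed:
// the field is now a plain []byte instead of a typed value.
type Message struct {
	DevAddr []byte
}

func main() {
	// Code that used to assign a typed value now converts with .Bytes().
	msg := &Message{DevAddr: DevAddr{0x26, 0x01, 0x1B, 0xDA}.Bytes()}

	// Code that expects a pointer type converts back with MustDevAddr,
	// and appends .OrZero() where a value type is expected.
	var ptr *DevAddr = MustDevAddr(msg.DevAddr)
	var val DevAddr = MustDevAddr(msg.DevAddr).OrZero()

	fmt.Println(marshalHex(ptr.Bytes()), marshalHex(val.Bytes()))
	fmt.Println(validateLen(msg.DevAddr, 4), validateLen([]byte{1, 2, 3}, 4))
}
```

The conversions sit at the boundary between the generated message and code that still works with fixed-size types, which is why some of them may disappear again as later commits migrate the surrounding code.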

KrishnaIyer commented 2 years ago

I'm picking up api/tti/* today.

htdvisser commented 2 years ago

And the other task I mentioned is https://github.com/TheThingsIndustries/lorawan-stack/issues/3187, so please also take a look there.

KrishnaIyer commented 2 years ago

Picking up deviceclaimingserver.proto and gateway.proto.

KrishnaIyer commented 2 years ago

Note:

KrishnaIyer commented 2 years ago

Picking up end_device.proto, networkserver.proto and packetbrokeragent.proto

KrishnaIyer commented 2 years ago

Picking up metadata.proto and messages.proto

KrishnaIyer commented 2 years ago

Based on offline discussion

johanstokking commented 2 years ago

@KrishnaIyer is it just about removing this field?

@htdvisser can you list the things that need to be done to totally get rid of gogoproto?

KrishnaIyer commented 2 years ago

Yes, it's only one field in keys.proto, but its usage is broad and adapting it would require modifications.

htdvisser commented 2 years ago

I don't have a complete list of everything that needs to be done to totally get rid of gogoproto, but here are the next steps that I've determined so far in the issue/go-proto-v2 branch:

  1. Removing customtype options (see also @KrishnaIyer's comment)
  2. https://github.com/TheThingsIndustries/lorawan-stack/issues/3187
  3. Re-generate protos using plugins that support the GoProtoV2 API
  4. Refactor the errors package to use the GoProtoV2 API (https://github.com/TheThingsIndustries/lorawan-stack/commit/ffbaaed64051b99d0321019efe4d007a61ce0262)
  5. Refactor the API to use the GoProtoV2 timestamp and duration types (https://github.com/TheThingsIndustries/lorawan-stack/commit/c22fb0dd390b2f62b7ef8acdf3d8d26ad1f5ebfa)
  6. Replace GoGo WKTs with GoProtoV2 WKTs (https://github.com/TheThingsIndustries/lorawan-stack/commit/8fc7f1fd469c1085b94f0bda3757913b3298da47)
  7. Update protobuf and grpc-gateway imports (https://github.com/TheThingsIndustries/lorawan-stack/commit/7d054afb96621cb23f31e1e74362549178ec844c https://github.com/TheThingsIndustries/lorawan-stack/commit/87b0f20e5b2aa0ab20f0b7c63a95847a38786784 https://github.com/TheThingsIndustries/lorawan-stack/commit/510854c7e6c95152e7a643700f7d5bdf1343a9c6)

Once I reached that point, I was getting too many compiler errors to make any more progress, but I think that those will resolve once (1) and (2) are done, and we'll be able to continue.

johanstokking commented 2 years ago

Currently working on keys.proto and trying to manage the errors explosion ... turns out we use session keys and root keys quite a lot

adriansmares commented 2 years ago

Steps 1 and 2 from the comment above are done (🎉).

I've started looking into updating protoc-gen-validate based on the comment above and some details have changed since the last update:

At this point in time I'm not in favor of creating a new protoc-gen-go-validate. I think we should 'reset' our fork to the latest release of protoc-gen-validate and check if we can cherry-pick our changes (the main ones are https://github.com/envoyproxy/protoc-gen-validate/commit/c97df826497e7191af77a2a1c9bc6e7ef6ea9448, https://github.com/envoyproxy/protoc-gen-validate/commit/4b98317d923aa2d85325861cfb2209d43878779e and https://github.com/envoyproxy/protoc-gen-validate/commit/3549aa748183a15cffe5f4d0303ab0ef3b39d34f). This should suffice (🤞) in order for us to move forward. I'll take a look tomorrow into how feasible this is.

adriansmares commented 2 years ago

I've built two new releases of protoc-gen-validate and protoc-gen-fieldmask that are based on the V2 proto API. They are based on the new release of protoc-gen-star that enables our generators to generate V2 API code.

I've updated the tooling in a separate branch to use the new protoc / protoc-gen-go tooling and remove gogo. You can see the diff here. In our generated code (paths, setters, validate) the changes are minimal to none.

Please sample some files and let me know what you think. If things look good, we can start implementing the commits from https://github.com/TheThingsNetwork/lorawan-stack/issues/2798#issuecomment-1190055867 on that branch, and finally move to API v2.

adriansmares commented 1 year ago

Migration status update:

In order to speed up development, we can already start backporting some changes from https://github.com/TheThingsIndustries/lorawan-stack/pull/3445 in order to lower the diff:

adriansmares commented 1 year ago

Migration status update:

After the Christmas holidays we can start merging back this mammoth change in enterprise and open source.

The only problem to settle is backwards compatibility with respect to the JSON API. The context is as follows:

  1. For a large part of v3's lifetime, field masks were rendered as objects, not as strings. This behavior was caused by the gogoproto implementation of jsonpb. In this context, object form means {"paths": ["a.b.c", "c.d.e"]}, and string form means "a.b.c,c.d.e".
  2. When we started using protoc-gen-go-json one year ago, we accidentally broke the API and caused some field masks to render as strings. Messages which contained a field mask and an EndDevice ended up having a custom JSON marshaler, which in turn caused field masks to be rendered as strings. Messages which did not, such as ApplicationWebhook, ended up having field masks rendered as objects.

@ysmilda has implemented (1) which allows us to generate the marshallers for every message, thus ensuring that at least we use the same style everywhere, and (2) allows us to choose which style we want to use.

In https://github.com/TheThingsIndustries/lorawan-stack/pull/3445 I actually 'broke back' the API to render all field masks as objects, in order to keep consistency over time. My argument here is that for most of v3's lifetime and for all of the code in the wild, the object form is what is used.

Are there any objections to 'breaking the API' a second time to keep things consistent with what they were a year ago? Our JSON is not JSONPb anyway (custom renderers, custom enum marshaling). @johanstokking @KrishnaIyer


Unmarshalling field masks is not a problem: protoc-gen-go-json can automatically unmarshal both styles.

johanstokking commented 1 year ago

Are there any objections to 'breaking the API' a second time to keep things consistent with what they were a year ago? Our JSON is not JSONPb anyway (custom renderers, custom enum marshaling). @johanstokking @KrishnaIyer

Yes I prefer consistency, even if we need to break the JSON API.