golang / protobuf

Go support for Google's protocol buffers
BSD 3-Clause "New" or "Revised" License
9.66k stars 1.58k forks source link

Allow protojson to decode legacy maps into messages with shortcut map notation #1511

Open advdv opened 1 year ago

advdv commented 1 year ago

Is your feature request related to a problem? Please describe. We're dealing with a service architecture that describes maps in different ways in different service (for legacy reasons). Some old services emit messages with the "legacy" structure for encoding maps inside of protobuf messages. Other services use modern protobuf definitions with the "shortcut" notation for maps (see: https://developers.google.com/protocol-buffers/docs/proto3#maps).

As mentioned in the documentation the map<T,T> notation is sugar for what is in effect a list of entry messages as it is encoded on the wire. This is good, and works for us so we don't have to care about which service uses what notation. But this breaks when sent it in the canonical json format.

To explain this further, let's consider the following proto file (in reality, for our use case this would be in two different files but the explanation stays the same)

syntax = "proto3";
package examples.v1;

// ShortcutMap message use the map shortcut for our map field
message ShortcutMap {
    map<string, string> map_field = 1;
}

// LegacyMap uses the legacy setup to describe a map
message LegacyMap {
    // We repeat entries to make a map
    message MapFieldEntry {
        string key = 1;
        string value = 2;
    }
    repeated MapFieldEntry map_field = 1;
}

Now let's consider the scenarios that one service is sending the legacy LegacyMap message to a service that is only aware of the new ShortcutMap definition. Both in JSON and in the proto wire format:

package main

import (
    "fmt"

    examplesv1 "github.com/crewlinker/some-internal-protobuf-go-dir/v1"
    "google.golang.org/protobuf/encoding/protojson"
    "google.golang.org/protobuf/proto"
)

func main() {

    // send messages from our legacy code
    send1 := &examplesv1.LegacyMap{MapField: []*examplesv1.LegacyMap_MapFieldEntry{{Key: "foo", Value: "bar"}}}
    json1, _ := protojson.Marshal(send1)
    fmt.Printf("legacy json: %s\n", json1)
    proto1, _ := proto.Marshal(send1)
    fmt.Printf("legacy wire: %x\n", proto1)

    // receive messages in our new code
    var rcv1, rcv2 examplesv1.ShortcutMap
    proto.Unmarshal(proto1, &rcv1)
    fmt.Println("receive 1:", rcv1.MapField) // prints the map as expected
    protojson.Unmarshal(json1, &rcv2) // proto: syntax error (line 1:13): unexpected token [
    fmt.Println("receive 2:", rcv2.MapField) // prints an empty map (because it couldn't parse)
} 

The inconsistency is that the legacy notation for maps can be decoded transparently into a message with the shortcut notation for the protowire format. But not when the json format is used.

Describe the solution you'd like I would like it to be possible that the protojson unmarshal logic allows for decoding maps from the legacy format into a message with the shortcut notation. For my use case it would be ok if this is an option in the https://pkg.go.dev/google.golang.org/protobuf/encoding/protojson#UnmarshalOptions struct but I do believe it can also be implemented transparently (without breaking any backwards compatibility)

Describe alternatives you've considered Some alternatives:

Additional context To hopefully give this improvement a bit more weight I just want to emphasise:

puellanivis commented 1 year ago

This seems to be a request relevant to the protobuf standard itself. This module has been bitten before by going beyond the standards as they are defined, and thus is shy to implement something unilaterally that is not already standardized.

Namely: even if we make this change, will C++/Java/python support the same features? Having the Go implementation alone able to parse [{"key":"foo","value":"bar"}] into a map<string, string> means that the Go implementation will be unexpectedly different from other implementations.

advdv commented 1 year ago

Right, that makes sense. But it do think it is fair to raise this with the standard itself (or at least the canonical json encoding)! What would be the right place to create an issue for that?

dsnet commented 1 year ago

As @puellanivis mentioned, the Go implementation will only adopt this if the wider protobuf project deems this as standard behavior.

The issue tracker for the wider protobuf project is https://github.com/protocolbuffers/protobuf/issues.

advdv commented 1 year ago

@dsnet I understand. I've opened the following issue over here: https://github.com/protocolbuffers/protobuf/issues/11505