golang / protobuf

Go support for Google's protocol buffers
BSD 3-Clause "New" or "Revised" License
9.67k stars 1.58k forks source link

proto,encoding/protojson,encoding/prototext: support operations with fields omitted by FieldMask #1183

Open nnutter opened 3 years ago

nnutter commented 3 years ago

Is your feature request related to a problem? Please describe.

We use EmitUnpopulated (f.k.a EmitDefaults) to emit defaults such as rank: 0 in JSON format but we would still like to be able to omit fields based on a FieldMask. Because of the need/desire to EmitUnpopulated we cannot just "unpopulate" masked fields.

Describe the solution you'd like

I imagine adding a method,

func (o MarshalOptions) MarshalWithMask(m proto.Message, mask *fmpb.FieldMask) ([]byte, error)

which uses mechanics similar to EmitUnpopulated but instead uses the FieldMask to further decide whether to marshal a field. An empty/nil FieldMask would result in identical behavior to Marshal. This means the unexported methods can be shared by both exported methods.

Given EmitUnpopulated: true Marshal would return,

{
  "foo": 0,
  "bar": false
}

and the same with MarshalWithMask with a FieldMask of foo would return,

{
  "foo": 0
}

Describe alternatives you've considered

  1. protojson.Marshal to []byte, then unmarshal into a generic map[string]interface{} structure, walk structure deleting keys which are not contained in the mask, and finally re-marshaling back to JSON format.
  2. Same as above but using code generation with reflection to avoid interface{}.
  3. Not using EmitUnpopulated; however, users looking at raw JSON output find it very confusing when a field is missing. We're hoping the same will not be true when they provide a field mask and we would like to avoid the "noise" a bunch of unpopulated fields adds when using a narrow field mask.

Additional context

None.

jcchavezs commented 3 years ago

We have exactly the same use case and our proposal for solution might be the same, we are up to contribute on this if someone gives green to the approach.

neild commented 3 years ago

If we were to support applying a field mask to JSON marshaling, then we should also support it for text and binary marshaling. We should also consider supporting it for unmarshaling, and possibly other operations as well.

Binary marshaling and other operations defined in the proto package would be tricky, because the proto package cannot depend on the fieldmask package without creating an import cycle.

A more general-purpose approach would be to support merging messages with a fieldmask applied. (The Java protobuf implementation, for one example, provides this.) Then marshal-with-mask can be trivially written as merge-with-mask to an empty message, followed by marshaling that message.

puellanivis commented 3 years ago

Not sure this would solve the original proposed issue though, as they want to be able to distinguish between “do not show this field in the JSON at all” and “show this field even if default value”. Merging with a field mask to an empty message, would then still be stuck with either omitting/emitting all of the default values, or none of them.

jcchavezs commented 3 years ago

I think @nnutter and I have the same use case: omit certain fileds with zero values. My use case is that I want to omit zero values for strings for example but not for int or bool. Then having something like EmitUnpopulatedWithMask would do the trick:

In the example:

{
  "foo": 0,
  "bar": false
}

Using something like

EmitUnpopulatedAllowList: []string{"bar"}

To omit only emit unpopulated for bar and omit foo.

puellanivis commented 3 years ago

“unpopulated” is maybe not the right word. In proto3, unless using the optional keyword, all values are populated, just some of them have the default (zero) value, and some of them don’t. There is no distinction made between “unpopulated” and default values (except of course when using the optional keyword, which is supposed to turn on presence sensing).

nnutter commented 3 years ago

@neild I agree with @puellanivis, I do not follow how merging with field mask would address my concern.


@puellanivis, I thought Protocol Buffers Version 3 removed the optional keyword. Is there documentation saying it's available in v3? Indeed thought distinguishing between an unset/unpopulated value and a zero value is a way to solve this problem.


Maybe re-stating problem with a slightly more complicated example will highlight the "mutually exclusive" behavior.

If a zero is a meaningful value for my API I do not think that the JSON should be omitting the field. Therefore, I feel I need EmitUnpopulated turned on. However, if a user sends a field mask then it seems ideal to omit the field.

So, given an example message,

message Foo {
  string name = 1;
  int rank = 2;
  int count = 3;
}

rank and count have valid and meaningful zero values so I would like the JSON to look like,

{
  "name": "foos/1234",
  "rank": 0,
  "count": 0,
}

but if the user sends fieldMask=name,count, I would like it to look like,

{
  "name": "foos/1234",
  "count": 0,
}

As far as I know this is not possible with the current API. If optional turns out to enable that then maybe it just needs to be documented.


P.S. I realized I had not verified that EmitUnpopulated was not just a rename but this confirms it behaves the same as EmitDefaults,

package main

import (
    "fmt"

    "google.golang.org/genproto/googleapis/storage/v1"
    "google.golang.org/protobuf/encoding/protojson"
)

func main() {
    b := storage.Bucket{
        ProjectNumber: 0,
    }
    var m protojson.MarshalOptions
    bs, _ := m.Marshal(&b)
    fmt.Println(string(bs))
    m.EmitUnpopulated = true

    bs, _ = m.Marshal(&b)
    fmt.Println(string(bs))
}
$ go run . | jq
{}
{
  "acl": [],
  "defaultObjectAcl": [],
  "lifecycle": null,
  "timeCreated": null,
  "id": "",
  "name": "",
  "projectNumber": "0",
  "metageneration": "0",
  "cors": [],
  "location": "",
  "storageClass": "",
  "etag": "",
  "updated": null,
  "defaultEventBasedHold": false,
  "labels": {},
  "website": null,
  "versioning": null,
  "logging": null,
  "owner": null,
  "encryption": null,
  "billing": null,
  "retentionPolicy": null,
  "locationType": "",
  "iamConfiguration": null,
  "zoneAffinity": []
}

ProjectNumber is populated, with a 0, but it is not emitted unless EmitUnpopulated is enabled.

nnutter commented 3 years ago

@neild I'm also curious about the API breadth concerns. I am no protobuf expert at all but I kind of thought the binary format was irrelevant/orthogonal. As an example of why is that it does not have the EmitUnpopulated option to begin with. I could see this being a catalyst for recognizing that asymmetry but does it inherently have to be?

neild commented 3 years ago

As a high-level point, the purpose of the protojson package is to convert to/from the canonical protobuf JSON encoding. It is not intended to innovate on the JSON encoding used: We want all implementations of protobuf JSON encoding to behave in pretty much the same way.

I did misunderstand what you're asking for: If I follow correctly, you're basically looking to limit the effects of EmitUnpopulated to a set of fields selected or rejected by a field mask.

I don't believe any other protobuf implementations provide this feature, and I don't think we would want to add it to Go's implementation alone.

I'm also curious about the API breadth concerns. I am no protobuf expert at all but I kind of thought the binary format was irrelevant/orthogonal. As an example of why is that it does not have the EmitUnpopulated option to begin with.

I was thinking of a feature which would include or exclude fields unconditionally (not just unpopulated ones). This seems useful in the general case and it wouldn't make sense to support it in just one encoding.

puellanivis commented 3 years ago

@nnutter Yes, proto v3 removed the required and optional keywords originally, but it was argued and proposed that we should return the optional keyword, so that we can explicitly mark some fields to support presence checking. This has already been “rolled out” as an experimental feature in 3.12.0: specific details here: https://github.com/protocolbuffers/protobuf/blob/v3.12.0/docs/field_presence.md

cybrcodr commented 3 years ago

EmitUnpopulated is indeed meant for migration from using jsonpb's EmitDefaults. The term "default" is misleading because in proto2, even if there is a default field option, what is emitted for an optional proto2 field that is unpopulated is null rather than the default field option value. This option/feature is JSON-specific.

In my opinion, JSON parsers should be able to handle missing fields and treat them accordingly, i.e. if it is missing a proto3 field, then it should treat the field value with corresponding zero value. But that's just my opinion.

This request is a bit odd in that it wants only certain unpopulated fields to be emitted. Is it the case that the consumer of the data is using a different message type, one that is a subset of the message from the producer and hence cannot deal with other fields with zero value? Generally, one should try to use the same message type on both ends. If the producer uses the same message type as the consumer, then this would not be an issue even if you have unpopulated fields emitted as the consumer would recognize all the fields.

I'm not sure what library/parser is used on the consumer end. protojson.UnmarshalOptions does have the DiscardUnknown option, which I think most proto JSON parsers may have. So, even if consumer is not using the same message type, I hope it can discard fields that is not known.

nnutter commented 3 years ago

@cybrcodr The "consumers" aren't always machines, sometimes they're data scientists using a plethora of APIs, poking things with Postman, etc. and having the shape of a message "randomly" change is confusing or, at the very least, not very self-describing.

cybrcodr commented 3 years ago

The "consumers" aren't always machines, sometimes they're data scientists ...

True. In this case, the feature @neild described above would help in terms of what to project out from an existing message.

Without such feature though, the work-around would be to define a message containing only fields that need to be emitted out for the purpose of serializing to JSON, and copying fields from current existing source message beforehand.

piratf commented 3 years ago

I have similar needs.

But I think a message may have multiple masks so both optional or EmitUnpopulated may not satisfy.

Like the example below. In different scenarios, the same message may have multiple plans to generate binary data as keys. So I preferred a function to accept some parameters like field mask. Then I can customize multiple ways to serialize the desired parts.

Edit: The same goes for unmarshaling. When updating, the server may send binary data only to have part of the message to improve performance.

message Example{
  option(some_user_defined_key) = "ID, SplitKey"
  option(split_key) = "SplitKey"

  int32 ID = 1;
  int32 SplitKey = 2;
  string Data= 3;
  BigStruct OtherData = 4;
}
nnutter commented 3 years ago

Without such feature though, the work-around would be to define a message containing only fields that need to be emitted out for the purpose of serializing to JSON, and copying fields from current existing source message beforehand.

This is basically what we've resorted to in our private implementation. We're using a map[string]interface{} as the intermediate message and serializing that to JSON.

jufemaiz commented 3 years ago

:sadpanda: hitting this right now

c4milo commented 2 years ago

Has there been any update on this recently?

jufemaiz commented 2 years ago

I wonder if the Netflix team have any thoughts given their usage of it in other language bases.