golang / protobuf

Go support for Google's protocol buffers
BSD 3-Clause "New" or "Revised" License
9.8k stars 1.58k forks source link

Consider using the `debug_redact` field / enum option when marshaling into prototext #1655

Open ivucica opened 2 weeks ago

ivucica commented 2 weeks ago

Is your feature request related to a problem? Please describe.

Field debug_redact has been added around Protobuf 22 to mark sensitive fields as being sensitive. (Amusingly, this means protoc that ships in Debian, as late as unstable, does not support this field yet.)

Not using debug_redact means that fields that are tagged as sensitive will be redacted in some languages such as C++, but not in Go.

Please note that debug_redact can be used on more than just fields; but, its purpose on fields is clearer than on enums.

Describe the solution you'd like

A solution should be found so that serialization for purposes of text output results in fields being redacted, but otherwise not.

In C++, TextFormat::Printer::PrintFieldValue invokes TryRedactFieldValue which uses this option if the serializer has been initialized with redact_debug_string_ option:

https://github.com/protocolbuffers/protobuf/blob/5d58caeebc5c2779ab4db87285280784159f9ed4/src/google/protobuf/text_format.cc#L3026-L3055

Java computes whether a field is sensitive based on this option: https://github.com/protocolbuffers/protobuf/blob/5d58caeebc5c2779ab4db87285280784159f9ed4/java/core/src/main/java/com/google/protobuf/Descriptors.java#L1837-L1859

This is then used if enablingSafeDebugFormat is flipped true:

https://github.com/protocolbuffers/protobuf/blob/5d58caeebc5c2779ab4db87285280784159f9ed4/java/core/src/main/java/com/google/protobuf/TextFormat.java#L628-L634

For Go, it is not completely clear to me when this should be used. Adding it to the right place seems trivial: it likely belongs in marshalField or wherever marshalSingular is called, and an option can be used at that point: https://github.com/protocolbuffers/protobuf-go/blob/b98563540c0a4edb38526bcd6e6c97f9fac1f453/encoding/prototext/encode.go#L210

However, once the encoder supports this, should the Stringer interface (String()) on a generated message proceed to output the redacted or the non-redacted version of the message?

It feels like sensitive fields should be saved only when explicitly requested, but unfortunately, existing code before the field existed or before it was used may have different expectations.

Describe alternatives you've considered

Two main approaches:

  1. Write a custom serializer that redacts the fields. Never use str(), .String() or %s / %v on a generated proto message without using the custom formatter+redactor, just as one would have to do with a custom option. (This would then apply whether logging or not, as there can be no guarantees of what will happen with the proto is fmt.Sprintf'd.) Process generated .pb.go code to add // Deprecated: Using String() directly does not redact fields. to the docstring so the tooling starts to output warnings about use of String().
  2. Choose to consider debug_redact broken and ignore the existence of this field option, accepting that potentially sensitive messages will end up in logs or other places where they should not.
    • Possibly: send a pull request to base protobuf repo to add a comment to debug_redact option to clarify that implementing is optional, implementation specific and that depending on it taking effect without examining the behavior in every used language is risky.

As is, the option is present and usable, but will not actually result in redaction of fields in sensitive contexts.

Additional context

It may be worth considering otherwise allowing modification of the marshaling of the fields, in case other options / annotations affect the field in other, non-logging contexts.

As a example, customization might be useful because a field might not be sensitive for logging, but it might be too sensitive to display to some types of system administrators, or to send to end users.

This consideration on allowing customizing serialization would likely apply across languages.

puellanivis commented 2 weeks ago

At first, I suspected this were a duplicate of #1541 however, having read through the issue, it’s different, because this debug_redact behavior exists in C++ and Java at least. So the most common concern of “yes, but is it in the other big 4?” is already met.

I would say, there is no condition where we would ever want to deprecate String(). The fmt.Stringer interface is one of the core and ubiquitous concepts of the language. And removing String() won’t stop the protobuf messages from being stringified with these redacted fields unredacted, for example, if done through fmt.Sprint() or %v formatting verb. https://go.dev/play/p/huhWv1vQ3cH

Next, we would not want to add any options that default to true. All options should be worded so that their default is naturally false. This is another practical Go convention. prototext.MarshalOptions{} has to correspond with all default values. So adding a field that defaults to true requires seeing the field set to the zero value (false) and setting it to true. So, the option would not be possible to disable.

Now that I’ve covered all the “bad news”, here’s the good news from my side: we’ve held for quite a while that prototext is only intended to be a human-readable debugging format. People who do not want the fields redacted can always elect to simply not add the tag. And, if we implement it like C++, then it affects only prototext output anyways. Individuals remain on their own to implement any sort of redactor for JSON formatting.

Pushing it as a universal default might be a bit much (though from my view, not entirely out of the question). But at worst, we could add something like a GOLANG_PROTOBUF_DEBUG_REDACT=true env var and/or an -ldflags "-X google.golang.org/protobuf/encoding/prototext.debug_redact=true" compile time option to turn it on/off by default. Projects and companies could then elect to opt-in the same as C++. And build it into scripts anywhere they want it enforced as policy.

As for how to emit a message without redaction in a new world where redaction would be a possible default, that could be as simple as adding a new EmitRedactedFields field to prototext.Marshal, then anyone who wants an unredacted version regardless of defaults can simply use the prototext.MarshalOptions.Format method to print an unredacted version.

TL;DR: C++ and Java already support this, any change of behavior is in what we already consider human-oriented debug messages, we could provide reasonable knobs to set desired default behavior at runtime or compile time, and the redaction is already field-specific opt-in already.

neild commented 2 weeks ago

This seems reasonable. Actually, I'm surprised debug_redact was added without Go support--it seems like an oversight.

I'd say that:

I'm not sure about prototext.Format. I lean towards saying that it should redact.

puellanivis commented 2 weeks ago

I like those defaults and knobs.

stapelberg commented 2 weeks ago

@ivucica Would you be able to send a Gerrit change? See https://github.com/protocolbuffers/protobuf-go/blob/master/CONTRIBUTING.md

ivucica commented 2 weeks ago

To confirm, consensus is default to redacting, and no envvar? If I’m implementing this, happy to add the envvar.

Any particular documentation you’d like me to update?

I can’t offer a timeline for this contribution, as this is a personal interest, but (aside from any tests I should update) it seems simple enough, so I can try giving it a spin. If someone beats me to it, I won’t complain, of course.

puellanivis commented 2 weeks ago

I don’t think we need an ENV var, no. With the options as proposed by nield, we should have sane enough behavior overall to not need to offer a kill-switch ENV var.