envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0
24.79k stars 4.76k forks source link

Changes to defaults for existing behaviors in xDS APIs #10517

Open htuch opened 4 years ago

htuch commented 4 years ago

There's been a few recent PRs that have been interested in changing Envoy's default behavior w.r.t. data plane behavior, either immediately or within quarterly release cycles (i.e. not following the API clock). One currently active PR is https://github.com/envoyproxy/envoy/pull/10467, but there are others in the works I'm aware of.

The last paragraph of https://github.com/envoyproxy/envoy/blob/master/api/API_VERSIONING.md#backwards-compatibility currently allows for wrapper types, e.g. BoolValue to have their defaults vary in implementation defined ways. Presumably this extends also to messages whose unset default behavior also implies a default.

I think it's probably time to revisit ^^ in light of multi-client support (e.g. gRPC). My main concern here is that for brand new fields that change existing behaviors (rather than a new behavior and its defaults), it's dangerous and not in keeping with goals of stable API versioning to have the default behavior change outside of API major versions.

A workaround might be to introduce a runtime key reference in the configuration for default behavior. Runtime defaults are currently outside of the scope of the API major versioning policy and each API client can decide on its own default. This would technically be correct re: API stable versioning, but the effective behavior might still result in clients exhibiting divergent behavior with the same configuration.

I'm opening this thread up for open discussion on where we want to go here, as I think there are a range of opinions to be heard.

CC @markdroth @envoyproxy/api-shepherds @ggreenway @alyssawilk @yanavlasov @ggreenway

markdroth commented 4 years ago

In general, I think that changing the default of an existing field is a breaking change, because it requires management servers and clients to be in sync with regard to what behavior will actually occur. And it's actually the worst kind of breaking change, because there's no way to catch the problem at build time -- the only way to discover it is for things to break at run-time.

@ejona86 and @dfawley may also want to weigh in here.

antoniovicente commented 4 years ago

For proto3, default values are not encoded in proto definitions. Defaults are often mentioned in comments and implemented in code. Getting consistent behavior across APIs is challenging when each API consumer must implement default value extraction on their own in order to get consistency. Changes to default values provided by code make the situation even more complicated.

Going through a process of removing the old field name as part of a version bump may help as it would result in a compile error for consumers of the old config field, across API consumers.

mattklein123 commented 4 years ago

FWIW one of the things we have talked about in the past is having WKT default values be explicitly encoded in annotations, similar to PGV. Personally, I like this, as it would make it more explicit when we expect a default to be implementation wide vs. one that might be implementation specific.

htuch commented 4 years ago

I'm a fan of default annotations. I assume we would machine check to ensure they are present for all WKTs?

The problem goes beyond just WKTs though, the general case of what happens when a message field isn't set, or a new enum/message field is introduced that changes existing behavior if not set.

mattklein123 commented 4 years ago

The problem goes beyond just WKTs though, the general case of what happens when a message field isn't set, or a new enum/message field is introduced that changes existing behavior if not set.

Yes, agreed. I suppose much like the validation code we could do this also for non-WKT types. We could even choose to build this into PGV if we wanted.