Open bcmills opened 9 months ago
(CC @matttproud @kylelemons @CAFxX @Sajmani)
First of all, thank you for promptly following up on the internal inquiry and then proposing this change of language. :-)
Looking at your proposed snippet, I have a few remarks:
Note that the returned context might not be suitable for asynchronous use.
Can we demonstrate and clarify precisely what kinds of asynchronous situations are potentially unsafe? I think we owe to the reader to help clear up confusion. Over the years, people have wondered what is safe and unsafe with the internal API that has documented this warning. Consider these three scenarios:
// Definitely unsafe
go f() {
ctx := context.WithoutCancel(ctx)
v := somelibrary.FromContext(ctx)
...
}()
// Definitely unsafe
go f(ctx context.Context) {
v := somelibrary.FromContext(ctx)
...
}(context.WithoutCancel(ctx))
// Subtle: is it unsafe (probably)?
detachedCtx := context.WithoutCancel(ctx)
v := somelibrary.FromContext(detachedCtx)
go f() {
// use v
...
}()
// may be flushed or closed when control returns from the function to which the parent
This is a rhetorical remark. There's a fair bit of subtly in this text that, I think, expects the reader to know that one of two things could happen:
defer v.Close()
where v.Close
renders the value v
received from the/attached to the context no longer interesting (e.g., a trace span). I'm assuming — probably erroneously — that few tracing libraries have adapted the context.AfterFunc
API.f
calls function g
. f
creates a child context with context.WithCancel
and defers its closure. f
calls g
with that child context. When g
finishes the child context is closed. A good example of f
is a RPC library (or middleware) and g
is the user-implemented RPC method handler. I wouldn't recommend that the documentation mentions these possibilities (so verbosely), but I wonder if there is any way we could alert a reader to be aware of when a passed in context could be canceled so that it's not surprising. A bad tracer shot: "Pay attention when you provide a function to an external API that calls the function with a context parameter."
// was originally passed, rendering those values invalid to use.
Could we place the adverb "potentially" before "rendering"? I think it's not always a guarantee of invalidity, so we wouldn't want to overstate that, no?
FWIW I agree, in fact similar language was also part of my original proposal:
we should clarify a point that is only laid out implicitly in the context API docs, i.e., that values attached to the Context can also be used after the Context has been canceled.
(and further elaborated on why this necessarily had to be the case in https://github.com/golang/go/issues/40221#issuecomment-773064051)
Unfortunately the relevant wording was watered down against my advice during CL review.
My opinion is still the same, we should first clarify at the package level that Context-attached values should, in general, remain valid (as in "well-behaved even if not functional") even after the Context is cancelled, since this is a situation that can occur regardless of WithoutCancel
. If we do that, we can also specifically highlight the issue in WithoutCancel
, providing guidance as needed.
I agree that it's important that values stay valid after cancellation, but I would like to highlight that cancellation is not the problem here. The problem is that lifetimes of context values are often defined by a function scope.
The case why context values need to be valid after cancellation is trivial: If values become invalid after cancellation, the code below would contain a race independently of the use of context.WithoutCancel
.
func f(ctx context.Context) {
select {
case <-ctx.Done():
return
default:
v := ctx.Value(key)
}
}
The more subtle problem is that context values are used in ways that require something to happen after an operation (e.g. request processing) has finished. That something is a lifetime event that's decoupled from the context lifetime.
For example (elaborating on my comment on go.dev/cl/459016), the documentation OpenTelemetry has this example code:
func operation(ctx context.Context, inst *Instrumentation) {
var span trace.Span
ctx, span = inst.tracer.Start(ctx, "operation")
defer span.End()
// ...
}
inst.tracer.Start
stores a newly created Span
in ctx
. A Span must be completed using span.End()
otherwise memory or other resources may leak:
// Start creates a span and a context.Context containing the newly-created span.
//
// If the context.Context provided in `ctx` contains a Span then the newly-created
// Span will be a child of that span, otherwise it will be a root span. This behavior
// can be overridden by providing `WithNewRoot()` as a SpanOption, causing the
// newly-created Span to be a root span even if `ctx` contains a Span.
//
// When creating a Span it is recommended to provide all known span attributes using
// the `WithAttributes()` SpanOption as samplers will only have access to the
// attributes provided when a Span is created.
//
// Any Span that is created MUST also be ended. This is the responsibility of the user.
// Implementations of this API may leak memory or other resources if Spans are not ended.
Start(ctx context.Context, spanName string, opts ...SpanStartOption) (context.Context, Span)
It's also not allowed to update the Span
after span.End()
has been called:
// End completes the Span. The Span is considered complete and ready to be
// delivered through the rest of the telemetry pipeline after this method
// is called. Therefore, updates to the Span are not allowed after this
// method has been called.
End(options ...SpanEndOption)
This means that if an OpenTelemetry Span
is stored in a context.Context
, it's not safe to use the context after the end of an operation, but it's perfectly fine to use context.WithoutCancel
.
To illustrate: To understand if the code below is correct, it's necessary to understand if any of the values in ctx
has Span
value and if that value is used by somepkg.Do
or callees.
func f(ctx context.Context) {
detachedCtx := context.WithoutCancel(ctx)
go do() {
somepkg.Do(detachedCtx)
}()
}
func g(ctx context.Context) {
go do() {
somepkg.Do(ctx)
}()
}
As we established before, it's not possible to couple the lifetime of Span
to the context lifetime (e.g. using context.AfterFunc
). The lifetime has to be decoupled. This means it's generally unsafe to use a context in a goroutine with a lifetime longer than the spawning function. Cancellation is not relevant for this.
@CAFxX, as @znkr notes, the problem here is the use-after-return, not the use-after-cancel. Because cancellation is already cooperative and mostly asynchronous, it should always be ok to use Context
values after cancellation has begun.
My main concern here is the sort of cleanup or finalization that one might defer
in a call frame.
Proposal Details
In #40221, a
WithoutCancel
function was added to thecontext
package.However, as noted in https://github.com/golang/go/issues/40221#issuecomment-660273706, a
Context
created by a call toWithoutCancel
is in general only safe to use synchronously:So in order to safely use
context.WithoutCancel
for an asynchronous call, one must have global information about both the callees (to ensure that the callee does not access a value with a call-scoped) and the callers (to ensure that the context does not close over a large, irrelevant value). Unfortunately, the current documentation forWithoutCancel
doesn't prompt the reader to consider that subtlety at all, and when users do think about it they often don't have enough context (ha!) to think of these caveats on their own.Indeed, some of the very first mentions of this new API — such as the SO answer linked in https://github.com/golang/go/issues/40221#issuecomment-1539683747 — suggest its use for asynchronous calls and neglect to consider or mention these factors.
I have already fielded multiple questions in Google-internal forums about whether
context.WithoutCancel
is safe to use in asynchronous code. I would like to answer those questions preemptively in the documentation rather than individually after the fact.I propose that we add the following paragraph to the
context.WithCancel
documentation: