proposal: context: document that WithoutCancel may not be sufficient for asynchronous use

Proposal Details

In #40221, a WithoutCancel function was added to the context package.

However, as noted in https://github.com/golang/go/issues/40221#issuecomment-660273706, a Context created by a call to WithoutCancel is in general only safe to use synchronously:

The original parent context may contain values (from packages like those listed in https://github.com/golang/go/issues/40221#issuecomment-660270769) that are no longer usable — and perhaps not even safe to use! — after the return from the function call to which the context was passed.
The values contained in the parent context may refer to large amounts of memory (such as for logging datastructures), which will not be garbage-collected while that context remains reachable.

So in order to safely use context.WithoutCancel for an asynchronous call, one must have global information about both the callees (to ensure that the callee does not access a value with a call-scoped) and the callers (to ensure that the context does not close over a large, irrelevant value). Unfortunately, the current documentation for WithoutCancel doesn't prompt the reader to consider that subtlety at all, and when users do think about it they often don't have enough context (ha!) to think of these caveats on their own.

Indeed, some of the very first mentions of this new API — such as the SO answer linked in https://github.com/golang/go/issues/40221#issuecomment-1539683747 — suggest its use for asynchronous calls and neglect to consider or mention these factors.

I have already fielded multiple questions in Google-internal forums about whether context.WithoutCancel is safe to use in asynchronous code. I would like to answer those questions preemptively in the documentation rather than individually after the fact.

I propose that we add the following paragraph to the context.WithCancel documentation:

// WithoutCancel returns a copy of parent that is not canceled when parent is canceled.
// The returned context returns no Deadline or Err, and its Done channel is nil.
// Calling [Cause] on the returned context returns nil.
//
// Note that the returned context might not be suitable for asynchronous use.
// The child context continues to refer to the parent for its implementation of the
// [Context.Value] method, which will prevent memory reachable from the parent from
// being garbage-collected while the child remains reachable.
// In addition, some of the parent's values (particularly for logging and tracing libraries)
// may be flushed or closed when control returns from the function to which the parent
// was originally passed, rendering those values invalid to use.

(CC @matttproud @kylelemons @CAFxX @Sajmani)

First of all, thank you for promptly following up on the internal inquiry and then proposing this change of language. :-)

Looking at your proposed snippet, I have a few remarks:

Note that the returned context might not be suitable for asynchronous use.

Can we demonstrate and clarify precisely what kinds of asynchronous situations are potentially unsafe? I think we owe to the reader to help clear up confusion. Over the years, people have wondered what is safe and unsafe with the internal API that has documented this warning. Consider these three scenarios:

// Definitely unsafe
go f() {
  ctx := context.WithoutCancel(ctx)
  v := somelibrary.FromContext(ctx)
  ...
}()

// Definitely unsafe
go f(ctx context.Context) {
  v := somelibrary.FromContext(ctx)
  ...
}(context.WithoutCancel(ctx))

// Subtle: is it unsafe (probably)?
detachedCtx := context.WithoutCancel(ctx)
v := somelibrary.FromContext(detachedCtx)
go f() {
  // use v
  ...
}()

// may be flushed or closed when control returns from the function to which the parent

This is a rhetorical remark. There's a fair bit of subtly in this text that, I think, expects the reader to know that one of two things could happen:

The function that this text is talking about internally has a defer v.Close() where v.Close renders the value v received from the/attached to the context no longer interesting (e.g., a trace span). I'm assuming — probably erroneously — that few tracing libraries have adapted the context.AfterFunc API.
The function f calls function g. f creates a child context with context.WithCancel and defers its closure. f calls g with that child context. When g finishes the child context is closed. A good example of f is a RPC library (or middleware) and g is the user-implemented RPC method handler.

I wouldn't recommend that the documentation mentions these possibilities (so verbosely), but I wonder if there is any way we could alert a reader to be aware of when a passed in context could be canceled so that it's not surprising. A bad tracer shot: "Pay attention when you provide a function to an external API that calls the function with a context parameter."

// was originally passed, rendering those values invalid to use.

Could we place the adverb "potentially" before "rendering"? I think it's not always a guarantee of invalidity, so we wouldn't want to overstate that, no?

FWIW I agree, in fact similar language was also part of my original proposal:

we should clarify a point that is only laid out implicitly in the context API docs, i.e., that values attached to the Context can also be used after the Context has been canceled.

(and further elaborated on why this necessarily had to be the case in https://github.com/golang/go/issues/40221#issuecomment-773064051)

Unfortunately the relevant wording was watered down against my advice during CL review.

My opinion is still the same, we should first clarify at the package level that Context-attached values should, in general, remain valid (as in "well-behaved even if not functional") even after the Context is cancelled, since this is a situation that can occur regardless of WithoutCancel. If we do that, we can also specifically highlight the issue in WithoutCancel, providing guidance as needed.

I agree that it's important that values stay valid after cancellation, but I would like to highlight that cancellation is not the problem here. The problem is that lifetimes of context values are often defined by a function scope.

The case why context values need to be valid after cancellation is trivial: If values become invalid after cancellation, the code below would contain a race independently of the use of context.WithoutCancel.

func f(ctx context.Context) {
  select {
    case <-ctx.Done():
       return
     default:
       v := ctx.Value(key)
  }
}

The more subtle problem is that context values are used in ways that require something to happen after an operation (e.g. request processing) has finished. That something is a lifetime event that's decoupled from the context lifetime.

For example (elaborating on my comment on go.dev/cl/459016), the documentation OpenTelemetry has this example code:

func operation(ctx context.Context, inst *Instrumentation) {
    var span trace.Span
    ctx, span = inst.tracer.Start(ctx, "operation")
    defer span.End()
    // ...
}

inst.tracer.Start stores a newly created Span in ctx. A Span must be completed using span.End() otherwise memory or other resources may leak:

// Start creates a span and a context.Context containing the newly-created span.
//
// If the context.Context provided in `ctx` contains a Span then the newly-created
// Span will be a child of that span, otherwise it will be a root span. This behavior
// can be overridden by providing `WithNewRoot()` as a SpanOption, causing the
// newly-created Span to be a root span even if `ctx` contains a Span.
//
// When creating a Span it is recommended to provide all known span attributes using
// the `WithAttributes()` SpanOption as samplers will only have access to the
// attributes provided when a Span is created.
//
// Any Span that is created MUST also be ended. This is the responsibility of the user.
// Implementations of this API may leak memory or other resources if Spans are not ended.
Start(ctx context.Context, spanName string, opts ...SpanStartOption) (context.Context, Span)

It's also not allowed to update the Span after span.End() has been called:

// End completes the Span. The Span is considered complete and ready to be
// delivered through the rest of the telemetry pipeline after this method
// is called. Therefore, updates to the Span are not allowed after this
// method has been called.
End(options ...SpanEndOption)

This means that if an OpenTelemetry Span is stored in a context.Context, it's not safe to use the context after the end of an operation, but it's perfectly fine to use context.WithoutCancel.

To illustrate: To understand if the code below is correct, it's necessary to understand if any of the values in ctx has Span value and if that value is used by somepkg.Do or callees.

func f(ctx context.Context) {
  detachedCtx := context.WithoutCancel(ctx)
  go do() {
    somepkg.Do(detachedCtx)
  }()
}

func g(ctx context.Context) {
  go do() {
    somepkg.Do(ctx)
  }()
}

As we established before, it's not possible to couple the lifetime of Span to the context lifetime (e.g. using context.AfterFunc). The lifetime has to be decoupled. This means it's generally unsafe to use a context in a goroutine with a lifetime longer than the spawning function. Cancellation is not relevant for this.

@CAFxX, as @znkr notes, the problem here is the use-after-return, not the use-after-cancel. Because cancellation is already cooperative and mostly asynchronous, it should always be ok to use Context values after cancellation has begun.

My main concern here is the sort of cleanup or finalization that one might defer in a call frame.

golang / go

proposal: context: document that WithoutCancel may not be sufficient for asynchronous use #64478

Proposal Details