envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0
24.91k stars 4.79k forks source link

Ability to assert on stat lifetime issues in debug build #18047

Open ggreenway opened 3 years ago

ggreenway commented 3 years ago

There have been some issues recently with stat references (Counter& et al) outliving their owning Scope. This can lead to use-after-free.

One way to help detect this would be for Counter& and friends to be replaced with a class CounterRef. This class, in a release build, would only contain a Counter&, to avoid a huge increase in memory use. But in debug builds. This object would implement the Counter interface, passing all calls through to the stored Counter&.

In a debug build, this object would also hold a reference to the Scope that created it, and call a new method on the Scope to inform that it is using a stat from that Scope. In Scope's destructor, it would ASSERT if it is still in use by anything.

Hopefully, the macros that create most stats could be modified to work for this, instead of needing to touch all code creates stats.

jmarantz commented 3 years ago

This plan sgtm.

@pradeepcrao who has been working with stats here and would be a good candidate for a first-pass reviewer.

ggreenway commented 3 years ago

For anyone who works on this: please ask for feedback early in the process. This is an experimental idea, and it may not work out.

mattklein123 commented 3 years ago

FWIW my idea here was to do roughly the following: 1) On release builds, do exactly what we do today. 2) On debug builds, do the following: 1) Have a wrapper class CounterRef which implements Counter. 2) Store the underlying counter, the scope, etc. 3) Store this class internally (perhaps even globally though that might leak over time) so a reference can be returned. 4) Return the Counter& to the caller. 5) Add additional checking on usage.

jmarantz commented 3 years ago

A few questions come to mind:

  1. Should CounterRef inherit from Counter? Or just mirror the interface?
  2. Somewhere there's #ifdef NDEBUG to differentiate prod/debug behavior. Where exactly?
  3. How does this interact with stats_macros.h? e.g. COUNTER(foo) and other variants?
  4. Do we change 100% of the counter refs to this? Or is this just used where it's not obvious that scopes outlive the counters?

And there's another question that IMO deserves asking: what would be the cost we never used Counter& and always used CounterSharedPtr? I would argue that would be a simpler model to reason about, but it might cause some perf issues, which we could measure. It would also be ridiculously large change.

mattklein123 commented 3 years ago

And there's another question that IMO deserves asking: what would be the cost we never used Counter& and always used CounterSharedPtr? I would argue that would be a simpler model to reason about, but it might cause some perf issues, which we could measure. It would also be ridiculously large change.

I agree this would be better in hindsight. I'm not going to sign up for this. :)

ggreenway commented 3 years ago

What would be the relationship between the Stat and the Scope if lifetime of a stat is managed by a shared_ptr? Is it valid to keep the stat after the scope is deleted?

mattklein123 commented 3 years ago

What would be the relationship between the Stat and the Scope if lifetime of a stat is managed by a shared_ptr? Is it valid to keep the stat after the scope is deleted?

I think it would have to be something like a weak_ptr, or a shared_ptr that wraps a weak_ptr, making it more complicated of a change, and have more perf/memory implications.

jmarantz commented 3 years ago

With CounterSharedPtr as is, the counter would be kept alive as long as there was an outstanding CounterSharedPtr, and would not be removed if the scope disappeared. I was thinking that might be an OK state to leave things in.

Note that CounterSharedPtr is not a std::shared_ptr. It's a RefcountPtr, a class we define, which inlines the refcount (which is packed into 32 bits along with the flags). If we change to std::shared_ptr that would add 8 bytes per counter (at least).

ggreenway commented 3 years ago

With CounterSharedPtr as is, the counter would be kept alive as long as there was an outstanding CounterSharedPtr, and would not be removed if the scope disappeared. I was thinking that might be an OK state to leave things in.

If we did that, we could make Scope only for namespacing, not for lifetime.

Note that CounterSharedPtr is not a std::shared_ptr. It's a RefcountPtr, a class we define, which inlines the refcount (which is packed into 32 bits along with the flags). If we change to std::shared_ptr that would add 8 bytes per counter (at least).

Oh, I missed that detail; thanks for clarifying.

jmarantz commented 3 years ago

Another fun fact: it's valid to write:

CounterSharedPtr makeShared(Counter& counter) {
  return CounterSharedPtr(&counter);
}

That would not work for std::shared_ptr, but it's OK for RefcountPtr<Counter> because the ref-counts are in the object. This might be of interest in case there are a few interesting cases where a counter created from a scope might outlast the scope. Because that's valid you could leave Scope::counterFromStatName returning a Counter& and choose on a case-by-case basis to store them in a CounterSharedPtr.