census-instrumentation / opencensus-cpp

A stats collection and distributed tracing framework
https://opencensus.io/
Apache License 2.0
145 stars 76 forks source link

Metrics should expire from stackdriver time_series/view vector #451

Closed mandarjog closed 4 years ago

mandarjog commented 4 years ago

The following vector grows indefinitely in memory.

https://github.com/census-instrumentation/opencensus-cpp/blob/85e29c4d815171656d54fdf2dc5b74873a4a62bd/opencensus/exporters/stats/stackdriver/internal/stackdriver_exporter.cc#L106

  1. This vector should be pruned periodically.
  2. It should be possible to access the current metrics cardinality, or present it during pruning.
mandarjog commented 4 years ago

/cc @g-easy @bogdandrutu @bianpengyuan

g-easy commented 4 years ago

IIUC, pruning is motivated by wanting to have a "sparse" map where timeseries eventually disappear. What is the logic for when a timeseries should reset / disappear? Or should the stackdriver exporter expose an API for "reset some timeseries" and have the calling code specify the logic?

mandarjog commented 4 years ago

We can start simple by exposing a clear() api which should clear() state at next export so as to not drop any data. @bianpengyuan ?

bianpengyuan commented 4 years ago

I think we'd need to purge stale time series from ViewData. A rough idea is to add a clear() function to stats manager, which cleans up Viewdata that has not been updated for a given time period. This also means that we need to track last updated time of each ViewData.

bianpengyuan commented 4 years ago

@g-easy any feedback on this approach? Do you think if this is feasible without major surgery to the code?

bianpengyuan commented 4 years ago

Talked with mandarjog, the other option is to add a SetExpiry() method into StatsManager, and whenever a harvest cycle happens, clean up view that has been updated for the given period. Comparing to the Clean() method, this would help to flatten the latency from lock.

g-easy commented 4 years ago

Expiry done as part of the existing background thread stats aggregation sounds reasonable to me. Should expiry duration be part of StatsManager, or part of the ViewDescriptor?

bianpengyuan commented 4 years ago

Ah, yes, viewDescriptor makes more sense. I can take a stab on this.

g-easy commented 4 years ago

Yes please. I'm happy to review but I'm very limited in how much time I can spend on OC these days. :(

BTW I suggest ViewDescriptor because it's part of the public API whereas StatsManager is not.