census-instrumentation / opencensus-go

A stats collection and distributed tracing framework
http://opencensus.io
Apache License 2.0

plugin/ocgrpc incorrect metrics for bidirectional stream #1006

Open: pjanotti opened this issue 5 years ago

pjanotti commented 5 years ago

Describe the bug

Metrics for ocgrpc on a bidirectional stream seem to be collected only at stream initialization and cancellation.

To Reproduce

Steps to reproduce the behavior:

  1. Clone the census-instrumentation/opencensus-service repo.
  2. From the repo root, run: go run ./cmd/occollector/main.go --debug-processor (this process exposes Prometheus metrics at http://localhost:8888/metrics).
  3. From the repo root, run: go run ./example/main.go
  4. After 10 seconds, check http://localhost:8888/metrics. Metrics were expected for grpc_server_method="opencensus.proto.agent.trace.v1.TraceService/Export", but only metrics for grpc_server_method="opencensus.proto.agent.trace.v1.TraceService/Config" are present.
  5. Terminate the process started in step 3; metrics for grpc_server_method="opencensus.proto.agent.trace.v1.TraceService/Export" then show up (covering the cancellation event).

The relevant source code in the opencensus-service repo is: https://github.com/census-instrumentation/opencensus-service/blob/38c9550146b49e0bb95ef1784df56a187e912dab/internal/observability.go#L110-L113

Expected behavior

Metrics for both methods, especially for the data sent via Export.

Additional context

See https://github.com/census-instrumentation/opencensus-service/issues/287

songy23 commented 5 years ago

/cc @rakyll @rghetia Is this a bug or is it the expected behavior?

rghetia commented 5 years ago

@songy23 what is the behavior in Java?

rakyll commented 5 years ago

This sounds like a bug. Metrics for bidirectional streams should be recorded per call, not just at init and cleanup; otherwise, they are useless.

songy23 commented 5 years ago

In Java:

Previously, we didn't have real-time metrics reporting for streaming RPCs: metrics weren't visible until the RPC finished, which matches the scenario @pjanotti described. However, gRPC recently added some real-time reporting measures (https://github.com/grpc/grpc-java/pull/5099), intended for reporting metrics in real time for long-lived RPCs. I'm not sure whether this is available in Go.
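To make the difference discussed above concrete, here is a minimal, self-contained Go model (not ocgrpc's actual implementation) of two ways a stats handler could record messages on a long-lived stream: buffering the count until the stream ends, which matches the behavior reported in this issue, versus publishing each message as it arrives, which is what real-time measures allow. All types and methods here (counters, endOfStreamRecorder, perMessageRecorder) are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// counters is a hypothetical stand-in for a metrics backend: cumulative
// per-method message counts that a scraper can read at any time.
type counters struct {
	mu   sync.Mutex
	recv map[string]int64
}

func newCounters() *counters { return &counters{recv: map[string]int64{}} }

func (c *counters) add(method string, n int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.recv[method] += n
}

func (c *counters) snapshot(method string) int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.recv[method]
}

// endOfStreamRecorder counts messages locally and publishes only when the
// stream ends, so a long-lived stream stays invisible to the scraper.
type endOfStreamRecorder struct {
	c      *counters
	method string
	n      int64
}

func (r *endOfStreamRecorder) OnMessage() { r.n++ }
func (r *endOfStreamRecorder) OnEnd()     { r.c.add(r.method, r.n) }

// perMessageRecorder publishes each message immediately, so the counters
// advance while the stream is still open.
type perMessageRecorder struct {
	c      *counters
	method string
}

func (r *perMessageRecorder) OnMessage() { r.c.add(r.method, 1) }
func (r *perMessageRecorder) OnEnd()     {}

func main() {
	const method = "opencensus.proto.agent.trace.v1.TraceService/Export"
	batch, live := newCounters(), newCounters()
	a := &endOfStreamRecorder{c: batch, method: method}
	b := &perMessageRecorder{c: live, method: method}

	for i := 0; i < 3; i++ {
		a.OnMessage()
		b.OnMessage()
	}
	// A scrape taken while the stream is still open.
	fmt.Println("mid-stream:", batch.snapshot(method), live.snapshot(method)) // mid-stream: 0 3

	a.OnEnd()
	b.OnEnd()
	fmt.Println("after end:", batch.snapshot(method), live.snapshot(method)) // after end: 3 3
}
```

The end-of-stream variant is cheaper (one backend update per stream), but for a stream that stays open for the life of the process, like the Export stream in this report, its data never reaches the scraper until cancellation.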