census-ecosystem / opencensus-go-exporter-stackdriver

OpenCensus Go exporter for Stackdriver Monitoring and Trace
Apache License 2.0
67 stars 79 forks source link

ocsql.RecordStats failes with InvalidArgument 'The start time must be equal to the end time' #244

Closed eqinox76 closed 4 years ago

eqinox76 commented 4 years ago

What version of the Exporter are you using?

0.12.8

What version of OpenCensus are you using?

0.22.2

What version of Go are you using?

go version go1.13 linux/amd64

What did you do?

Exporting the default sql metrics to stackdriver:

// initialise exporter
sdExporter, err := stackdriver.NewExporter(stackdriver.Options{
    ProjectID: project,
    Location:  zone,
    MetricPrefix: "cis",
    ReportingInterval: time.Minute,
})

err = sdExporter.StartMetricsExporter()
view.RegisterExporter(sdExporter)
view.SetReportingPeriod(time.Minute)
...
// register metrics from [database via opencensus](https://opencensus.io/integrations/sql/go_sql/)
ocsql.RegisterAllViews()
ocsql.RecordStats(myDbHandle, time.Minute)

What did you expect to see?

Database metrics in stackdriver. As nearly no self written code is involved i would expect these sql metrics to just work.

What did you see instead?

Very often errors like

2019/12/19 11:42:00 Failed to export to Stackdriver: rpc error: 
code = InvalidArgument desc = Field timeSeries[0].points[0].interval.start_time 
had an invalid value of "2019-12-19T02:39:59.668607-08:00": The start time 
must be equal to the end time (2019-12-19T03:41:59.668868-08:00) for the 
gauge metric 'custom.googleapis.com/opencensus/go.sql/client/calls'.

Additional context

Checking stackdrivers docu:

The time interval to which the data point applies. For GAUGE metrics, the start time is optional, but if it is supplied, it must equal the end time. For DELTA metr...

I suspect a similar issue like reported for istio

Looking through the code would imply to just change:

func metricPointToMpbPoint(startTime *timestamp.Timestamp, pt *metricdata.Point, projectID string) (*monitoringpb.Point, error) {
    if pt == nil {
        return nil, nil
    }

    mptv, err := metricPointToMpbValue(pt, projectID)
    if err != nil {
        return nil, err
    }

    mpt := &monitoringpb.Point{
        Value: mptv,
        Interval: &monitoringpb.TimeInterval{
            StartTime: startTime,
            EndTime:   timestampProto(pt.Time),
        },
    }
    return mpt, nil
}

But the unittests clearly point at the StartTime and EndTime being expected to differ. Which makes also sense in my head. So i'm clearly missing too much context to decide how to fix this issue.

Thanks for providing this project and support!

rghetia commented 4 years ago

@eqinox76 something is odd here. go.sql/client/call is defined as Count. Count is converted to metric kind Cummulative. How does it become gauge? Is it possible that go.sql/client/calls was created as gauge before using some other mechanism? And now it is being updated as Cummulative?

I tried running the gosql example and I don't see any errors. Here is the metricDescriptor for go.sql/client/calls (it is CUMMULATIVE)

{
  "name": "projects/my-project-xxx/metricDescriptors/custom.googleapis.com/opencensus/go.sql/client/calls",
  "labels": [
    {
      "key": "go_sql_error"
    },
    {
      "key": "go_sql_method"
    },
    {
      "key": "go_sql_status"
    },
    {
      "key": "opencensus_task",
      "description": "Opencensus task identifier"
    }
  ],
  "metricKind": "CUMULATIVE",
  "valueType": "INT64",
  "unit": "1",
  "description": "The number of various calls of methods",
  "displayName": "OpenCensus/go.sql/client/calls",
  "type": "custom.googleapis.com/opencensus/go.sql/client/calls",
  "monitoredResourceTypes": [
    "aws_ec2_instance",
    "cloud_composer_environment",
    "cloud_composer_workflow",
    "dataflow_job",
    "gce_instance",
    "generic_node",
    "generic_task",
    "gke_container",
    "global",
    "k8s_cluster",
    "k8s_container",
    "k8s_node",
    "k8s_pod",
    "knative_broker",
    "knative_revision",
    "knative_trigger"
  ]
}
eqinox76 commented 4 years ago

Indeed. Thanks for the pointers, i will do some more digging and see if i can figure out what exactly happens.

eqinox76 commented 4 years ago
    {
      "name": "projects/my-project-xxx/metricDescriptors/custom.googleapis.com/opencensus/go.sql/client/calls",
      "labels": [
        {
          "key": "go_sql_method"
        },
        {
          "key": "go_sql_status"
        },
        {
          "key": "opencensus_task"
        }
      ],
      "metricKind": "GAUGE",
      "valueType": "INT64",
      "description": "Auto-created custom metric.",
      "type": "custom.googleapis.com/opencensus/go.sql/client/calls",
      "monitoredResourceTypes": [
        ...
      ]
    },

Seems indeed very odd. The metricKind is wrong as is the description. Searching through the code and dependencies also does not find this description string. I assume the metricDescriptor was generated by a older version. I will delete it and check what opencensus recreates (will take some time).

Thanks again for the pointers!

eqinox76 commented 4 years ago

Took a while to get access... Now when the metricDescriptor gets deleted via https://cloud.google.com/monitoring/api/ref_v3/rest/v3/projects.metricDescriptors/delete It immediately gets recreated as GAUGE. I cannot figure out what code does that. Running grep over the local go source and all dependencies does not find the description.

Further investigation shows that there are two metricDescriptors:

      "name": "projects/my-project-xxx/metricDescriptors/custom.googleapis.com/opencensus/cis/go.sql/client/calls",
      "labels": [],
      "metricKind": "CUMULATIVE",
      "valueType": "INT64",
      "name": "projects/my-project-xxx/metricDescriptors/custom.googleapis.com/opencensus/go.sql/client/calls",
      "labels": [],
      "metricKind": "GAUGE",
      "valueType": "INT64",

Notice the missing cis prefix in the name of the wrong one. I assume that somehow my wrong double exporter registration is messing something up:

// initialise exporter
sdExporter, err := stackdriver.NewExporter(stackdriver.Options{
    ProjectID: project,
    Location:  zone,
    MetricPrefix: "cis",
    ReportingInterval: time.Minute,
})

err = sdExporter.StartMetricsExporter()
view.RegisterExporter(sdExporter)
view.SetReportingPeriod(time.Minute)

I so not see how that can generate the wrong type but i will test without the view.RegisterExporter call. Reason for this double invocation was that using only the new way was not generating any custom metrics as noted in #242 .

So most likely this issue is caused by trying to fix another issue.

rghetia commented 4 years ago

Reason for this double invocation was that using only the new way was not generating any custom metrics as noted in #242 .

Not sure why LastValue(Gauge) is not work for your with the new approach.

Notice the missing cis prefix in the name of the wrong one.

I found it exactly opposite. prefix is there in gauge type.

approach code Metric type Comment
old view.RegisterExporter(sdExporter) Cumulative
new sdExporter.StartMetricsExporter() Cumulative has prefix
both view.RegisterExporter(sdExporter) and sdExporter.StartMetricsExporter() Cumulative and Gauge The descriptor created using new approach is of type Gauge. Don't know why, yet.

The MetricPrefix is only applicable to the new approach (although doc doesn't say that). I would prefer that you just use the new approach and if there are any issues I can fix them.

eqinox76 commented 4 years ago

Using the new approach seems to be working so far. I will open another ticket when i can reproduce the issue where metricDescriptors are not created using the new approach.

Thanks for your help and time!