census-instrumentation / opencensus-java

A stats collection and distributed tracing framework
https://opencensus.io
Apache License 2.0

App Engine / Google Cloud Monitoring metrics support issues #2070

Open matthewblain opened 3 years ago

matthewblain commented 3 years ago

Please answer these questions before submitting a bug report.

What version of OpenCensus are you using?

0.28.1

What JVM are you using (java -version)?

Whatever App Engine is using. The latest release notes from Google say "Updated Java SDK to version 1.9.84."

What did you do?

Used OpenCensus to create metrics for Google Cloud Monitoring in a Java 8 App Engine Standard app. I am only using OpenCensus (at least for now) to push metrics using a View and a MeasureMap.

It appears to be working, after I spent a few hours tweaking various settings. This bug is sort of a laundry list of small issues. I am going to mix actual/expected here, realizing that some of these may be best forked off into their own issues and others addressed all at once.

RPCs worked.

Actual: Data flowed just fine from App Engine to Google Cloud Monitoring.
Expected: The documentation at https://opencensus.io/integrations/google_cloud/google_cloud_appengine_standard/ says it would not work at all due to gRPC issues. This appears not to be the case.
Workaround: Ignore the documentation.

Labels were incomplete/insufficient.

Actual: The only label was opencensus_task with value java-1@localhost.
Expected: Some sort of per-instance label. Simplest if the random number were more random. Best if it were to use something App Engine specific (see next section).
Workaround: Use setConstantLabels to set a variety of labels. I also added opencensus_task with value java@$instance_id, which would be sufficient. I cannot quite tell whether this is necessary/useful, or whether I should remove opencensus_task now that there are good labels.

I am using the following labels:

("module_id", "App Engine Module ID"): modulesService.getCurrentModule()
("version_id", "App Engine Version ID"): modulesService.getCurrentVersion()
("instance_id", "App Engine Instance ID"): modulesService.getCurrentInstanceId()
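The label set described above could be assembled roughly as follows. This is a minimal sketch using only the JDK: `AppEngineLabels` and `constantLabels` are hypothetical names, not part of OpenCensus or the App Engine SDK, and the three arguments are assumed to come from the `modulesService` calls listed above. Each entry would still have to be converted to the exporter's label key/value types before being passed to setConstantLabels.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper; not part of OpenCensus or the App Engine SDK. */
final class AppEngineLabels {
    private AppEngineLabels() {}

    /**
     * Builds label name -> value pairs in insertion order.
     * The arguments are assumed to be the results of getCurrentModule(),
     * getCurrentVersion() and getCurrentInstanceId().
     */
    static Map<String, String> constantLabels(
            String moduleId, String versionId, String instanceId) {
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("module_id", moduleId);
        labels.put("version_id", versionId);
        labels.put("instance_id", instanceId);
        // Per-instance replacement for the default "java-1@localhost" value.
        labels.put("opencensus_task", "java@" + instanceId);
        return labels;
    }
}
```

Whether opencensus_task is still needed once instance_id is present remains the open question raised above; dropping that one `put` call removes it.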

Resource type shows up as GCE VM.

Actual: Resource type shows up as gce_instance. Various metadata is also blank.
Expected: Resource type shows up as gae_instance. Metadata shows up using App Engine values. Perhaps this needs a contrib module, or perhaps it can easily be read from the OS environment variables and other system properties.
Workaround: Not sure yet. I imagine this can be solved by using StackdriverStatsConfiguration.setMonitoredResource.
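As an untested sketch of the environment-variable idea, the labels for a gae_instance resource could be collected like this. `GaeInstanceResource` and `labelsFrom` are hypothetical names. The environment variable names (GOOGLE_CLOUD_PROJECT, GAE_SERVICE, GAE_VERSION, GAE_INSTANCE) are an assumption about what the App Engine runtime exposes and should be checked against the runtime in use. The location is passed in explicitly because no source for it is assumed here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper; not part of OpenCensus or the App Engine SDK. */
final class GaeInstanceResource {
    /** Monitored resource type expected for App Engine instances. */
    static final String TYPE = "gae_instance";

    private GaeInstanceResource() {}

    /**
     * Collects the gae_instance resource labels from an environment map.
     * The variable names below are assumptions; a missing variable yields
     * an empty string rather than an exception.
     */
    static Map<String, String> labelsFrom(Map<String, String> env, String location) {
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("project_id", env.getOrDefault("GOOGLE_CLOUD_PROJECT", ""));
        labels.put("module_id", env.getOrDefault("GAE_SERVICE", ""));
        labels.put("version_id", env.getOrDefault("GAE_VERSION", ""));
        labels.put("instance_id", env.getOrDefault("GAE_INSTANCE", ""));
        labels.put("location", location);
        return labels;
    }
}
```

In an app this would be called as `GaeInstanceResource.labelsFrom(System.getenv(), region)`, and the type and labels would then be copied into the monitored resource object handed to setMonitoredResource. Whether that is enough to make the resource show up as gae_instance is exactly what the workaround above leaves unverified.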

Uncertain whether there are any other concerns.

Actual: Seems to work.
Expected: Will continue to work. But I'm concerned there may be some gotcha, for example losing data with an inappropriate exportInterval. (The defaults should generally be good here.)

Additional context

Simply documenting all of this, which I've started in the 'workarounds' above, may be sufficient. Alternatively, guidance on how to use OpenTelemetry instead would help.

jsuereth commented 3 years ago

Hey, since you didn't get any activity, I just wanted to say thank you very much for the feedback!

Yes, sending metrics from App Engine to GCM does seem to work. There are subtle bugs waiting in the weeds, which is one reason we don't recommend it yet. Most of them revolve around data loss on eviction, and around offering some App Engine-specific setup to help ensure the defaults you use work in the environment.

Regarding a lot of your concerns: