GoogleCloudPlatform / opentelemetry-operations-js

This repository is home to Google Cloud Exporters (Trace and Monitoring) for OpenTelemetry Node.js Project (https://github.com/open-telemetry/opentelemetry-js)
Apache License 2.0
67 stars 63 forks source link

Document usage in Cloud Run functions #750

Open CapoD opened 1 week ago

CapoD commented 1 week ago

Is your feature request related to a problem? Please describe. We are using Firebase functions (2nd gen) that are deployed as Cloud Run functions in GCP. These functions use Angular Server Side Rendering and create lots of HTTP Requests that we would like to trace. Unfortunately, we are not able to find a working example. Thus, we suggest to add one.

Describe the solution you'd like We believe that our code may be a good starting point for a working example (if we get it to work ;-) ) and this issue might also help others.

// compiled to: instrumentation/main.js

import { HttpInstrumentation } from '@opentelemetry/instrumentation-http';
import { ExpressInstrumentation } from '@opentelemetry/instrumentation-express';
import { core, NodeSDK } from '@opentelemetry/sdk-node';
import { gcpDetector } from '@opentelemetry/resource-detector-gcp';
import { TraceExporter } from '@google-cloud/opentelemetry-cloud-trace-exporter';
import { diag, DiagConsoleLogger, DiagLogLevel, trace } from '@opentelemetry/api';
import { Detector, DetectorSync, IResource, ResourceDetectionConfig } from '@opentelemetry/resources';
import { AlwaysOnSampler } from '@opentelemetry/sdk-trace-base';
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-node';

// This was a workaround we found since directly using gcpDetector raised an issue with unresolved promises
function awaitAttributes(detector: DetectorSync): DetectorSync | Detector {
  return {
    async detect(config?: ResourceDetectionConfig): Promise<IResource> {
      const resource = detector.detect(config);
      await resource.waitForAsyncAttributes?.();
      console.log('Detected resource: ', resource);
      return resource;
    },
  };
}

// We tried it with and w/o options, but that did not help
const traceExporterOptions = { projectId: '****', resourceFilter: /^service\./ };
const sdk = new NodeSDK({
  sampler: new AlwaysOnSampler(),
  resourceDetectors: [awaitAttributes(gcpDetector)],
  // That seemed to be the right place 
  traceExporter: new TraceExporter(traceExporterOptions),
  // Adding it here allowed us to wrap it with the BatchSpanProcess 
  spanProcessors: [new BatchSpanProcessor(new TraceExporter(traceExporterOptions))],
  instrumentations: [new HttpInstrumentation(), new ExpressInstrumentation()],
});

diag.setLogger(new DiagConsoleLogger(), core.getEnv().OTEL_LOG_LEVEL || DiagLogLevel.VERBOSE);

try {
  sdk.start();
  diag.info('OpenTelemetry instrumentation started successfully');
} catch (error) {
  diag.error(
    'Error initializing OpenTelemetry SDK. Your application is not instrumented and will not produce telemetry',
    error,
  );
}

/*
  Do something
 */
const tracer = trace.getTracer('test-test', '0.1.0');
const span = tracer.startSpan('Test');
span.end();

process.on('SIGTERM', () => {
  sdk
    .shutdown()
    .then(() => diag.debug('OpenTelemetry SDK terminated'))
    .catch((error) => diag.error('Error terminating OpenTelemetry SDK', error));
});
# .env
NODE_OPTIONS='--require ./instrumentation/main.js'

Function's package.json: We use Rspack to bundle the instrumentation in instrumentation/main.js, however the Protocol Buffers in @google-cloud/opentelemetry-cloud-trace-exporter cause problems while bundling, so we exclude them from the bundle and provide them as runtime dependencies.

{
  "name": "ssr",
  "engines": {
    "node": "20"
  },
  "main": "server/server.mjs",
  "dependencies": {
    "@google-cloud/opentelemetry-cloud-trace-exporter": "*"
  }
}

However, the above solution does not work. The function log only writes two lines:

Error: Timeout
    at Timeout.<anonymous> (/workspace/instrumentation/main.js:3:686268)
    at b.<computed> (file:///workspace/server/polyfills.server.mjs:4:12215)
    at s.invokeTask (file:///workspace/server/polyfills.server.mjs:3:7335)
    at ne.runTask (file:///workspace/server/polyfills.server.mjs:3:2678)
    at invokeTask (file:///workspace/server/polyfills.server.mjs:3:8410)
    at E.W.useG.invoke (file:///workspace/server/polyfills.server.mjs:3:8312)
    at E.args.<computed> [as _onTimeout] (file:///workspace/server/polyfills.server.mjs:4:11932)
    at listOnTimeout (node:internal/timers:581:17)
    at process.processTimers (node:internal/timers:519:7)

and then

batchWriteSpans error: 3 INVALID_ARGUMENT: Invalid project resource name projects/[object Promise]

We also do not see any traces in Cloud Trace except of course the Load Balancer and Function Call trace. If you guys have any ideas on what we are doing wrong, we would love to hear them.

Describe alternatives you've considered A working example might be included in the official GCP docs, as well.

dashpole commented 3 days ago

@aabmass just fixed a bundling problem, could possibly be related.

@psx95 has been looking into OTel export from Cloud Run Functions, so might have some insight

aabmass commented 2 days ago

We use Rspack to bundle the instrumentation in instrumentation/main.js, however the Protocol Buffers in @google-cloud/opentelemetry-cloud-trace-exporter cause problems while bundling, so we exclude them from the bundle and provide them as runtime dependencies.

Bundling issues should be fixed in #404 which I'm hoping to release this week.

// This was a workaround we found since directly using gcpDetector raised an issue with unresolved promises
batchWriteSpans error: 3 INVALID_ARGUMENT: Invalid project resource name projects/[object Promise]

I don't think the awaitAttributes wrapper will help. What was the actual error you were seeing about unresolved promises? The OTel Node BatchSpanProcessor should be lazily awaiting resource attributes.

The usual issue we see with Cloud Run and Cloud Run Functions is that CPU gets throttled and breaks background exporting. For Cloud Run we recommend always-allocated CPU but that doesn't seem to be available in Cloud Run functions still.

Is the service experiencing very low QPS or bursty traffic?

psx95 commented 2 days ago

Re always-allocated CPU

As @aabmass mentioned, always-allocated CPU is the recommended approach. For some reason this is not available in "Cloud Run Functions" from Google Cloud Console. However, now there is an option to deploy a function from Cloud Run and if you create a Function from within "Cloud Run" page, you do have the option to use always allocated CPU.

It is a little confusing on the UI side and I imagine there is work going on to reconcile these two options in a single page. Alternatively, you could use this gcloud command to deploy a function with always-allocated CPU:

# This was used to deploy a Java function, so might need some updates in --source flag for JS
gcloud beta run deploy cloud-func-helloworld \
 --no-cpu-throttling \ # this results in always-allocated CPU
 --container app-function \
 --function org.example.HelloWorld \ # this deploys the service as a cloud run function with a given entry point.
 --source=build/libs \
 --port=8080