CapoD opened this issue 2 months ago
@aabmass just fixed a bundling problem, which could possibly be related.
@psx95 has been looking into OTel export from Cloud Run Functions, so he might have some insight.
> We use Rspack to bundle the instrumentation in `instrumentation/main.js`, however the Protocol Buffers in `@google-cloud/opentelemetry-cloud-trace-exporter` cause problems while bundling, so we exclude them from the bundle and provide them as runtime dependencies.
Bundling issues should be fixed in #404 which I'm hoping to release this week.
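For readers following along, the exclusion described above typically looks something like this in an Rspack config — a sketch only, assuming webpack-compatible `externals` and a TypeScript config file; the entry and output paths are illustrative, not the project's actual setup:

```ts
// rspack.config.ts — a hypothetical sketch, not the project's real config.
// Marking the exporter package as external keeps it (and its generated
// protobuf files) out of the bundle; it is then resolved from node_modules
// at runtime, so it must remain a regular (non-dev) dependency.
export default {
  target: 'node',
  entry: './src/instrumentation.ts', // illustrative path
  output: { filename: 'instrumentation/main.js' },
  externalsType: 'commonjs',
  externals: ['@google-cloud/opentelemetry-cloud-trace-exporter'],
};
```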
> ```js
> // This was a workaround we found since directly using gcpDetector raised an issue with unresolved promises
> ```
>
> ```
> batchWriteSpans error: 3 INVALID_ARGUMENT: Invalid project resource name projects/[object Promise]
> ```
I don't think the `awaitAttributes` wrapper will help. What was the actual error you were seeing about unresolved promises? The OTel Node BatchSpanProcessor should be lazily awaiting resource attributes.
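For context, the setup being described — where the SDK, not user code, is responsible for awaiting the detector's async attributes — would normally look roughly like this (a sketch assuming `@opentelemetry/sdk-node` and the `gcpDetector` from `@opentelemetry/resource-detector-gcp`, not CapoD's actual code):

```ts
import { NodeSDK } from '@opentelemetry/sdk-node';
import { TraceExporter } from '@google-cloud/opentelemetry-cloud-trace-exporter';
import { gcpDetector } from '@opentelemetry/resource-detector-gcp';

// The GCP detector fills in attributes asynchronously; the SDK's
// BatchSpanProcessor is expected to await them before exporting, so no
// manual awaiting should be required here.
const sdk = new NodeSDK({
  traceExporter: new TraceExporter(),
  resourceDetectors: [gcpDetector],
});
sdk.start();
```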
The usual issue we see with Cloud Run and Cloud Run Functions is that CPU gets throttled and breaks background exporting. For Cloud Run we recommend always-allocated CPU, but that still doesn't seem to be available in Cloud Run Functions.
Is the service experiencing very low QPS or bursty traffic?
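One common mitigation under throttled CPU is to flush spans while a request is still in flight, instead of relying on background export — a sketch only, assuming an Express-style handler and OTel JS 1.x; `app`, the route, and the handler body are illustrative:

```ts
import express from 'express';
import { NodeTracerProvider, BatchSpanProcessor } from '@opentelemetry/sdk-trace-node';
import { TraceExporter } from '@google-cloud/opentelemetry-cloud-trace-exporter';

const provider = new NodeTracerProvider();
provider.addSpanProcessor(new BatchSpanProcessor(new TraceExporter()));
provider.register();

const app = express();
app.get('/', async (_req, res) => {
  // ... traced work happens here ...
  // Flush before responding: with throttled CPU, an export scheduled
  // after the response may never get to run.
  await provider.forceFlush();
  res.sendStatus(200);
});
```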
As @aabmass mentioned, always-allocated CPU is the recommended approach. For some reason this is not available in "Cloud Run Functions" from the Google Cloud Console. However, there is now an option to deploy a function from Cloud Run, and if you create a function from within the "Cloud Run" page, you do have the option to use always-allocated CPU.
It is a little confusing on the UI side and I imagine there is work going on to reconcile these two options in a single page. Alternatively, you could use this `gcloud` command to deploy a function with always-allocated CPU:
```sh
# This was used to deploy a Java function, so the --source and --function
# flags might need some updates for JS.
# --no-cpu-throttling results in always-allocated CPU.
# --function deploys the service as a Cloud Run function with the given entry point.
gcloud beta run deploy cloud-func-helloworld \
  --no-cpu-throttling \
  --container app-function \
  --function org.example.HelloWorld \
  --source=build/libs \
  --port=8080
```
@aabmass:
> We use Rspack to bundle the instrumentation in `instrumentation/main.js`, however the Protocol Buffers in `@google-cloud/opentelemetry-cloud-trace-exporter` cause problems while bundling, so we exclude them from the bundle and provide them as runtime dependencies.
>
> Bundling issues should be fixed in #404 which I'm hoping to release this week.
I don't think bundling is our issue; it's just a little inconvenience, but I thought I should mention it.
> ```js
> // This was a workaround we found since directly using gcpDetector raised an issue with unresolved promises
> ```
>
> ```
> batchWriteSpans error: 3 INVALID_ARGUMENT: Invalid project resource name projects/[object Promise]
> ```
>
> I don't think the `awaitAttributes` wrapper will help. What was the actual error you were seeing about unresolved promises? The OTel Node BatchSpanProcessor should be lazily awaiting resource attributes.
At this point we were trying different methods to resolve this error:

```
batchWriteSpans error: 3 INVALID_ARGUMENT: Invalid project resource name projects/[object Promise]
```

Inspired by: https://github.com/open-telemetry/opentelemetry-js/issues/4638
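For reference, the wrapper pattern discussed in that issue looks roughly like this — a sketch assuming OTel JS 1.x resource types, not necessarily the exact code CapoD used:

```ts
import {
  Detector,
  DetectorSync,
  ResourceDetectionConfig,
} from '@opentelemetry/resources';

// Wrap a sync detector so its async attributes are fully resolved
// before the resource is handed to the rest of the pipeline.
function awaitAttributes(detector: DetectorSync): Detector {
  return {
    async detect(config?: ResourceDetectionConfig) {
      const resource = detector.detect(config);
      await resource.waitForAsyncAttributes?.();
      return resource;
    },
  };
}
```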
> The usual issue we see with Cloud Run and Cloud Run Functions is that CPU gets throttled and breaks background exporting. For Cloud Run we recommend always-allocated CPU, but that still doesn't seem to be available in Cloud Run Functions.
>
> Is the service experiencing very low QPS or bursty traffic?
We did not deploy the telemetry to production, only to our development environment, so the traffic is one request at a time. The function is configured as shown below.
@psx95:

Re: always-allocated CPU

> As @aabmass mentioned, always-allocated CPU is the recommended approach. For some reason this is not available in "Cloud Run Functions" from the Google Cloud Console. However, there is now an option to deploy a function from Cloud Run, and if you create a function from within the "Cloud Run" page, you do have the option to use always-allocated CPU.
>
> It is a little confusing on the UI side and I imagine there is work going on to reconcile these two options in a single page. Alternatively, you could use this `gcloud` command to deploy a function with always-allocated CPU:
>
> ```sh
> # This was used to deploy a Java function, so the --source and --function
> # flags might need some updates for JS.
> gcloud beta run deploy cloud-func-helloworld \
>   --no-cpu-throttling \
>   --container app-function \
>   --function org.example.HelloWorld \
>   --source=build/libs \
>   --port=8080
> ```
We are not deploying Cloud Run functions directly; we use the Firebase CLI. Our configuration looks as follows:
```ts
import { onRequest } from 'firebase-functions/v2/https';
import { setGlobalOptions } from 'firebase-functions/v2/options';
import { app } from './app.server';

setGlobalOptions({ region: 'europe-west3', minInstances: 2, concurrency: 8, memory: '4GiB', cpu: 2 });

export const ssrServer = onRequest(app());
```
There is no option to configure CPU throttling in `setGlobalOptions`. This results in the configuration seen in my previous comment.
@CapoD I see, sorry I missed that.
It turns out there is a feature request for "always-on CPU" for Firebase Functions gen 2 here. You should signal your interest in that feature request as well.
OK, maybe I know what's going on here regarding:

> However, the above solution does not work. The function log only writes two lines:
>
> ```
> Error: Timeout
>     at Timeout.<anonymous> (/workspace/instrumentation/main.js:3:686268)
>     at b.<computed> (file:///workspace/server/polyfills.server.mjs:4:12215)
>     at s.invokeTask (file:///workspace/server/polyfills.server.mjs:3:7335)
>     at ne.runTask (file:///workspace/server/polyfills.server.mjs:3:2678)
>     at invokeTask (file:///workspace/server/polyfills.server.mjs:3:8410)
>     at E.W.useG.invoke (file:///workspace/server/polyfills.server.mjs:3:8312)
>     at E.args.<computed> [as _onTimeout] (file:///workspace/server/polyfills.server.mjs:4:11932)
>     at listOnTimeout (node:internal/timers:581:17)
>     at process.processTimers (node:internal/timers:519:7)
> ```
>
> and then
>
> ```
> batchWriteSpans error: 3 INVALID_ARGUMENT: Invalid project resource name projects/[object Promise]
> ```
I think this is coming from https://github.com/GoogleCloudPlatform/opentelemetry-operations-js/blob/b64dbe74e3628b8c4d0cce412e62bacfb5473931/packages/opentelemetry-cloud-trace-exporter/src/trace.ts#L94 which isn't related to the OTel resource or resource detection at all.
I think the cryptic error log and stack trace you're seeing may be coming from where that promise is initialized (although the bundler without source maps makes it hard to know for sure): https://github.com/GoogleCloudPlatform/opentelemetry-operations-js/blob/b64dbe74e3628b8c4d0cce412e62bacfb5473931/packages/opentelemetry-cloud-trace-exporter/src/trace.ts#L61-L65
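As a plain illustration of that failure mode (this is not the exporter's actual code): a pending `Promise` interpolated into a template string stringifies to `[object Promise]`, which is exactly the shape of the bad project resource name above.

```ts
// Hypothetical illustration only.
const projectId: Promise<string> = Promise.resolve('my-project');
const name = `projects/${projectId}`;
console.log(name); // -> "projects/[object Promise]"
```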
However, I know this is working in Cloud Functions gen 2 because we have e2e tests for it. I wonder if it's also caused by CPU throttling with low QPS right after startup.
**Is your feature request related to a problem? Please describe.**
We are using Firebase Functions (2nd gen) that are deployed as Cloud Run functions in GCP. These functions use Angular server-side rendering and create lots of HTTP requests that we would like to trace. Unfortunately, we were not able to find a working example, so we suggest adding one.

**Describe the solution you'd like**
We believe that our code may be a good starting point for a working example (if we get it to work ;-) ), and this issue might also help others.
Function's `package.json`:

We use Rspack to bundle the instrumentation in `instrumentation/main.js`, however the Protocol Buffers in `@google-cloud/opentelemetry-cloud-trace-exporter` cause problems while bundling, so we exclude them from the bundle and provide them as runtime dependencies.

However, the above solution does not work. The function log only writes two lines: the `Error: Timeout` stack trace and then the `batchWriteSpans error: 3 INVALID_ARGUMENT` error quoted earlier in this thread.
We also do not see any traces in Cloud Trace except, of course, the load balancer and function call traces. If you guys have any ideas on what we are doing wrong, we would love to hear them.
**Describe alternatives you've considered**
A working example might be included in the official GCP docs as well.