cap-js / telemetry

CDS plugin providing observability features, incl. automatic OpenTelemetry instrumentation.
https://cap.cloud.sap/docs
Apache License 2.0

RESOURCE_EXHAUSTED: Received message larger than max (1752460652 vs 4194304) #240

Closed · wozjac closed this issue 2 weeks ago

wozjac commented 1 month ago

Hi,

we receive the following error in the log:

{"stack":"Error: 8 RESOURCE_EXHAUSTED: Received message larger than max (1752460652 vs 4194304)\n at callErrorFromStatus (/home/vcap/deps/0/node_modules/@grpc/grpc-js/build/src/call.js:31:19)\n at Object.onReceiveStatus (/home/vcap/deps/0/node_modules/@grpc/grpc-js/build/src/client.js:193:76)\n at Object.onReceiveStatus (/home/vcap/deps/0/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:360:141)\n at Object.onReceiveStatus (/home/vcap/deps/0/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:323:181)\n at /home/vcap/deps/0/node_modules/@grpc/grpc-js/build/src/resolving-call.js:129:78\n at process.processTicksAndRejections (node:internal/process/task_queues:77:11)\nfor call at\n at ServiceClientImpl.makeUnaryRequest (/home/vcap/deps/0/node_modules/@grpc/grpc-js/build/src/client.js:161:32)\n at ServiceClientImpl.export (/home/vcap/deps/0/node_modules/@grpc/grpc-js/build/src/make-client.js:105:19)\n at /home/vcap/deps/0/node_modules/@opentelemetry/otlp-grpc-exporter-base/build/src/grpc-exporter-transport.js:98:32\n at new Promise ()\n at GrpcExporterTransport.send (/home/vcap/deps/0/node_modules/@opentelemetry/otlp-grpc-exporter-base/build/src/grpc-exporter-transport.js:87:16)\n at OTLPTraceExporter.send (/home/vcap/deps/0/node_modules/@opentelemetry/otlp-grpc-exporter-base/build/src/OTLPGRPCExporterNodeBase.js:87:14)\n at /home/vcap/deps/0/node_modules/@opentelemetry/otlp-exporter-base/build/src/OTLPExporterBase.js:77:22\n at new Promise ()\n at OTLPTraceExporter._export (/home/vcap/deps/0/node_modules/@opentelemetry/otlp-exporter-base/build/src/OTLPExporterBase.js:74:16)\n at OTLPTraceExporter.export (/home/vcap/deps/0/node_modules/@opentelemetry/otlp-exporter-base/build/src/OTLPExporterBase.js:65:14)","message":"8 RESOURCE_EXHAUSTED: Received message larger than max (1752460652 vs 4194304)","code":"8","details":"Received message larger than max (1752460652 vs 4194304)","metadata":"[object Object]","name":"Error"}

The configuration is:

 "[production]": {
      "telemetry": {
        "kind": "to-cloud-logging"
      }
    },

We don't use any custom metrics, just the default setup. Interestingly, this happens in only 2 out of 3 of our subaccounts.

How can we track down the cause?

Best Regards, Jacek

sjvans commented 1 month ago

hi @wozjac

thanks for reporting. i've experienced this as well and am currently in the process of clarifying the issue together with the colleagues from cloud logging.

best, sebastian

sjvans commented 1 month ago

hi @wozjac

i was able to resolve my issue. however, @cap-js/telemetry was not involved. i was instrumenting @sap/approuter via @opentelemetry/auto-instrumentations-node and had a credentials handling issue, specifically when setting them via env vars. the same issue cannot occur with @cap-js/telemetry, as env vars are not used... hence, i'd need steps to reproduce your case.

best, sebastian

wozjac commented 1 month ago

Hi @sjvans

thanks for checking. We had to disable the plugin, as all logs are flooded with this message. Is there a switch we can use to track down what causes such a large data volume?

Best Regards Jacek

sjvans commented 1 month ago

hi @wozjac

it shouldn't be the metrics but the traces, cf. OTLPTraceExporter in the stack trace. you could verify by running with cds.requires.telemetry.tracing: null, which disables tracing while keeping metrics active.
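for example, in package.json (a minimal sketch based on your snippet above; the surrounding "cds"/"requires" nesting is assumed):

    "cds": {
      "requires": {
        "[production]": {
          "telemetry": {
            "kind": "to-cloud-logging",
            "tracing": null
          }
        }
      }
    }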

which version of grpc-js are you using? there is this issue report with >= 1.10.9: https://github.com/grpc/grpc-node/issues/2822
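as a side note, the bogus size 1752460652 is exactly the ascii bytes "html" read as a big-endian uint32. one plausible explanation, in line with such reports: the endpoint answers with an html error page instead of a grpc frame, so the leading "<" is taken as the compression flag and "html" as the message length. a quick node.js check (just a sketch to verify the decoding):

    // decode 1752460652 as four big-endian bytes and print them as ascii
    const buf = Buffer.alloc(4)
    buf.writeUInt32BE(1752460652)
    console.log(buf.toString('ascii')) // prints: html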

best, sebastian

juergen-walter commented 1 month ago

To me, this looks like a client library issue that is independent of SAP Cloud Logging; it occurs whenever sending is configured, regardless of the destination. I am not even sure whether the request is actually attempted or whether it breaks beforehand. Even if everything were working as designed on the CAP side, 4 MiB (4194304 bytes) is a common upper limit for single requests, which we would not change for SAP Cloud Logging. For scale, the reported 1752460652 bytes would be roughly 1.6 GiB, implausibly large for a single telemetry export.

Good luck in fixing the issue. Best, Jürgen

qby-ankul commented 2 weeks ago

Is there already a solution to the problem, apart from deactivating tracing?

{"stack":"Error: PeriodicExportingMetricReader: metrics export failed (error Error: 8 RESOURCE_EXHAUSTED: Received message larger than max (1752460652 vs 4194304))\n at doExport (/home/user/projects/node_modules/@opentelemetry/sdk-metrics/src/export/PeriodicExportingMetricReader.ts:133:15)\n at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n at PeriodicExportingMetricReader._doRun (/home/user/projects/node_modules/@opentelemetry/sdk-metrics/src/export/PeriodicExportingMetricReader.ts:148:7)\n at PeriodicExportingMetricReader._runOnce (/home/user/projects/node_modules/@opentelemetry/sdk-metrics/src/export/PeriodicExportingMetricReader.ts:104:7)","message":"PeriodicExportingMetricReader: metrics export failed (error Error: 8 RESOURCE_EXHAUSTED: Received message larger than max (1752460652 vs 4194304))","name":"Error"}

qby-ankul commented 2 weeks ago

> cds.requires.telemetry.tracing: null

I have just tried it with this parameter; we still get the same error message. With 'none', the metrics are displayed in the console.

sjvans commented 2 weeks ago

> cds.requires.telemetry.tracing: null
>
> I have just tried it with this parameter; we still get the same error message. With 'none', the metrics are displayed in the console.

your error originates in metrics, not tracing. hence, cds.requires.telemetry.tracing: null has no effect. you'd need to set cds.requires.telemetry.metrics: null.
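e.g., mirroring the snippet above (same sketch as for tracing, just with metrics instead):

    "[production]": {
      "telemetry": {
        "kind": "to-cloud-logging",
        "metrics": null
      }
    }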

in any case, i'm afraid there's not really anything we can do on the client side. the cloud logging colleagues will double check on their end.

best, sebastian

qby-ankul commented 2 weeks ago

Ok, the error is gone for now, but it is not a final solution. Would an SAP case help?

sjvans commented 1 week ago

hi @qby-ankul

the colleagues are looking into it, but you can create a ServiceNow issue for component BC-CP-CLS if you like.

best, sebastian

qby-ankul commented 1 week ago

Hi @sjvans,

Thank you very much; I have already opened a case: 1157673/2024

prophet1906 commented 1 week ago

I am also facing the same issue.

juergen-walter commented 3 days ago

If you are affected, I recommend checking whether the certificate for SAP Cloud Logging is still valid. I assume this is the root cause of this hard-to-understand error message from CAP.
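One way to check locally (a sketch; it assumes you saved the PEM from the ingest-otlp-cert field of your service key to a file cert.pem, the file name being illustrative):

    // check-cert.js: print the expiry date of a PEM certificate
    // using Node's built-in crypto module
    const { X509Certificate } = require('node:crypto')
    const { readFileSync } = require('node:fs')

    const cert = new X509Certificate(readFileSync('cert.pem'))
    console.log('valid to:', cert.validTo)
    console.log('expired :', new Date(cert.validTo) < new Date())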

qby-ankul commented 3 days ago

Ok, found it ... the ingest-otlp-cert is expired ... let me check it.

qby-ankul commented 3 days ago

Hi @juergen-walter, I have so far only verified it in dev, where it seems to be working again. Does CLS not regenerate the certificates of shared instances? What is the best way to handle service keys? Is there a tool or guide?
