Open deweyjose opened 2 years ago
Thanks for opening this issue! This could be tied to #2046
I'm seeing a similar issue, though it may not be related:
{"timestamp":"2022-11-08T19:24:19.723074Z","level":"ERROR","fields":{"message":"OpenTelemetry trace error occurred: oneshot canceled"},"target":"apollo_router::plugins::telemetry"}
Running v1.2.1
Not sure about the oneshot, but the other issue is likely to be caused by the BatchSpanProcessor defaults not being good for production.
In the short term the following env variables are available to tweak: https://opentelemetry.io/docs/reference/specification/sdk-environment-variables/#batch-span-processor
Longer term we are looking to add this configuration to the router.yaml
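For anyone who wants to try the env-variable route in the meantime, here is a minimal sketch. The variable names come from the OpenTelemetry spec page linked above; the values are illustrative assumptions, not tested recommendations, so tune them for your own traffic. Set them in the environment that launches the router.

```shell
# Illustrative values only (assumptions, not recommendations).
# Spec defaults are noted in the trailing comments.
export OTEL_BSP_SCHEDULE_DELAY=1000        # ms between batch exports (spec default: 5000)
export OTEL_BSP_MAX_QUEUE_SIZE=8192        # spans buffered before new ones are dropped (spec default: 2048)
export OTEL_BSP_MAX_EXPORT_BATCH_SIZE=512  # spans per export; must not exceed the queue size (spec default: 512)
export OTEL_BSP_EXPORT_TIMEOUT=30000       # ms before an export is cancelled (spec default: 30000)
```

A larger queue combined with a shorter schedule delay should make it less likely that spans are dropped under load, at the cost of more memory and more frequent export requests.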
Going to close this for now in favor of #1789 as I think this is resolved by tweaking the batch span processor settings. If it still persists then feel free to reopen.
This issue is still occurring in v1.37.0 (#1789 appears to be a different issue; notice the error message is different).
Here are the relevant error messages:
OpenTelemetry trace error occurred: error sending request for url (http://localhost:8126/v0.5/traces): connection closed before message completed
OpenTelemetry trace error occurred: error sending request for url (http://localhost:8126/v0.5/traces): connection error: Connection reset by peer (os error 104)
Here's my telemetry config (notice the errors are related to the datadog tracer):
telemetry:
  exporters:
    metrics:
      common:
        service_name: ${env.DD_SERVICE:-graphql-router}
      prometheus:
        enabled: true
        listen: 0.0.0.0:9090
        path: /metrics
    tracing:
      common:
        service_name: ${env.DD_SERVICE:-graphql-router}
      datadog:
        enabled: true
        endpoint: ${env.DD_ENDPOINT}
        enable_span_mapping: true
Could be related to https://github.com/open-telemetry/opentelemetry-rust-contrib/issues/7
Reopening, as this is apparently still not resolved and we are getting more reports of it. The Datadog agent seems to receive the trace correctly but then abruptly closes the connection. Other libraries have workarounds for it: https://github.com/will-bank/datadog-tracing/blob/30cdfba8d00caa04f6ac8e304f76403a5eb97129/src/tracer.rs#L29
I had a go at this and although better it still produced occasional errors in the logs.
Describe the bug We're seeing the following error in our Router logs: