dotnet / aspire

Tools, templates, and packages to accelerate building observable, production-ready apps
https://learn.microsoft.com/dotnet/aspire
MIT License

OTLP endpoint may become unresponsive #5072

Open rolfik-mycronic opened 3 months ago

rolfik-mycronic commented 3 months ago

Describe the bug

I import older telemetry (traces, logs, metrics) into a standalone Aspire Dashboard via the OTEL Collector. Before each import, I compute the Aspire telemetry limits, set them in the appropriate environment variables, and restart the Aspire Dashboard container. Sometimes the Aspire OTLP endpoint becomes unresponsive, and the traces/logs UI does as well.
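
As an illustration of the limit computation described above, here is a minimal Python sketch. It assumes the record counts are already known and that the limits are passed through the `DASHBOARD__TELEMETRYLIMITS__*` environment variables (the environment-variable form of the dashboard's `Dashboard:TelemetryLimits:*` settings; please verify the names against the dashboard version in use). The helper name `telemetry_limit_env`, the 20% headroom, and the sample counts are hypothetical and not taken from this report.

```python
# Environment-variable form of the dashboard's telemetry limit settings
# (assumed names; check the dashboard configuration docs for your version).
ENV_KEYS = {
    "logs": "DASHBOARD__TELEMETRYLIMITS__MAXLOGCOUNT",
    "traces": "DASHBOARD__TELEMETRYLIMITS__MAXTRACECOUNT",
    "metrics": "DASHBOARD__TELEMETRYLIMITS__MAXMETRICSCOUNT",
}


def telemetry_limit_env(counts, headroom_percent=20):
    """Map per-signal record counts to limit variables with some headroom.

    Signals with a zero or missing count are skipped, so the dashboard
    keeps its default limit for them.
    """
    env = {}
    for signal, key in ENV_KEYS.items():
        count = counts.get(signal, 0)
        if count <= 0:
            continue
        # Integer ceiling of count * (1 + headroom_percent / 100).
        limit = (count * (100 + headroom_percent) + 99) // 100
        env[key] = str(limit)
    return env


# Hypothetical counts for one import run.
env = telemetry_limit_env({"logs": 40000, "traces": 5522, "metrics": 0})
for name, value in sorted(env.items()):
    print(f"{name}={value}")
```

The resulting pairs could then be supplied to the dashboard container (for example as `-e NAME=VALUE` arguments) before it is restarted.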

Maybe there are too many telemetry records for Aspire to ingest, or the record batches are too big...

The OTEL Collector's log shows:

2024-07-25 11:24:44 2024-07-25T09:24:44.520Z    info    exporterhelper/retry_sender.go:118      Exporting failed. Will retry the request after interval.        {"kind": "exporter", "data_type": "traces", "name": "otlp/aspire", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "35.536333315s"}
2024-07-25 11:25:24 2024-07-25T09:25:24.992Z    info    exporterhelper/retry_sender.go:118      Exporting failed. Will retry the request after interval.        {"kind": "exporter", "data_type": "traces", "name": "otlp/aspire", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "35.153775693s"}
2024-07-25 11:26:04 2024-07-25T09:26:04.946Z    info    exporterhelper/retry_sender.go:118      Exporting failed. Will retry the request after interval.        {"kind": "exporter", "data_type": "traces", "name": "otlp/aspire", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "25.6975388s"}
2024-07-25 11:26:35 2024-07-25T09:26:35.567Z    error   exporterhelper/queue_sender.go:101      Exporting failed. Dropping data.        {"kind": "exporter", "data_type": "traces", "name": "otlp/aspire", "error": "no more retries left: rpc error: code = DeadlineExceeded desc = context deadline exceeded", "dropped_items": 5522}
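
To gauge how much data is lost during an import, log lines like the ones above can be tallied with a short script. This is only a sketch: it assumes each line ends with a JSON object, as in the excerpt, and the function name `summarize_export_failures` is hypothetical. The two sample lines are copied from the log above, without the leading container timestamp.

```python
import json


def summarize_export_failures(lines):
    """Count retry messages and sum dropped_items across collector log lines."""
    retries = 0
    dropped = 0
    for line in lines:
        start = line.find("{")
        if start == -1:
            continue  # no structured fields on this line
        try:
            fields = json.loads(line[start:])
        except json.JSONDecodeError:
            continue  # trailing part is not valid JSON; skip the line
        if "Dropping data" in line:
            dropped += int(fields.get("dropped_items", 0))
        elif "Will retry" in line:
            retries += 1
    return {"retries": retries, "dropped_items": dropped}


sample = [
    '2024-07-25T09:26:04.946Z    info    exporterhelper/retry_sender.go:118      Exporting failed. Will retry the request after interval.        {"kind": "exporter", "data_type": "traces", "name": "otlp/aspire", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "25.6975388s"}',
    '2024-07-25T09:26:35.567Z    error   exporterhelper/queue_sender.go:101      Exporting failed. Dropping data.        {"kind": "exporter", "data_type": "traces", "name": "otlp/aspire", "error": "no more retries left: rpc error: code = DeadlineExceeded desc = context deadline exceeded", "dropped_items": 5522}',
]
print(summarize_export_failures(sample))
```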

Expected Behavior

I would like to see all traces and logs in Aspire, with none dropped because the endpoint is unresponsive.

Steps To Reproduce

Import telemetry data via the OTEL Collector's file import and export it to the Aspire Dashboard OTLP endpoint in the same OTEL Collector pipeline.

Exceptions (if any)

No problem is visible in the Aspire Dashboard container log, only:

2024-07-25 12:03:33 info: Aspire.Dashboard.DashboardWebApplication[0]
2024-07-25 12:03:33       Aspire version: 8.0.0+7d0dde4108a2640ded4f9787fe28ce0f12d83633
2024-07-25 12:03:34 info: Aspire.Dashboard.DashboardWebApplication[0]
2024-07-25 12:03:34       Now listening on: http://0.0.0.0:18888
2024-07-25 12:03:34 info: Aspire.Dashboard.DashboardWebApplication[0]
2024-07-25 12:03:34       OTLP server running at: http://aspire-dashboard:4317

Container

leslierichardson95 commented 2 months ago

@rolfik-mycronic Hi Marek, is it possible to share a repo that we can use to try and investigate this issue further?

CC: @samsp-msft

rolfik-mycronic commented 2 months ago

> @rolfik-mycronic Hi Marek, is it possible to share a repo that we can use to try and investigate this issue further?

Unfortunately, I cannot share sensitive data for reproduction.

But I have not seen the problem in Aspire 8.1 yet, so you can close this issue for now. If I run into it again, I will open a new one.

Thank you