dotnet / aspire

An opinionated, cloud ready stack for building observable, production ready, distributed applications in .NET
https://learn.microsoft.com/dotnet/aspire
MIT License

Different instances of same resource type grouped as replicas in dashboard #4103

Closed paulomorgado closed 1 month ago

paulomorgado commented 4 months ago

In 8.0.0-preview.6.24214.1 and 8.0.0-preview.7.24251.11, different instances of project resources are grouped as replicas, even if they are not replicas.

This happens for:

But not for:

kvenkatrajan commented 4 months ago

@JamesNK can you have a look please?

p10tyr commented 4 months ago

I have been having this problem since I started using Aspire Dashboard (Feb 2024) for local development. Thought that was how it's meant to be until I saw a demo elsewhere where the services were grouped nicely.

The service name is passed in, and this works for Elastic APM or Grafana.

ResourceBuilder
          .CreateDefault()
          .AddService("ServiceName")
          .AddAttributes(otelAttributes);

This is what I see in Aspire version 8.0.0-preview.6.24214.1 and Aspire version 8.0.0-preview.7.24251.11: [image]

paulomorgado commented 4 months ago

@p10tyr, are those real replicas? In my case there are no replicas, just instances of the same project.

Are you using just the dashboard? Or are you using the distributed application host?

p10tyr commented 4 months ago

@p10tyr, are those real replicas? In my case there are no replicas, just instances of the same project.

Are you using just the dashboard? Or are you using the distributed application host?

I don't know. Most of them are always empty. Just one of the GUIDs has data in it. Why would I have this anyway if I'm just running one local instance of each API?

I'd like to just click on the top-level replica anyway. I cannot.

clt-pereira commented 3 months ago

The same thing happens to me. [image]

paulomorgado commented 3 months ago

The main issue for me is that I don't have replicas. I have multiple independent instances of the same resource type.

kvenkatrajan commented 3 months ago

@adamint please have a look - try repro with multiple independent instances of the same resource type

clt-pereira commented 3 months ago

I would just like to know why this happens. My project has only one WebApi and I am running it locally, and I couldn't find anywhere what this GUID and the grouping around my application mean.

adamint commented 3 months ago

Hi @paulomorgado @clt-pereira @p10tyr, we apologize for the delay in getting to this issue. Could you confirm that this problem does not appear on the console logs page? I am having trouble reproducing this, do you have a minimal repro solution that you would be able to share?

The main issue for me is that I don't have replicas. I have multiple independent instances of the same resource type.

@paulomorgado this may just be naming confusion. We should consider renaming this temporarily to (running instances) until replica support is complete.

I don't know. Most of them are always empty. Just one of the GUIDs has data in it. Why would I have this anyway if I'm just running one local instance of each API?

Each individual OTLP application instance shows up under this banner. Can you share how you're creating OTLP applications? If multiple instances of the OTLP application are running, what you are showing is to be expected. @p10tyr, you seem to be describing a separate issue, where you have one instance of an application running but multiple grouped applications. Please also share your Aspire configuration or, if possible, a minimal repro.

I'd like to just click on the top-level replica anyway. I cannot.

Unfortunately, this view is not yet available. @kvenkatrajan

paulomorgado commented 3 months ago

Hi @adamint,

Each individual OTLP application instance shows up under this banner. Can you share how you're creating OTLP applications? If multiple instances of the OTLP application are running, what you are showing is to be expected

How are "OTLP applications" created?

Is it related to this?

builder.Services.AddOpenTelemetry()
    .ConfigureResource(resourceBuilder =>
    {
        resourceBuilder
            .AddService(
                serviceName: builder.Environment.ApplicationName,
                serviceVersion: serviceVersion,
                autoGenerateServiceInstanceId: true);
    });

Can I force it on Aspire to be the declared resource name?

adamint commented 3 months ago

How are "OTLP applications" created?

@paulomorgado OTLP "application" specifically refers here to the service instance id that you are using for the given service name. Can you share the OpenTelemetry package version you're using? There was a bug that would cause multiple instance ids to be created during process runtime when autoGenerateServiceInstanceId is set to true, which would cause the behavior you're seeing - see https://github.com/open-telemetry/opentelemetry-dotnet/discussions/5101

It would also be helpful to note if the workaround noted here: https://github.com/open-telemetry/opentelemetry-dotnet/issues/4871 fixes the issue for you.

To clarify - there is only one process running for the application in question, right?

paulomorgado commented 3 months ago

@adamint, there are several processes for the same application.

Imagine you have a chat application with a central hub to relay the messages. You have 1 hub and several chat clients.

Changing to this, solved it:

builder.Services.AddOpenTelemetry()
    .ConfigureResource(resourceBuilder =>
    {
        resourceBuilder
            .AddService(
                serviceName: Environment.GetEnvironmentVariable("OTEL_SERVICE_NAME") ?? builder.Environment.ApplicationName,
                serviceVersion: serviceVersion,
                autoGenerateServiceInstanceId: Environment.GetEnvironmentVariable("OTEL_RESOURCE_ATTRIBUTES")?.Contains("service.instance.id=") != false);
    })
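
A note on the nullable comparison above: `?.Contains(...) != false` evaluates to true both when OTEL_RESOURCE_ATTRIBUTES is unset and when it already contains service.instance.id. The sketch below is a variation, not the code posted in this thread. It assumes the intent is to auto-generate an id only when the environment does not already supply one. ShouldAutoGenerateInstanceId is a hypothetical helper, not an OpenTelemetry or Aspire API.

```csharp
#nullable enable
using System;

// Hypothetical helper (an assumption for illustration, not a library API):
// returns true only when no service.instance.id is supplied via the
// OTEL_RESOURCE_ATTRIBUTES value passed in.
static bool ShouldAutoGenerateInstanceId(string? resourceAttributes) =>
    resourceAttributes is null ||
    !resourceAttributes.Contains("service.instance.id=", StringComparison.Ordinal);

// Variable unset: nothing supplies an id, so generate one.
Console.WriteLine(ShouldAutoGenerateInstanceId(null));                          // True
// Id already supplied by the host: do not generate another.
Console.WriteLine(ShouldAutoGenerateInstanceId("service.instance.id=abc123")); // False
// Variable set, but without an instance id: generate one.
Console.WriteLine(ShouldAutoGenerateInstanceId("deployment.environment=dev")); // True
```

The result could be passed as the autoGenerateServiceInstanceId argument of AddService, with Environment.GetEnvironmentVariable("OTEL_RESOURCE_ATTRIBUTES") as the input.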

adamint commented 3 months ago

Glad to hear that your issue is fixed. Could you share the OpenTelemetry package version you are using?

paulomorgado commented 3 months ago

The latest and greatest! 😄

    <PackageVersion Include="Aspire.Hosting.AppHost" Version="8.0.1" />
    <PackageVersion Include="Microsoft.Extensions.ServiceDiscovery" Version="8.0.1" />
    <PackageVersion Include="OpenTelemetry.Exporter.OpenTelemetryProtocol" Version="1.8.1" />
    <PackageVersion Include="OpenTelemetry.Exporter.Prometheus.AspNetCore" Version="1.8.0-rc.1" />
    <PackageVersion Include="OpenTelemetry.Extensions.Hosting" Version="1.8.1" />
    <PackageVersion Include="OpenTelemetry.Instrumentation.AspNetCore" Version="1.8.1" />
    <PackageVersion Include="OpenTelemetry.Instrumentation.GrpcNetClient" Version="1.8.0-beta.1" />
    <PackageVersion Include="OpenTelemetry.Instrumentation.Http" Version="1.8.1" />
    <PackageVersion Include="OpenTelemetry.Instrumentation.Process" Version="0.5.0-beta.5" />
    <PackageVersion Include="OpenTelemetry.Instrumentation.Runtime" Version="1.8.1" />

clt-pereira commented 3 months ago

In my case, I only have one instance of the application. However, I realized that adding structured log writing from Serilog to OpenTelemetry was causing my problem. When I set the parameter autoGenerateServiceInstanceId to false, the problem was resolved.

adamint commented 3 months ago

It appears, then, that there is still an issue with instance id stability. @JamesNK, since you filed a bug report against OTel on this issue, have you encountered it since they released a fix in 1.7?

adamint commented 2 months ago

#1411 should be completed at the same time as this. DCP gives AppHost owner information, which we can use to construct accurate replica sets.

rolfik-mycronic commented 1 month ago

Hello, I have the same problem, but coming from a different source. I would like to always use service names and see the service instance only as extended information (tooltip etc.). [image]

We use OTEL Collector to collect telemetry from our microservices run by docker compose in production. The collector then distributes received telemetry to extra export files, Grafana (loki/tempo/mimir) and standalone Aspire Dashboard.

In order to analyze an application problem offline, we create a Saved Status - composite zip archive with database dump, console and file logs of some external services and telemetry file exports from the collector.

When restoring a Saved Status on another machine to analyze it on another application instance, we process the telemetry files, add extra properties to the telemetry records so they are easy to find, extend the Aspire Dashboard limits and Grafana retention periods to fit the imported telemetry data, and put the processed telemetry files into the Collector's import folder. The collector then imports them in the background into Aspire Dashboard and Grafana, playing them back as in production.

Because the telemetry data includes many application runs, we have many service instance identifiers, but what really matters is filtering by service name and Saved Status keys. [image] Service instance ids in the Structured Logs, Traces and Metrics combo boxes just clutter and complicate usage.

I will add the problem and others related blocking proper usage to the summary list for our historical telemetry analysis case:

We have added Aspire Dashboard for simpler telemetry analysis than in Grafana. We also have custom dashboards in Grafana, but advanced analysis requires advanced query language (*QL) skills, which is too much for many people, including servicemen. Aspire Dashboard is already promising for our case, and once extended to address the above issues, it will be even better.

Thank you for the tool :-)

adamint commented 1 month ago

@kvenkatrajan @JamesNK ^

JamesNK commented 1 month ago

This is improved in the next Aspire version.

Just seeing a GUID service instance id isn't a good experience. When there are multiple instances of telemetry for a service, the name now combines the service name with the first characters of the service instance id.

For example, if there are multiple instances of lineconfiguration, you'll see:

I think that addresses the problem here.
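
The naming rule described above can be sketched roughly as follows. This is a minimal illustration and not the dashboard's actual implementation: DisplayName is a hypothetical helper, and the "-" separator, the 8-character prefix length, and the GUIDs are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: show the plain service name when it is unique, and
// append the first characters of the instance id when several instances
// share the same service name. Separator and prefix length are assumptions.
static string DisplayName(
    string serviceName,
    string instanceId,
    IReadOnlyCollection<(string Name, string InstanceId)> all,
    int prefixLength = 8)
{
    var sameName = all.Count(a =>
        string.Equals(a.Name, serviceName, StringComparison.OrdinalIgnoreCase));
    if (sameName < 2)
    {
        return serviceName;
    }

    var prefix = instanceId.Length <= prefixLength
        ? instanceId
        : instanceId[..prefixLength];
    return $"{serviceName}-{prefix}";
}

// Made-up instance ids, for illustration only.
var instances = new (string Name, string InstanceId)[]
{
    ("lineconfiguration", "3f2a9c1e-5b7d-4e8a-9c21-7a6b5c4d3e2f"),
    ("lineconfiguration", "b81d04f7-1c2d-4a3b-8e9f-0a1b2c3d4e5f"),
    ("hub", "0c9e8d7f-6a5b-4c3d-9e2f-1a0b9c8d7e6f"),
};

foreach (var (name, id) in instances)
{
    Console.WriteLine(DisplayName(name, id, instances));
}
// Prints:
// lineconfiguration-3f2a9c1e
// lineconfiguration-b81d04f7
// hub
```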

rolfik-mycronic commented 1 month ago

I see Aspire 8.1 is out, but I do not see Aspire Dashboard standalone container image for that version. 8.0.2 still has the problems.

JamesNK commented 1 month ago

@joperezr When will 8.1 of the dashboard be published?

joperezr commented 1 month ago

After https://github.com/dotnet/dotnet-docker/pull/5732 gets merged it shouldn't be long before we have the image available. We already have a nightly image available: https://mcr.microsoft.com/en-us/product/dotnet/nightly/aspire-dashboard/about

rolfik-mycronic commented 1 month ago

I have tested Aspire Dashboard 8.1.

adamint commented 1 month ago

I have tested Aspire Dashboard 8.1.

JamesNK commented 1 month ago

I cannot simply select app name to filter all app instances with the name, which I want, but I am forced to select specific instance which I do not need

This isn't supported in 8.1. I created an issue for supporting this in the future: https://github.com/dotnet/aspire/issues/5137