Azure / azure-cosmos-dotnet-v3

.NET SDK for Azure Cosmos DB for the core SQL API
MIT License
737 stars 492 forks source link

Event Source writes on 404/0 #4735

Open joegoldman2 opened 3 days ago

joegoldman2 commented 3 days ago

Describe the bug

The Event Source Azure-Cosmos-Operation-Request-Diagnostics is emitting an event on 404/0 while it should not according to the doc:

https://github.com/Azure/azure-cosmos-dotnet-v3/blob/9b491f2db7a44f70f06f1f23015718f0a3833e48/docs/observability.md?plain=1#L10

With the use of the Application Insights SDK that automatically subscribes to the source and produces logs, it is very noisy.

To Reproduce

using System.Diagnostics.Tracing;
using Microsoft.Azure.Cosmos;
using Newtonsoft.Json;

AppContext.SetSwitch("Azure.Experimental.EnableActivitySource", true);

_ = new StdoutEventListener("Azure-Cosmos-Operation-Request-Diagnostics");
_ = new StdoutEventListener("Microsoft-Diagnostics-DiagnosticSource");

var connectionString = "";
using var client = new CosmosClient(connectionString);
var container = client.GetContainer("ToDoList", "Items");

try
{
    var item = await container.ReadItemAsync<Item>("123", new PartitionKey("123"));
}
catch
{
}

public class Item
{
    [JsonProperty("id")]
    public string Id { get; set; } = default!;
}

public sealed class StdoutEventListener : EventListener
{
    private readonly string _eventSourceName;

    public StdoutEventListener(string eventSourceName)
    {
        _eventSourceName = eventSourceName;

        EventSourceCreated += OnEventSourceCreated;
        EventWritten += OnEventWritten;
    }

    private void OnEventSourceCreated(object? sender, EventSourceCreatedEventArgs e)
    {
        if (e.EventSource is null)
        {
            return;
        }

        if (e.EventSource.Name == _eventSourceName)
        {
            EnableEvents(e.EventSource, EventLevel.Verbose);
        }
    }

    private void OnEventWritten(object? sender, EventWrittenEventArgs e)
    {
        ArgumentNullException.ThrowIfNull(e);

        if (e.EventSource.Name == _eventSourceName)
        {
            Console.WriteLine($"[{e.EventSource.Name}] [{e.EventName}] {e.Message} {string.Join(' ', e.Payload!)}");
        }
    }
}

Expected behavior

No event written on the source in this case (assuming that the item with id 123 doesn't exist).

Actual behavior

In the console:

[Azure-Cosmos-Operation-Request-Diagnostics] [Exception] {"Summary":{"DirectCalls":{"(404, 0)":1},"GatewayCalls":{"(200, 0)":3,"(304, 0)":1}},"name":"ReadItemAsync","start datetime":"2024-09-29T06:12:12.675Z","duration in milliseconds":1415.6545,"data":{"Client Configuration":{"Client Created Time Utc":"2024-09-29T06:12:12.6573690Z","MachineId":"hashedMachineName:3e6f6c33-54e4-43f6-9ea9-a4138bc66fb3","NumberOfClientsCreated":1,"NumberOfActiveClients":1,"ConnectionMode":"Direct","User Agent":"cosmos-netstandard-sdk/3.44.0P|1|X64|Microsoft Windows 10.0.22631|.NET 9.0.0-rc.1.24431.7|N|","ConnectionConfig":{"gw":"(cps:50, urto:6, p:False, httpf: False)","rntbd":"(cto: 5, icto: -1, mrpc: 30, mcpe: 65535, erd: True, pr: ReuseUnicastPort)","other":"(ed:False, be:False)"},"ConsistencyConfig":"(consistency: NotSet, prgns:[], apprgn: )","ProcessorCount":16},"DistributedTraceId":"af4ec2fd532e4dee9ba569a1ad2ce5b5"},"children":[{"name":"Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler","duration in milliseconds":1387.3859,"children":[{"name":"Waiting for Initialization of client to complete","duration in milliseconds":362.6164},{"name":"Microsoft.Azure.Cosmos.Handlers.DiagnosticsHandler","duration in milliseconds":971.3398,"data":{"System Info":{"systemHistory":[{"dateUtc":"2024-09-29T06:12:13.0350736Z","cpu":0.484,"memory":19488168.000,"threadInfo":{"isThreadStarving":"no info","availableThreads":32765,"minThreads":16,"maxThreads":32767},"numberOfOpenTcpConnection":0}]}},"children":[{"name":"Microsoft.Azure.Cosmos.Handlers.TelemetryHandler","duration in milliseconds":970.8258,"children":[{"name":"Microsoft.Azure.Cosmos.Handlers.RetryHandler","duration in milliseconds":970.1333,"children":[{"name":"Microsoft.Azure.Cosmos.Handlers.RouterHandler","duration in milliseconds":966.2714,"children":[{"name":"Microsoft.Azure.Cosmos.Handlers.TransportHandler","duration in milliseconds":965.7333,"children":[{"name":"Microsoft.Azure.Documents.ServerStoreModel Transport Request","duration in milliseconds":960.4684,"data":{"Client Side Request Stats":{"Id":"AggregatedClientSideRequestStatistics","ContactedReplicas":[{"Count":1,"Uri":"rntbd://cdb-ms-prod-waesteurope1-be244.documents.azure.com:14021/apps/e8029441-f6e9-46e2-b07a-664a8d845bca/services/ed6ca16a-9739-4b8f-8c25-980f79ffe360/partitions/db3a5167-4403-4a74-8be7-b0277316ece9/replicas/123566312563838264s/"}],"RegionsContacted":[],"FailedReplicas":[],"AddressResolutionStatistics":[{"StartTimeUTC":"2024-09-29T06:12:13.6090127Z","EndTimeUTC":"2024-09-29T06:12:13.6503820Z" // rest omitted from brevity

Environment summary

SDK Version: 3.44.0-preview.1

joegoldman2 commented 3 days ago

It is probably because an exception is thrown in this case, but maybe this behavior needs to be reviewed.

kirankumarkolli commented 2 days ago

404 is excluded. In this specific case its the latency threshold which got it traced.

The operation took 1.4 seconds and default point operation latency threshold is 1second. It can be configured through CosmosClientTelemetryOptions.CosmosThresholdOptions.PointOperationLatencyThreshold

joegoldman2 commented 1 day ago

I noticed he same behavior for point reads which last less than a second.

kirankumarkolli commented 1 day ago

@sourabh1007 can you please take a look?