elastic / apm-agent-dotnet

https://www.elastic.co/guide/en/apm/agent/dotnet/current/index.html
Apache License 2.0
573 stars 205 forks source link

EF Core statement extraction #2344

Open kanadaj opened 2 months ago

kanadaj commented 2 months ago

Is your feature request related to a problem? Please describe. When using the EF Core diagnostic subscriber implementation with Pomelo (might also apply to SQL Server, I haven't checked), the APM logs SQL queries as "Open" and "Execute" in the waterfall. Clicking the items shows that the APM did in fact log the query, but it's hard to see at a glance what's what.

image

Describe the solution you'd like It would be great if some more informative operation names could be generated from the known statements, eg:

<keyword> <table names>

so it would be along the lines of:

SELECT `Users`, `UserTokens`
UPDATE `Users`
EXEC `SomeStoredProcedure`

Describe alternatives you've considered Using the full query would be too long in general for meaningful use, so table names would be the best. My proposed solution below only grabs the first keyword and most tables in a query. It's enough to distinguish queries in a row reasonably well.

Open statements might also benefit from having the host of the database connection appended in case there are multiple connections, but might be unnecessary.

Additional context

from my initial experimentation this can be adequately handled using some fairly simple regex functions (verified with MySQL, might work with SQL Server as well, but regex might be not the best way to handle this, as convenient as it is):

var queryName = $"{GetOperationName(statement)} {string.Join(", ", GetTableNames(statement))}";

    public string GetOperationName(string? command)
    {
        if(command == null) return "Unknown";

        var firstWord = command.IndexOf(" ", StringComparison.Ordinal) > -1
            ? command.Substring(0, command.IndexOf(" ", StringComparison.Ordinal))
            : command;
        return firstWord.ToUpper();
    }

    private readonly Regex _regexMatchTableName =
        new Regex(
            @"(?<=(?:FROM[\s(`]|JOIN[\s(`]))(?>[\w\.\[\]`]+)(?=[\s)`]*(?:\s+(?:AS\s+)?[\w\d]+)?(?:$|\s+(?:WHERE|AS|ON|(?:(?:LEFT|RIGHT)\s+)?(?:(?:OUTER|INNER)\s+)?JOIN)))",
            RegexOptions.Compiled | RegexOptions.Multiline | RegexOptions.IgnoreCase);

    public string[] GetTableNames(string? command)
    {
        if(command == null) return Array.Empty<string>();

        var matches = _regexMatchTableName.Matches(command);
        return matches.Select(x => x.Value).Distinct().SkipWhile(x => x == "SELECT").ToArray();
    }
kanadaj commented 2 months ago

Actually, does this even come from the EfCoreDiagnosticListener? I don't have a reference to Elastic.Apm.EntityFrameworkCore (not even an indirect one, just Elastic.Apm.AspNetCore and the related packages) and seems to pick up those traces anyway...

kanadaj commented 2 months ago

Okay so this probably applies to both the default implementation and the OpenTelemetryBridge. I wonder, can we post-process traces from the OTel bridge?