dotnet / aspnetcore

ASP.NET Core is a cross-platform .NET framework for building modern cloud-based web applications on Windows, Mac, or Linux.
https://asp.net
MIT License
35.55k stars 10.05k forks source link

Make fields in logs of Microsoft.AspNetCore.Hosting consistent #58160

Open erikbozic opened 1 month ago

erikbozic commented 1 month ago

Is there an existing issue for this?

Is your feature request related to a problem? Please describe the problem.

We are using structured logging (using NLog to output it to json) and shipping the logs to elastic, which indexes it and makes it possible to search through.

However, when elastic tries to index fields it has problems because of miss-matches of types on the same field.

Example of two logs coming from Microsoft.AspNetCore.Hosting.Diagnostics, one at the start of the request and the other at the end.
They have some of the same fields, however with different format of values (string vs. object).

Start of the request:

{
    "time": "2024-09-30 12:05:09.6602",
    "universal-time": "2024-09-30 12:05:09.6602",
    "host": "our-service-86dc4b9fb-d4qnj",
    "application": "Our.Service",
    "threadId": "41",
    "level": "INFO",
    "logger": "Microsoft.AspNetCore.Hosting.Diagnostics",
    "message": "Request starting HTTP/1.1 GET http://10.225.45.99:8080/metrics - - -",
    "message-template": "Request starting {Protocol} {Method} {Scheme}://{Host}{PathBase}{Path}{QueryString} - {ContentType} {ContentLength}",
    "aspnet-TraceIdentifier": "00-7a85866bababb599a104c7c6e5561d82-0f1e9948aa2cf78f-00",
    "activity-SpanId": "0f1e9948aa2cf78f",
    "activity-TraceId": "7a85866bababb599a104c7c6e5561d82",
    "activity-OperationName": "Microsoft.AspNetCore.Hosting.HttpRequestIn",
    "Protocol": "HTTP/1.1",
    "Method": "GET",
    "ContentType": null,
    "ContentLength": null,
    "Scheme": "http",
    "Host": "10.225.45.99:8080",
    "PathBase": "",
    "Path": "/metrics",
    "QueryString": "",
    "EventId": 1
}

end of the request:

{
    "time": "2024-09-30 12:05:09.6602",
    "universal-time": "2024-09-30 12:05:09.6602",
    "host": "our-service-86dc4b9fb-d4qnj",
    "application": "Our.Service",
    "threadId": "41",
    "level": "INFO",
    "logger": "Microsoft.AspNetCore.Hosting.Diagnostics",
    "message": "Request reached the end of the middleware pipeline without being handled by application code. Request path: GET http://10.225.45.99:8080/metrics, Response status code: 404",
    "message-template": "Request reached the end of the middleware pipeline without being handled by application code. Request path: {Method} {Scheme}://{Host}{PathBase}{Path}, Response status code: {StatusCode}",
    "aspnet-TraceIdentifier": "00-7a85866bababb599a104c7c6e5561d82-0f1e9948aa2cf78f-00",
    "activity-SpanId": "0f1e9948aa2cf78f",
    "activity-TraceId": "7a85866bababb599a104c7c6e5561d82",
    "activity-OperationName": "Microsoft.AspNetCore.Hosting.HttpRequestIn",
    "Method": "GET",
    "Scheme": "http",
    "Host": {
        "Value": "10.225.45.99:8080",
        "HasValue": true,
        "Host": "10.225.45.99",
        "Port": 8080
    },
    "PathBase": {
        "Value": "",
        "HasValue": false
    },
    "Path": {
        "Value": "/metrics",
        "HasValue": true
    },
    "StatusCode": 404,
    "EventId": 16
}

Notice the Host, PathBase and Path fields.

I'm unsure how many other cases of this there are - this is just what we found.

Starting request comes from here: https://github.com/dotnet/aspnetcore/blob/30ef19c7d9651d0fa113dd8ed08809e545141d74/src/Hosting/Hosting/src/Internal/HostingRequestStartingLog.cs#L30

Where as request end comes from here: https://github.com/dotnet/aspnetcore/blob/30ef19c7d9651d0fa113dd8ed08809e545141d74/src/Hosting/Hosting/src/Internal/HostingRequestUnhandledLog.cs#L25

Describe the solution you'd like

I think it should be considered if these fields can be made consistent. I believe the situation I described above isn't particularly unique to us.
It would make some headaches go away while using .NET for structured logging.

Some things we have considered:

There are some apparent workarounds for our case. Specificaly this (on the elastic side): https://www.elastic.co/guide/en/elasticsearch/reference/current/ignore-malformed.html
However it doesn't actually help if the value is an object instead of a simple type, as stated here:

You also can’t use ignore_malformed to ignore JSON objects submitted to fields of the wrong data type. A JSON object is any data surrounded by curly brackets "{}" and includes data mapped to the nested, object, and range data types.

Another option is that we could ignore those fields via our NLog configuration. While doable and would work, it seems like a game of whack-a-mole. We would need to go and add new problematic fields to excludeProperties (a parameter in NLog JsonLayout: https://github.com/NLog/NLog/wiki/JsonLayout#parameters). Also the fields would now not be visible in the logs at all.

We could also exclude the problematic fields from indexing via elastic index config. Again playing whack-a-mole.

To be honest, perhaps we need a solution to deal with such inconsistencies on the application/infra side anyway. However the framework producing logs with consistent field values would help the situation.

Additional context

No response

amcasey commented 1 month ago

While it's possible that giving the same property different types in different events serves a functional purpose, it seems more likely that it was simply an oversight. For example, it seems quite possible that non-nullable strings were passed to the start event and nullable strings were passed to the end event and the object you're seeing is the default representation of nullable types.

Unfortunately, there are probably consumers that have grown accustomed to working around the difference and who would be broken if it were made consistent. To be safe, we'd have to introduce new properties with consistent types. I won't rule out the possibility, but I think we'd need to see more customer feedback before adopting that approach because of the potential confusion caused by having redundant properties on the event.

erikbozic commented 1 month ago

it seems quite possible that non-nullable strings were passed to the start event and nullable strings were passed to the end event

That was my assumption as well.

To be safe, we'd have to introduce new properties with consistent types

This doesn't really solve our problem. If the properties with the same name and different value still exist in the the log, we will still have the same problem. (we would have to ignore them or special-case them somehow - just like we have to do now)

Since you consider this too much of a breaking change, we will pursue workarounds on the application/infra layer.
Thank you for the response!

pinkfloydx33 commented 1 month ago

I don't think it's about nullability. Those look like PathString/HostString structs whereas the former is just the Value's string content. We've noticed similar issues in our log aggregator (Seq), but it's fortunately not strict about the data types. Of course it makes them impossible to search, unless you know to search for both Path and Path.Value, which is just messy.

IMO only the textual component should be logged for those properties. Logging the entire struct doesn't make any sense and is just noise. If it made sense then the "begin" logs would've used that instead of the explicit calls to .Value to begin with.

Implying it cannot be fixed because people may be used to it is a bit sad. There shouldn't be a need for new properties either... What's the point of major releases if you can't fix something clearly inconsistent (and arguably incorrect) and cite it as a breaking change?

amcasey commented 1 month ago

IMO only the textual component should be logged for those properties.

Totally agree but, as noted, I think it's too late to change that (more below).

What's the point of major releases if you can't fix something clearly inconsistent (and arguably incorrect)

I think it's worth noting that dotnet doesn't follow semver, so major release aren't actually opportunities to make breaking changes. Some of our thinking on breaking changes is here, but the gist is that the lower in the stack you go, the higher the standards get.

As you can see from our release notes, we make breaking changes nearly every release, so I'm not going to say they're impossible. However, we need a really good reason to cause that kind of customer pain and, sadly, incorrect behavior doesn't always meet that bar. It would need have a very large impact (e.g. a security vulnerability) or affect a large customer segment (possibly indicated by thumbs-up on this issue, twitter threads, SO questions, etc).

pinkfloydx33 commented 1 month ago

In that case I think the status quo is likely better. Adding new properties, while giving a consistent view, would also add bloat to the payload which ultimately needs to be serialized out over the wire.

For us, a solution is possible using Serilog Enrichers by locating the properties by name, detecting if they are object-based and replacing them with scalars. So there are indeed workarounds depending on your stack

amcasey commented 1 month ago

Sorry I don't have a more satisfying answer for you. I'll leave this open so it can collect reactions and we may revisit it in the future.