golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
122.28k stars 17.47k forks source link

log/slog: JSONHandler should deduplicate keys #59365

Open veqryn opened 1 year ago

veqryn commented 1 year ago

What version of Go are you using (go version)?

8edcdddb23c6d3f786b465c43b49e8d9a0015082

Does this issue reproduce with the latest release?

yes

What did you do?

    logger := slog.New(slog.NewJSONHandler(os.Stdout))
    logger.LogAttrs(
        context.Background(),
        slog.LevelWarn,
        "fancy message",
        slog.Bool("msg", false),
        slog.String("my attr", "something"),
    )

What did you expect to see?

JSONHandler should be fixed by deduplicating those special keys (time, message, level, and source). I use "source" as a key in a lot of my code right now, so I'd run into this problem.

Either

{
    "time": "2023-04-01T04:27:46.0959443-06:00",
    "level": "WARN",
    "msg": "fancy message",
    "my attr": "something"
}

Or

{
    "time": "2023-04-01T04:27:46.0959443-06:00",
    "level": "WARN",
    "msg": "fancy message",
    "msg.msg": false,
    "my attr": "something"
}

Or

{
    "time": "2023-04-01T04:27:46.0959443-06:00",
    "level": "WARN",
    "msg": "fancy message",
    "fields": {
        "msg": false,
        "my attr": "something"
    }
}

Or

{
    "time": "2023-04-01T04:27:46.0959443-06:00",
    "level": "WARN",
    "msg": "fancy message",
    "msg#01": false,
    "my attr": "something"
}

What did you see instead?

Invalid json. It would break our log aggregation. Default/Builtin handlers should never create invalid output.

{
    "time": "2023-04-01T04:27:46.0959443-06:00",
    "level": "WARN",
    "msg": "fancy message",
    "msg": false,
    "my attr": "something"
}

@jba

EDIT: Hey all, I've created a logging handler middleware that will deduplicate attribute keys, before they go to the json logger. Please check it out: https://github.com/veqryn/slog-dedup

go get github.com/veqryn/slog-dedup

package main

import (
    "log/slog"
    "os"

    slogdedup "github.com/veqryn/slog-dedup"
)

func main() {
    // OverwriteHandler
    overwriter := slogdedup.NewOverwriteHandler(slog.NewJSONHandler(os.Stdout, nil), nil)
    slog.SetDefault(slog.New(overwriter))

    /*
      {
        "time": "2023-10-03T01:30:00Z",
        "level": "INFO",
        "msg": "this is the dedup overwrite handler",
        "duplicated": "two"
      }
    */
    slog.Info("this is the dedup overwrite handler",
        slog.String("duplicated", "zero"),
        slog.String("duplicated", "one"),
        slog.String("duplicated", "two"),
    )

    // IgnoreHandler
    ignorer := slogdedup.NewIgnoreHandler(slog.NewJSONHandler(os.Stdout, nil), nil)
    slog.SetDefault(slog.New(ignorer))

    /*
      {
        "time": "2023-10-03T01:30:00Z",
        "level": "INFO",
        "msg": "this is the dedup ignore handler",
        "duplicated": "zero"
      }
    */
    slog.Info("this is the dedup ignore handler",
        slog.String("duplicated", "zero"),
        slog.String("duplicated", "one"),
        slog.String("duplicated", "two"),
    )

    // IncrementHandler
    incrementer := slogdedup.NewIncrementHandler(slog.NewJSONHandler(os.Stdout, nil), nil)
    slog.SetDefault(slog.New(incrementer))

    /*
      {
        "time": "2023-10-03T01:30:00Z",
        "level": "INFO",
        "msg": "this is the dedup incrementer handler",
        "duplicated": "zero",
        "duplicated#01": "one",
        "duplicated#02": "two"
      }
    */
    slog.Info("this is the dedup incrementer handler",
        slog.String("duplicated", "zero"),
        slog.String("duplicated", "one"),
        slog.String("duplicated", "two"),
    )

    // AppendHandler
    appender := slogdedup.NewAppendHandler(slog.NewJSONHandler(os.Stdout, nil), nil)
    slog.SetDefault(slog.New(appender))

    /*
      {
        "time": "2023-10-03T01:30:00Z",
        "level": "INFO",
        "msg": "this is the dedup appender handler",
        "duplicated": [
          "zero",
          "one",
          "two"
        ]
      }
    */
    slog.Info("this is the dedup appender handler",
        slog.String("duplicated", "zero"),
        slog.String("duplicated", "one"),
        slog.String("duplicated", "two"),
    )
}
jba commented 2 months ago

I haven't looked at the code in detail, but it seems reasonable.

I would test your handler with testing/slogtest.

cupen commented 2 weeks ago

I think that deduplicate is expensive, that's not what a log formater is for. The slog.Handler is an interface, your custom JSONHandler is good enough for it. So let's keep the default one as fast as possible.

madkins23 commented 2 weeks ago

In a previous post on this issue I was pushing for no duplicates in logged JSON. Having spent some quality time with various log/slog handlers and written one myself I now feel like removing duplicates is a fairly costly endeavor and should be optional.

I have compared slog.JSONHandler to snqk/meld and veqryn/slog-dedup (both wrapped around slog.JSONHandler) using my log/slog handler test harness:

Both of them work fine (or as well as the wrapped handlers, anyway). Note the performance cost in both cases.

FYI: @snqk @veqryn

snqk commented 2 weeks ago

https://madkins23.github.io/go-slog/scores/:Dedup/summary.html

Thanks @madkins23

Overall I'd agree that merging/deduplication should not be offered as default. It's also important to note that deduplication ⊂ merge.

However I disagree that merging is that expensive. The benchmarks you posted show the meld logger isn't significantly behind the default logger (~97 vs ~87).

madkins23 commented 2 weeks ago

Expensive is subjective. My goal is to provide data that will help developers choose the appropriate solution for the requirements of a particular project. Your handler is a useful and reasonably efficient option. Thanks for contributing!

madkins23 commented 2 weeks ago

I had to change the URL above to:

due to limitations on file names in sum.golang.org/lookup.

madkins23 commented 1 week ago

@snqk: The "score" numbers on that URL are a somewhat misleading. They are rolled up using an algorithm I pulled out of my posterior. Much useful detail is lost. :disappointed:

I added tables to the score page showing the actual time and memory numbers from the benchmark testing. Check the score page for dedup again and use the Macros, Allocs, and Bytes buttons to the upper right of the score table to see this additional detail.

snqk commented 1 week ago

@madkins23: aye, I'm not sure how the score was calculated but it was easier to reference :smile:. I did look at the rest of the benchmarks though.

While all features come at some cost, your benchmarks helped me discover a couple performance related bugs (thanks!). Not as straightforward to fix, but I believe that's what's causing the bloated numbers for some benchmarks, notably BigGroup.

Going back to the OP though, I don't think this needs to be offered as a feature of JSONHandler. Apart from the obvious performance cost, this problem applies equally to TextHandler as well which isn't considered (as others have previously pointed out).

Outside of convenience, I don't hold a strong opinion on this feature being offered as an independent handler in log/slog.

madkins23 commented 1 week ago

Glad the benchmarks helped, that's what they're for!

I agree that this doesn't need to be pushed into JSONHandler (or TextHandler for that matter). Your wrapper approach seems appropriate for cases where this feature is needed