canonical / pebble

Pebble is a lightweight Linux service manager with layered configuration and an HTTP API.
https://canonical-pebble.readthedocs-hosted.com/
GNU General Public License v3.0

Strip timestamp from service logs to avoid double-timestamp issue #91

Closed. benhoyt closed this issue 2 years ago.

benhoyt commented 2 years ago

Currently when a service outputs timestamps itself we get double timestamps in the log, because Pebble also adds a timestamp-and-service-name prefix to every log line. We want to trim/strip the service's timestamp to avoid the double-up.
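
For illustration (a made-up example, not taken from a real log), a `pebble logs` line for a service that writes its own timestamps ends up looking something like:

2022-06-01T12:00:01.000Z [mysql] 2022-06-01T12:00:01.123456Z 0 [Note] InnoDB: Buffer pool(s) load completed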

Some notes per discussion with Harry and Gustavo:

rwcarlsen commented 2 years ago

I haven't been able to dig up any good examples of smartly handling parse-prefix style time parsing; all the libs on the net seem to operate on an exact/complete string. The naive thing to do is to try parsing a timestamp out of consecutively larger prefixes, starting at some minimum length, until you get a successful parse, and to buffer the entire bytestring passed to a write call if it doesn't (yet) contain an entire timestamp. But this would certainly be a bit slower.
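
For concreteness, the naive approach would be roughly this (a sketch only; the function name and signature are made up, and it assumes a known layout):

import "time"

// trimTimestampPrefix tries successively longer prefixes of line against
// layout, remembering the longest one that time.Parse accepts. Worst case
// this is one Parse call per byte of the line, which is the cost concern.
func trimTimestampPrefix(layout, line string) (ts time.Time, rest string, ok bool) {
	end := -1
	for i := 1; i <= len(line); i++ {
		if t, err := time.Parse(layout, line[:i]); err == nil {
			ts, end = t, i
		}
	}
	if end < 0 {
		return time.Time{}, line, false
	}
	return ts, line[end:], true
}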

The stdlib's time.Parse is pretty good at catching "extra text" errors. If we want to avoid the brute-force checking of all prefix lengths against a parse function, we could only trim times with an explicitly user-specified format string in combination with either:

Thoughts?

benhoyt commented 2 years ago

Hmm, yeah, it'd be nice if there were a time.ParsePrefix, or if time.Parse returned a more structured ExtraText error.

I definitely don't like the idea of parsing consecutively larger prefixes in a loop -- that seems like a very inefficient operation to do on each log line, and I'm already a bit concerned about efficiency here because this will happen on every log line.

However, after looking at the time.Parse code, I don't hate the idea of matching on "extra text" as much as I thought I might. For one thing, there's already a custom time.ParseError type with a ValueElem field that stores the (unquoted) extra text. We'd still have to detect it, though, so the code for that would look something like this:

t, err := time.Parse(layout, line)
if e, ok := err.(*time.ParseError); ok && strings.HasPrefix(e.Message, ": extra text: ") {
    prefix := line[:len(line)-len(e.ValueElem)]
    t, _ = time.Parse(layout, prefix) // parsing just the prefix should succeed
    err = nil
}

The Go authors are very unlikely to change the "extra text" message, so this doesn't seem horrible and is maybe a pragmatic approach. We'd want good tests that would break if Go 1.25 or whatever did change it. That said, it also doesn't seem horrible to copy a version of time.Parse into our tree and modify it.
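
For example, a test along these lines (a rough sketch; the test name is invented) would break if a future Go release changed the message or the ValueElem behaviour we'd rely on:

import (
	"strings"
	"testing"
	"time"
)

// Guards the assumptions the trimming code makes about time.ParseError:
// the Message prefix, and that ValueElem holds the unparsed suffix verbatim.
func TestExtraTextParseError(t *testing.T) {
	line := "2022-06-01T12:00:00Z the rest of the log line"
	_, err := time.Parse(time.RFC3339, line)
	perr, ok := err.(*time.ParseError)
	if !ok {
		t.Fatalf("expected *time.ParseError, got %T (%v)", err, err)
	}
	if !strings.HasPrefix(perr.Message, ": extra text: ") {
		t.Fatalf("message %q no longer starts with %q", perr.Message, ": extra text: ")
	}
	prefix := line[:len(line)-len(perr.ValueElem)]
	if _, err := time.Parse(time.RFC3339, prefix); err != nil {
		t.Fatalf("trimmed prefix %q did not parse: %v", prefix, err)
	}
}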

There's another alternative: have a list of regexen, and try to match the prefix using a regex first, and only call time.Parse if it matches. However, if we supported custom time formats, we'd need something that converted a Go time layout into a regex -- that's probably not too hard, but may be more work than it's worth. (With the default "auto-recognize timestamp" we could hard code the regexes.)
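
To make the regex-first idea concrete, hard-coded recognition could look something like this (the two patterns and layouts are just examples, not a proposed list):

import (
	"regexp"
	"strings"
	"time"
)

// Illustrative only: each auto-recognized format pairs an anchored regex for
// the line prefix with the Go layout used to parse whatever the regex matched.
var knownFormats = []struct {
	re     *regexp.Regexp
	layout string
}{
	{regexp.MustCompile(`^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})`), time.RFC3339},
	{regexp.MustCompile(`^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}`), "2006/01/02 15:04:05"},
}

// stripKnownTimestamp removes a recognized leading timestamp; time.Parse is
// only called for lines whose prefix already matched one of the regexes.
func stripKnownTimestamp(line string) (time.Time, string, bool) {
	for _, f := range knownFormats {
		if m := f.re.FindString(line); m != "" {
			if t, err := time.Parse(f.layout, m); err == nil {
				return t, strings.TrimLeft(line[len(m):], " "), true
			}
		}
	}
	return time.Time{}, line, false
}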

Overall, I'd probably lean towards the "extra text" hack to start with, and we can easily switch to a copied version of the time.Parse code later, or sooner if there's no appetite for that approach.

benhoyt commented 2 years ago

I've also asked about this on golang-nuts -- we'll see if there's any response.

rwcarlsen commented 2 years ago

I agree with shelving a pre-parse regex step for now. And I'd also be inclined to prefer the error wrangling over vendoring time.Parse into the codebase - and only switch to the heavier approach if/when we run into problems.

rwcarlsen commented 2 years ago

Another option: we could just have the user give us a regex for each service that we use to discard the matched portion of the log line. This has the benefit that we can discard more than just timestamps - e.g. maybe there is some sort of redundant MYSQL: prefix content in the log line as well that we could skip over since pebble also puts a service name in its log lines.

benhoyt commented 2 years ago

> Another option: we could just have the user give us a regex for each service that we use to discard the matched portion of the log line. This has the benefit that we can discard more than just timestamps - e.g. maybe there is some sort of redundant MYSQL: prefix content in the log line as well that we could skip over since pebble also puts a service name in its log lines.

Interesting ... yeah, that's a good idea. Explicit and simple to implement. Go's regexp package is relatively slow (guaranteed linear time, but slowish compared to other libraries) but should be good enough for this use case. I think we've previously discussed that if someone is pumping massive volumes of logs through this something's probably not configured right anyway.

Yeah, I like this approach.
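
Sketching the shape of it (everything below is hypothetical, including the idea that the pattern comes from a per-service field in the plan):

import "regexp"

// Hypothetical per-service setting: a user-supplied pattern whose match is
// discarded from the start of every log line before Pebble adds its own prefix.
type logPrefixStripper struct {
	re *regexp.Regexp // compiled once when the plan is loaded
}

func newLogPrefixStripper(pattern string) (*logPrefixStripper, error) {
	re, err := regexp.Compile("^(?:" + pattern + ")") // anchor to the line start
	if err != nil {
		return nil, err
	}
	return &logPrefixStripper{re: re}, nil
}

// strip removes the matched prefix, if any; otherwise the line is unchanged.
func (s *logPrefixStripper) strip(line string) string {
	if loc := s.re.FindStringIndex(line); loc != nil {
		return line[loc[1]:]
	}
	return line
}

For instance, with a pattern of `\S+ MYSQL: ` a line like "2022-06-01T12:00:00Z MYSQL: payload" would come out as just "payload".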

benhoyt commented 2 years ago

Notes from our meeting just now:

rwcarlsen commented 2 years ago

Here is the latest discussion from mattermost:

rwcarlsen Here is my k8s sidecar charm log/timestamp collection to date:

benhoyt [3:22 PM]

@rwcarlsen That's very interesting data -- I wouldn't have thought it! So I see it's mostly structured logging, either in key=value format or JSON (in the case of Mongo). I'm not sure what the 18:S prefix on the redis logs refers to, but unless that's part of the timestamp, that makes it only 6 out of 10 that don't have the time as a prefix.

I'm beginning to think time-trim isn't a good idea. I mean, even if we get more clever with " and = delimiters and can find the timestamp, do we then strip it out? That would leave the resulting log in a weird state, like (for the first example):

level=info ts= caller=table_manager.go:169 msg="uploading tables"

rwcarlsen 10:34 AM Yeah - the redis one is actually some sort of [PID]:[role] apparently - so not part of the timestamp :confused:

I agree that stripping the timestamp out of a structured log format seems wrong. Who would have thought that "unstructured" log formats were old-fashioned? It certainly doesn't feel like this can be the big kind of win we originally hoped for. I'm certainly okay shelving it (forever?).

benhoyt 2:52 PM @rwcarlsen Yeah, doesn't seem like we're winning given the above.

benhoyt commented 2 years ago

Given the research @rwcarlsen has done on this, and the discussion above, closing this for now. Seems like we're unlikely to solve this by just "stripping timestamps". But we can always re-open or open a new issue if we think of a better way.