golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.67k stars 17.62k forks source link

cmd/trace: add absolute (system) timestamp #69869

Open aktau opened 1 week ago

aktau commented 1 week ago

Proposal Details

Runtime traces don't contain (or display) an absolute timestamp for events. This would be useful for comparing traces with other (tracing) tools, logs et cetera.

An example would be combining a Go runtime trace with perf sched events. This would allow better investigation of kernel scheduling events and how they impact the latency of goroutines.

IIRC, runtime traces currently use a relative (monotonic) timestamps for performance reasons. It's fine to keep this as-is. But it would be helpful if (e.g.) the start of a trace batch could contain an absolute system time (perhaps something lie nanoseconds since the UNIX epoch) together with the relative timestamp it was derived from. Tools like go tool trace could then adjust all the relative timestamps post-hoc, or at least display the trace start and end time.

cc @mknyszek

aktau commented 1 week ago

(I don't know if this needs to be a proposal. It's not really a noteworthy feature, as it's an optional addition and only concerns very few users of a sub-tool, please adjust accordingly.)

prattmic commented 1 week ago

In the new trace format, we do usually store absolute (monotonic) timestamps. The start of each batch has the full absolute timestamp. Each event within the batch contains the time delta from the previous event.

The exception is Windows, where time has low resolution, so in that case we use CPU ticks (RDTSC).

cc @felixge

aktau commented 6 days ago

That's nice. I don't know much about catapult, but could it be made to display things in absolute time?

prattmic commented 6 days ago

In theory, I believe this field just needs to be in absolute time: https://cs.opensource.google/go/go/+/master:src/internal/trace/traceviewer/format/format.go;l=25;drc=70f4717e96cf9d8ca8a5f439d7d6487ec822ce49;bpv=1;bpt=1

The current values tend to just come from computations like https://cs.opensource.google/go/go/+/master:src/cmd/trace/goroutinegen.go;l=137;drc=70f4717e96cf9d8ca8a5f439d7d6487ec822ce49;bpv=1;bpt=1, which currently are event.Time - startTime.

The format says the time should be in microseconds. I don't know how nicely Catapult will display the values, given they will probably be huge.

aktau commented 3 days ago

That's just reading AFAICT. I spent some time looking for writing and found traceClockNow: https://github.com/golang/go/blob/f2d118fd5f7e872804a5825ce29797f81a28b0fa/src/runtime/tracetime.go#L57-L62

nanotime is (on Linux) clock_gettime(CLOCK_MONOTONIC). Which is (quoting man7) time since boot:

CLOCK_MONOTONIC
              A nonsettable system-wide clock that represents monotonic time
              since—as described by POSIX—"some unspecified point in the
              past".  On Linux, that point corresponds to the number of sec‐
              onds that the system has been running since it was booted.

              The CLOCK_MONOTONIC clock is not affected by discontinuous
              jumps in the system time (e.g., if the system administrator
              manually changes the clock), but is affected by the incremen‐
              tal adjustments performed by [adjtime(3)](https://man7.org/linux/man-pages/man3/adjtime.3.html) and NTP.  This clock
              does not count time that the system is suspended.  All
              CLOCK_MONOTONIC variants guarantee that the time returned by
              consecutive calls will not go backwards, but successive calls
              may—depending on the architecture—return identical (not-
              increased) time values.

The absolute value can thus be calculated if we knew the time since boot. But I haven't been able to find that anywhere in a trace. Did I miss it?

prattmic commented 2 days ago

The absolute value can thus be calculated if we knew the time since boot. But I haven't been able to find that anywhere in a trace. Did I miss it?

I think I am confused by what you mean by "absolute time". I thought that meant you wanted the CLOCK_MONOTONIC value, which we have. Do you actually mean the CLOCK_REALTIME value (wall time)? We don't record that. I don't think that would be as useful for correlation with other trace events on the machine, since the real time clock can change arbitrarily.

aktau commented 2 days ago

Do you actually mean the CLOCK_REALTIME value (wall time)? We don't record that. I don't think that would be as useful for correlation with other trace events on the machine, since the real time clock can change arbitrarily.

For correlation with other trace events on the machine, CLOCK_MONOTONIC is fine (ideal even), if one has these events available from the same epoch (boot time in Linux's case).

In my case, I don't always:

Specifically, I get a crash which has exported a trace around crash time using the flight recorder. The monitoring service tells me the crash timestamp is 01:03:53 21-10-2024 UTC. I'd like to know when the trace started, and until what time before the crash it actually ran. I'd like the "when" to be absolute here, not some machine-relative timestamp.

It seems to me like this could be solved by (also) adding a timestamp relative to the UNIX epoch in every generation event. Then, tooling (go tool trace or something custom) could render all events in either a relative timeline (current default) or an absolute timeline.

That, or I'm missing something that currently exists, or a different way to correlate the data I want. Any ideas?

prattmic commented 2 days ago

That makes sense. There is a caveat that it will probably be difficult to precisely line up events using wall time, but I agree it would be useful even just for getting a vague sense of where within logs, etc the trace occurs (within a few hundred ms).

I think including a real time timestamp in each generation would be reasonable. I think that would be sufficient. If the real time clock changes during tracing then computed timestamps may be wrong, but that should be rare.

prattmic commented 2 days ago

(cc @mknyszek for thoughts on including a real time timestamp in each generation)

aktau commented 1 day ago

There is a caveat that it will probably be difficult to precisely line up events using wall time, but I agree it would be useful even just for getting a vague sense of where within logs, etc the trace occurs (within a few hundred ms).

Up to a few hundred ms is much better than what's currently available. It would already help me. I wonder how easy it is to get the equivalent of time.Now().UnixNano() in the runtime...

prattmic commented 1 day ago

runtime.walltime

mknyszek commented 1 day ago

We could just add this as a new event type that always lives in the same batch as EvFrequency, which currently always lives in its own batch. Old toolchains won't understand it and fail, but new toolchains will still be able to understand old traces without having to interpret anything differently.

With respect to exposing this in the trace API, the easiest thing to do would be to make it a property of the Sync event type; we could then also emit a Sync event at the start of the trace so we have a place to put the information for the first generation. Sync events always precede a generation, so you can always count on having the value you need to interpret all timestamps in a given generation.

mknyszek commented 1 day ago

I don't think this needs to be a proposal. The trace parsing API is still experimental.

aktau commented 1 day ago

We could just add this as a new event type that always lives in the same batch as EvFrequency... With respect to exposing this in the trace API, the easiest thing to do would be to make it a property of the Sync event type.

I don't know much about this, but this sounds contradictory, unless internal trace events and trace API events are fully decoupled. In which case, SGTM.

mknyszek commented 1 day ago

We could just add this as a new event type that always lives in the same batch as EvFrequency... With respect to exposing this in the trace API, the easiest thing to do would be to make it a property of the Sync event type.

I don't know much about this, but this sounds contradictory, unless internal trace events and trace API events are fully decoupled. In which case, SGTM.

Sorry, yes. They are completely decoupled. The Sync event type is an event type in the parsing API, the new event type would be part of the wire format.