Add start/end time and duration information to BEP messages

tbaing commented 1 year ago

Description of the feature request:

BEP messages don't currently include timestamps on per-action start/completed events, nor the duration of the action in the completion event. This information is not currently in the proto, and would be important for us to use BEP events to analyze the behavior of our builds so we can summarize the results to the scripts that invoke Bazel.

Which category does this issue belong to?

No response

What underlying problem are you trying to solve with this feature?

Allow on-host analysis of our builds as they execute, so we can summarize those results to the scripts that invoke Bazel and provide feedback about the state of the build as it executes, without requiring us to process files (e.g. command.profile.gz) that are only created when the build finishes.

Which operating system are you running Bazel on?

No response

What is the output of `bazel info release`?

No response

If `bazel info release` returns `development version` or `(@non-git)`, tell us how you built Bazel.

No response

What's the output of `git remote get-url origin; git rev-parse master; git rev-parse HEAD` ?

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

michaeledgar commented 1 year ago

This is a perfectly sensible feature request. In particular it would benefit users who use --build_event_publish_all_actions.

The devil is always in the details - in practice, there are many start/stop time periods that might be of interest, depending on the Action's execution strategy. The most sensible timestamps to include for this feature would be:

The logically earliest "start" timestamp before the execution strategy participates in the action at all;
The logically latest "stop" timestamp after the execution strategy is absolutely done participating in the action.

For example, if executing locally, Bazel will "start" executing the Action, then acquire [--local_cpu_resources](https://bazel.build/reference/command-line-reference#flag--local_cpu_resources), --local_ram_resources, and so on. After acquiring those resources, which may take an arbitrarily long period, action execution proceeds. Users are interested in seeing this fully broken down (see issue #12725), but the resource-acquisition timing is a detail specific to the execution with the "local" strategy.

If an action uses workers, there are of course timings related to the worker protocol that could be considered, but these fields would contain timings that occur before/after the worker protocol.

Remote execution, there are infrastructure timings, retries, and so on that would need to be provided separately.

I believe that in addition to these requested fields, we should include an Any protobuf that would eventually contain these strategy-specific details (perhaps only if requested by opt-in flag?).

michaeledgar commented 1 year ago

I have a working prototype of this feature. Here's a JSON BEP example I extracted from a new test case in build_event_stream_test.sh.

{
    "id": {
        "actionCompleted": {
            "primaryOutput": "bazel-out/config/genfiles/demo/pass.out",
            "label": "//demo:passrule",
            "configuration": {
                "id": "770c5560358242677c84a9ab02af8c55f4b7f9c9210b360a42278e6e942e28d5"
            }
        }
    },
    "action": {
        "success": true,
        "label": "//demo:passrule",
        "primaryOutput": {
            "uri": "bazel-out/config/genfiles/demo/pass.out"
        },
        "configuration": {
            "id": "770c5560358242677c84a9ab02af8c55f4b7f9c9210b360a42278e6e942e28d5"
        },
        "type": "Genrule",
        "commandLine": [
            "bin/bash",
            "-c",
            "echo passrule > bazel-out/config/genfiles/demo/pass.out; true"
        ],
        "startTime": "2023-10-19T16:58:01.860708Z",
        "endTime": "2023-10-19T16:58:01.905708Z",
        "strategyDetails": [
            {
                "@type": "type.googleapis.com/tools.protos.SpawnExec",
                "remotable": true,
                "cacheable": true,
                "progressMessage": "Executing genrule //demo:passrule",
                "mnemonic": "Genrule",
                "runner": "local",
                "remoteCacheable": true,
                "walltime": "0.045s",
                "targetLabel": "//demo:passrule",
                "metrics": {
                    "totalTime": "0.158s",
                    "queueTime": "0.015s",
                    "setupTime": "0.052s",
                    "executionWallTime": "0.045s",
                    "inputFiles": "1"
                }
            }
        ]
    }
}

One hassle is that at the time we post the ActionExecutedEvent, for failed actions, we have discarded any SpawnResult objects available to populate start_time and end_time. So my prototype so far only works for successful actions, which is the least-interesting case for this feature.

imarban commented 10 months ago

@michaeledgar Any way I can take a peek at your prototype? I'd like to start digging around and see how we can get SpawnResults available when posting the BEP event.

We have code that processes BEP events and traces from the profile file to extract most of what's available for actions. Would love to consolidate the logic to consume from one place only.

bazelbuild / bazel