The benchmarking data currently includes times for the phases within each step. (In Batch terms, these are the Task runnables.)
But we also want the total, end-to-end time: from when the user submits the job, to when it completes.
The gather-benchmark script can't observe job completion: it runs as part of the job, so the job is still in progress when the script executes.
We need to record the start time in the preprocessing script and the end time in the postprocessing script. This covers the job's runtime end to end, but still doesn't include the upfront machine allocation and container fetch.
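
A minimal sketch of the start/end recording, assuming both scripts can write to a shared directory that survives across steps. The helper names (`record_timestamp`, `runtime_seconds`) and the marker file names (`start.json`, `end.json`) are hypothetical and not part of the existing scripts.

```python
import json
import time
from pathlib import Path
from typing import Optional


def record_timestamp(marker_dir: Path, name: str, now: Optional[float] = None) -> float:
    """Write a Unix timestamp to <marker_dir>/<name>.json and return it.

    `now` can be passed explicitly for testing; otherwise the current time is used.
    """
    ts = time.time() if now is None else now
    marker_dir.mkdir(parents=True, exist_ok=True)
    (marker_dir / f"{name}.json").write_text(json.dumps({"unix_time": ts}))
    return ts


def runtime_seconds(marker_dir: Path) -> float:
    """Return end minus start, read from the two (hypothetical) marker files."""
    start = json.loads((marker_dir / "start.json").read_text())["unix_time"]
    end = json.loads((marker_dir / "end.json").read_text())["unix_time"]
    return end - start
```

The preprocessing script would call `record_timestamp(marker_dir, "start")` first thing, and the postprocessing script would call `record_timestamp(marker_dir, "end")` as its last step. The difference covers runtime only, not the time spent queued or fetching the container.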
To capture that upfront time, we also need to fetch the Batch state changes (queued → scheduled → running).
This may be complicated when a job runs multiple tasks (one per input file).
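
One possible way to turn the state changes into durations is sketched below. It assumes the state changes have already been fetched and reduced to `(state, ISO 8601 timestamp)` pairs; that input shape and both function names are assumptions for illustration, not the Batch API. For multiple tasks, the sketch treats the job's runtime as the span from the earliest task start to the latest task end.

```python
from datetime import datetime


def state_durations(events):
    """Return seconds spent in each state before the next transition.

    `events` is a list of (state, iso_timestamp) tuples in any order.
    The final state has no following transition, so it gets no entry.
    """
    ordered = sorted((datetime.fromisoformat(ts), state) for state, ts in events)
    durations = {}
    for (t0, state), (t1, _next_state) in zip(ordered, ordered[1:]):
        durations[state] = durations.get(state, 0.0) + (t1 - t0).total_seconds()
    return durations


def job_span_seconds(task_spans):
    """Return the wall-clock span covering all tasks.

    `task_spans` is a list of (start_unix, end_unix) pairs, one per task.
    """
    starts, ends = zip(*task_spans)
    return max(ends) - min(starts)
```

For example, events of `QUEUED` at 00:00:00, `SCHEDULED` at 00:01:30 and `RUNNING` at 00:02:00 give 90 seconds queued and 30 seconds scheduled. Summing per-task runtimes instead of taking the overall span would overcount whenever tasks run in parallel.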