dchaley / deepcell-imaging

Tools & guidance to scale DeepCell imaging on Google Cloud Batch

Update benchmarking to include total time #345

Open dchaley opened 2 months ago

dchaley commented 2 months ago

The benchmarking data currently includes times for the phases within each step. (In Batch terms, these are the Task runnables.)

But we also want the total, end-to-end time: from when the user submits the job, to when it completes.

The gather-benchmark script doesn't have access to job completion; indeed the job isn't done yet (this script being a part of the job itself).

We need to record the start time in the preprocessing script and the end time in the postprocessing script. This covers the end-to-end runtime of our own code, but still doesn't include the upfront machine allocation + container fetch.
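A minimal sketch of that bookkeeping, assuming both scripts can write a UTC ISO-8601 timestamp somewhere the benchmark step can read it later. The helper names `utc_now_iso` and `elapsed_seconds` are hypothetical and not part of the repo:

```python
from datetime import datetime, timezone


def utc_now_iso() -> str:
    """Return the current time as a timezone-aware UTC ISO-8601 string.

    Hypothetical helper: preprocessing would call this at startup and
    postprocessing at exit, each persisting the value with its output.
    """
    return datetime.now(timezone.utc).isoformat()


def elapsed_seconds(start_iso: str, end_iso: str) -> float:
    """Seconds between two ISO-8601 timestamps (both must carry an offset)."""
    start = datetime.fromisoformat(start_iso)
    end = datetime.fromisoformat(end_iso)
    return (end - start).total_seconds()
```

Using timezone-aware UTC timestamps avoids surprises if the preprocessing and postprocessing steps land on machines with different local clock settings.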

To capture that upfront time, we also need to fetch the Batch job's state changes (queued → scheduled → running).
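Once the state changes are in hand, turning them into per-state durations is straightforward. The sketch below assumes the transitions have already been fetched and reduced to `(state, entered_at)` pairs. How they are fetched from Batch is left out, and the function name and state labels are illustrative:

```python
from datetime import datetime, timedelta, timezone


def state_durations(transitions):
    """Compute seconds spent in each state.

    `transitions` is a list of (state_name, entered_at) pairs, where
    entered_at is a timezone-aware datetime. The final state has no
    successor, so it gets no duration.
    """
    ordered = sorted(transitions, key=lambda t: t[1])
    durations = {}
    # Pair each transition with the next one; the gap is time spent in the state.
    for (state, entered), (_, left) in zip(ordered, ordered[1:]):
        seconds = (left - entered).total_seconds()
        durations[state] = durations.get(state, 0.0) + seconds
    return durations


# Illustrative data only, not real measurements.
t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
example = [
    ("QUEUED", t0),
    ("SCHEDULED", t0 + timedelta(seconds=30)),
    ("RUNNING", t0 + timedelta(seconds=90)),
    ("SUCCEEDED", t0 + timedelta(seconds=390)),
]
print(state_durations(example))
```

The time spent before the job starts running (here, queued plus scheduled) is the allocation overhead that the in-script timestamps cannot see.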

This may be complicated when there are multiple tasks running within a job (1 per input file).
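One possible way to handle multiple tasks, offered as an assumption rather than a decision: treat the job's span as running from the earliest task start to the latest task end. The function name `job_span_seconds` is hypothetical:

```python
from datetime import datetime, timedelta, timezone


def job_span_seconds(task_windows):
    """Wall-clock span covering all tasks in a job.

    `task_windows` is a non-empty list of (start, end) timezone-aware
    datetimes, one pair per task. Tasks may overlap or run back to back;
    the span runs from the earliest start to the latest end.
    """
    starts = [start for start, _ in task_windows]
    ends = [end for _, end in task_windows]
    return (max(ends) - min(starts)).total_seconds()


# Illustrative data only: two overlapping tasks.
base = datetime(2024, 1, 1, tzinfo=timezone.utc)
windows = [
    (base, base + timedelta(seconds=120)),
    (base + timedelta(seconds=20), base + timedelta(seconds=200)),
]
print(job_span_seconds(windows))
```

This reports wall-clock time for the whole job; summing per-task durations instead would give total compute time, which answers a different question.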