Open rsignell opened 3 months ago
Hi, @rsignell! I'm curious about the intent behind your request. What problem(s) would you like to solve using task metrics? Since there's a plethora of possible metrics, which ones would you be interested in?
I would like to look at the variability of the time it takes to retrieve many identically-sized chunks of data from S3.
> I would like to look at the variability of the time it takes to retrieve many identically-sized chunks of data from S3.
That would mean you'd be interested in the distribution of task durations for those tasks reading your chunks, or something else? Is there a specific problem that understanding the variability would help you with?
Yes, I'm trying to figure out how many chunks I should request per task, and it would be good to know the distribution of S3 access times within a task.
I often have workflows that are mostly thousands of reads from S3 for chunks of the same size. I can see some of the variability on the Dask dashboard, but I was wondering:
Is there a way to get the metrics for those tasks programmatically, as a dataframe or something, so I could look at the distribution, the tails of the distribution, etc.?