coiled / feedback

A place to provide Coiled feedback

Would there be a way to programmatically get task metrics from the information that Coiled collects? #290

Open rsignell opened 3 months ago

rsignell commented 3 months ago

I often have workflows that are mostly thousands of reads to s3 for chunks of the same size. I can see some of the variability on the Dask dashboard, but was wondering:

Is there a way to get the metrics for the tasks programmatically, as a dataframe or something similar, so I could look at the distribution, the tails of the distribution, etc.?

hendrikmakait commented 3 months ago

Hi, @rsignell! I'm curious about the intent behind your request. What problem(s) would you like to solve using task metrics? Since there's a plethora of possible metrics, which ones would you be interested in?

rsignell commented 3 months ago

I would like to look at the variability of the time it takes to retrieve many identically sized chunks of data from s3.

hendrikmakait commented 3 months ago

> I would like to look at the variability of the time it takes to retrieve many identically sized chunks of data from s3.

Does that mean you're interested in the distribution of task durations for the tasks reading your chunks, or in something else? Is there a specific problem that understanding the variability would help you with?
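For illustration only, here is a rough sketch of how a distribution of task durations could be summarized once per-task records are available. It assumes records shaped like a list of dicts with a `key` and a `startstops` list of `{"action", "start", "stop"}` entries. Dask's `Client.get_task_stream()` may be one source of such records, but that layout is an assumption and should be checked against the installed `distributed` version. The helper names `task_durations` and `summarize`, and the `read-chunk` key prefix, are hypothetical.

```python
import statistics


def task_durations(records, prefix=None, action="compute"):
    """Return one duration in seconds per record whose key matches `prefix`.

    Assumed record layout (not confirmed by this thread):
    {"key": str, "startstops": [{"action": str, "start": float, "stop": float}, ...]}
    """
    durations = []
    for rec in records:
        key = str(rec.get("key", ""))
        if prefix is not None and not key.startswith(prefix):
            continue
        # Sum only the spans for the requested action (e.g. "compute").
        spans = [
            ss["stop"] - ss["start"]
            for ss in rec.get("startstops", [])
            if ss.get("action") == action
        ]
        if spans:
            durations.append(sum(spans))
    return durations


def summarize(durations):
    """Summarize the center and tails; needs at least two durations."""
    cuts = statistics.quantiles(durations, n=100, method="inclusive")
    return {
        "count": len(durations),
        "median": statistics.median(durations),
        "p95": cuts[94],
        "p99": cuts[98],
        "max": max(durations),
    }


# Hypothetical sample records, for demonstration only.
sample = [
    {"key": "read-chunk-0",
     "startstops": [{"action": "compute", "start": 0.0, "stop": 0.5}]},
    {"key": "read-chunk-1",
     "startstops": [{"action": "transfer", "start": 1.0, "stop": 1.25},
                    {"action": "compute", "start": 1.25, "stop": 2.25}]},
    {"key": "sum-0",
     "startstops": [{"action": "compute", "start": 3.0, "stop": 3.125}]},
]

read_durations = task_durations(sample, prefix="read-chunk")
stats = summarize(read_durations)
```

Filtering by key prefix keeps unrelated tasks (here `sum-0`) out of the distribution, and the returned list can be passed to a dataframe library for further analysis.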

rsignell commented 3 months ago

Yes, I'm trying to figure out how many chunks at a time I should request for each task, and it would be good to know the distribution of s3 access times within the task.
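As a possible stopgap, access times within a task could be measured by hand. The sketch below is a minimal illustration, not a Coiled or Dask feature: `timed_reads` is a hypothetical helper, `read_chunk` stands in for whatever function actually fetches one chunk from s3, and it assumes each read returns a bytes-like object.

```python
import time


def timed_reads(read_chunk, chunk_ids):
    """Read each chunk in turn and record how long each read took.

    `read_chunk` is a placeholder for the real s3 read function and is
    assumed to return a bytes-like object.
    """
    results = []
    timings = []
    for cid in chunk_ids:
        t0 = time.perf_counter()
        data = read_chunk(cid)
        elapsed = time.perf_counter() - t0
        timings.append({"chunk": cid, "seconds": elapsed, "nbytes": len(data)})
        results.append(data)
    return results, timings


def fake_read(cid):
    """Stand-in reader for demonstration; returns 1 KiB of dummy bytes."""
    return b"x" * 1024


chunks, timings = timed_reads(fake_read, [0, 1, 2])
```

If each task returns its `timings` list alongside the data, the per-read records from all tasks can be gathered on the client and inspected as one table, which would make it easier to compare different numbers of chunks per task.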