coiled / feedback

A place to provide Coiled feedback

Host Dask Performance Reports #139

Closed mrocklin closed 1 year ago

mrocklin commented 3 years ago

So, this is entirely unrelated to the current deployment product, but maybe easy and good to do anyway.

A common workflow for Dask users is to record and share a Dask performance_report. Today they do this by running some Dask code inside a with block:

from dask.distributed import performance_report

with performance_report(filename="foo.html"):
    x.compute()

They then upload that file to gist.github.com and use a service like raw.githack.com to host the HTML live. The result is an online document that expert Dask users trade around in GitHub issues, and when working with other Dask devs, to identify and resolve issues. Here is an example. Here is another example of a repository that builds and hosts these every day as part of benchmarking work (scroll down to "Dask Profiles").

This is fine, but somewhat manual and not very ergonomic. I would love to have a service that managed this for me and also let me look at previous performance reports. Coiled could do this.

This is totally orthogonal to what we do today, but maybe not a terrible idea given where we want to go with serving long-term telemetry in the future. Usage might look something like the following:

with coiled.performance_report() as report:
    x.compute()

report.address
# https://cloud.coiled.io/mrocklin/reports/14

I could then go back to that report in the future, and probably browse /mrocklin/reports to see a history.
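For illustration, here is a minimal sketch of how such a context manager could wrap the existing Dask recorder: record to a temporary HTML file, hand the bytes to an upload step, and expose the resulting URL on the report object. Everything here is an assumption rather than Coiled's actual API: the names `hosted_performance_report`, `Report`, and the `upload` and `recorder` parameters are hypothetical. Only `dask.distributed.performance_report` is the real Dask call, and it is imported lazily so the sketch can be exercised with a stand-in recorder.

```python
import contextlib
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, ContextManager, Iterator, Optional


@dataclass
class Report:
    """Hypothetical handle given to the user; `address` is set after upload."""
    address: Optional[str] = None


def _dask_recorder(filename: str) -> ContextManager:
    # Real Dask API: writes an HTML report for the work done inside the block.
    # Imported lazily so the sketch does not require Dask unless it is used.
    from dask.distributed import performance_report
    return performance_report(filename=filename)


@contextlib.contextmanager
def hosted_performance_report(
    upload: Callable[[bytes], str],
    recorder: Callable[[str], ContextManager] = _dask_recorder,
) -> Iterator[Report]:
    """Record a report to a temporary file, upload it, and expose its URL.

    `upload` is a hypothetical callable that stores the HTML bytes somewhere
    and returns the public address of the hosted report.
    """
    report = Report()
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "report.html"
        with recorder(str(path)):
            yield report  # the user's computation runs here
        # The recorder has exited, so the HTML file is complete on disk.
        report.address = upload(path.read_bytes())
```

If the computation raises, the exception propagates and nothing is uploaded. Keeping the upload step as a plain callable leaves open where the reports actually live.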

Again, this is a distraction from mainline development, but it's something that, I think, would serve the community fairly well. I really want just any public service to manage this and make it easier, and Coiled happens to be a publicly accessible service.

Drawbacks / future

These files can get large, which could become problematic.

Eventually we'll fix this by migrating to hosting this data in a more structured way (not an HTML file) and storing task groups rather than every single task. This migration may make some folks unhappy when we drop their old data (actually, a two-month storage period might be a great contract to set with users, one that lets us phase this out smoothly).
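As a rough sketch of the two-month storage contract, the cleanup step could be as simple as selecting reports older than a fixed cutoff. The function name `expired_reports`, the mapping of report id to creation time, and the choice of 60 days as "two months" are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

# Assumption: "two months" is approximated as 60 days.
RETENTION = timedelta(days=60)


def expired_reports(
    created_at_by_id: Dict[int, datetime],
    now: Optional[datetime] = None,
) -> List[int]:
    """Return the ids of reports created strictly before the retention cutoff."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - RETENTION
    return sorted(
        report_id
        for report_id, created_at in created_at_by_id.items()
        if created_at < cutoff
    )
```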

I'm guessing that this is something like a day of backend development and maybe a couple of days of frontend development. It would be a version-0 of a telemetry product that we could stand up quickly. If we have some free cycles it could be a fun experiment and we might learn something.

mrocklin commented 3 years ago

Maybe this is something that @ian-r-rose might be well suited to when he returns from PTO, all the way from usage to backend to frontend.

necaris commented 3 years ago

This could be fun and easy -- but I'd want us to be clear about what we're offering to anyone. How much of an expectation of privacy would folks have around their performance reports? If they're sharing around Gists, presumably not much?

necaris commented 3 years ago

Also wondering if this is something @meimeisuns or @ndanielsen could use as a learning experience?

mrocklin commented 3 years ago

I think that we would treat these with the same privacy model as software environments. Hopefully by copying something else it keeps things simple.

necaris commented 3 years ago

@ndanielsen would love your thoughts on this

jrbourbeau commented 3 years ago

FWIW this is something I would use often, so +1 from me : )

ndanielsen commented 3 years ago

@necaris in the next day or so - I'll fiddle with this and share thoughts as I haven't yet used this feature in dask

jrbourbeau commented 3 years ago

@ndanielsen If you're interested I'd be happy to hop on a call and go through how performance reports work in Dask

ian-r-rose commented 3 years ago

I'm also happy to engage in this

ndanielsen commented 3 years ago

I'll follow up a little later today to learn a little more about how we can hook into Dask via the Coiled client, etc. That's the main piece of information I'm lacking. After that, it'll be off to the races

ndanielsen commented 3 years ago

xref: https://github.com/dask/distributed/issues/4756

ndanielsen commented 3 years ago

For tracking, here's the upstream PR to distributed for this:

https://github.com/dask/distributed/pull/4777

shughes-uk commented 1 year ago

Feature implemented