dask / distributed

A distributed task scheduler for Dask
https://distributed.dask.org
BSD 3-Clause "New" or "Revised" License

Adding custom plots to Dask Dashboard #3503

Open mronda opened 4 years ago

mronda commented 4 years ago

Hi, I am running multiple tasks that take a while to return, so I want to graph some custom data in between to monitor their progress. Can you guys point me to the documentation or code so I can see where to make changes to the Bokeh dashboard? Thanks in advance! -Max

quasiben commented 4 years ago

@mronda I don't think we have docs on customizing the dashboard (though that would be welcome). Instead, I would recommend reading through 2 recent pull requests:

Can I ask what additional data you were hoping to visualize?

mronda commented 4 years ago

Hi @quasiben, thanks for the response. I checked the first PR, but I still don't understand how I should connect to that server and pass the data from my script to the Bokeh server. What handler would I need in my script for that communication? I'm not too familiar with Bokeh, so any help would be a huge time saver right now. Oh, and I am trying to plot the moving average of a model for monitoring purposes. Thanks in advance! -Max
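For reference, the moving average mentioned above can be maintained incrementally inside a long-running task, so each new value is cheap to compute and emit. This is only a minimal sketch in plain Python: the `MovingAverage` class and its `window` parameter are hypothetical names used for illustration and are not part of Dask or Bokeh.

```python
from collections import deque


class MovingAverage:
    """Fixed-window moving average, updated one value at a time (hypothetical helper)."""

    def __init__(self, window):
        if window < 1:
            raise ValueError("window must be at least 1")
        # deque(maxlen=...) drops the oldest value automatically once it is full.
        self._values = deque(maxlen=window)
        self._total = 0.0

    def update(self, value):
        # Subtract the value that is about to be evicted when the window is full.
        if len(self._values) == self._values.maxlen:
            self._total -= self._values[0]
        self._values.append(value)
        self._total += value
        return self._total / len(self._values)
```

Each call to `update` returns the current average, which is the single number a monitoring plot would need per step.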

jacobtomlinson commented 4 years ago

I wonder if a section of the dashboard showing published datasets would be useful? If a dataset is already persisted and has a type of int, float or str, it could be previewed in the dashboard too.

That way you could continuously persist the moving average and view it in the dashboard?

Perhaps this is abusing the published datasets feature a little too much though...

mronda commented 4 years ago

Hi @jacobtomlinson, I've tried using client.publish_dataset to do some visualization of my data. The problem I encountered, and correct me if I am wrong, is that I needed to constantly unpublish and republish the data, as I could not persist it to the client. I've also tried using a dask.distributed Queue, which worked fine, but I am looking for more of a pipeline approach where I can use the provided dashboard to monitor the running average. Any recommendations? Maybe a better way to use those collections? Thanks! -Max
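One possible way to avoid the explicit unpublish step is sketched below. It assumes `client` is a `distributed.Client` from a release whose `publish_dataset` accepts `override=True`; the helper name `republish` and the dataset name used in the usage note are hypothetical. It only replaces the published value. It does not add a plot to the dashboard, which does not display published datasets in this thread's discussion.

```python
def republish(client, name, value):
    """Publish `value` under `name`, replacing any earlier version (hypothetical helper)."""
    # Assumption: `client` is a distributed.Client whose publish_dataset
    # supports the `override` keyword, so no prior unpublish call is needed.
    client.publish_dataset(value, name=name, override=True)
```

With a real client, this might be called as `republish(client, "moving_average", current_value)` after each update, and another process connected to the same scheduler could read the latest value with `client.get_dataset("moving_average")`.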

mrocklin commented 4 years ago

@lesteve had something for this. I think he submitted a pull request in the last year or so. I can't find it in the documentation right now, but you might search through closed GitHub pull requests authored by him, and my guess is that you'll come across it.

quasiben commented 4 years ago

I think this is the PR @mrocklin is referring to: https://github.com/dask/distributed/pull/2169