dask / community

For general discussion and community planning. Discussion issues welcome.

Dask Demo Day 2023-02-16 #304

Closed ncclementi closed 1 year ago

ncclementi commented 1 year ago

When

Thursday, February 16th, at 10am US Central time (meeting invite below and also on the Dask calendar)

Context

I'd like to solicit 5-10 minute demos that show off ongoing or lesser-known work. I hope to have 3-5 of these during the meeting. Meetings will be recorded and advertised on social. Hopefully, this helps to educate folks on some of the great work people are up to.

If you're interested, please respond to this issue with a brief description (a couple of sentences). If you have colleagues who you think would be interested, please let them know.

Agenda (folks who are listed: please give a thumbs up to confirm)

Meeting Invite

Topic: Dask Demo Day

When: Thursday, February 16th, at 10am US Central

Please download and import the following iCalendar (.ics) files to your calendar system. Monthly: https://us06web.zoom.us/meeting/tZ0uf-qorT4tGtc673iGO0K1lLAN0XrtJ7DV/ics?icsToken=98tyKuGhrTMpGteQtxmERpx5A4qgb_TztiVajbdeyki2Cgd8MiinOs5jHOJHAsz6

Join Zoom Meeting https://us06web.zoom.us/j/89383035703?pwd=WkRJSzNnRTh4T2R1ZjJuVVdJWlMxQT09

ncclementi commented 1 year ago

For the remaining TBD slots, or for future Dask demo days: if you are interested, comment on this issue with a brief description, and say whether you are up for the upcoming one on February 16th or the next one on March 16th!

brian-methodical commented 1 year ago

@ncclementi checking

GenevieveBuckley commented 1 year ago

@ncclementi - Richard Pelgrim has written an excellent blogpost on the last demo day, which might be useful to link when you're advertising: https://blog.dask.org/2022/11/21/november-demo-day

(I saw in the Dask meeting minutes that you'll be organising the next demo day)

hendrikmakait commented 1 year ago

As discussed offline, we should definitely demo P2P, but need to iron out a few kinks first. Let's aim for next month instead.

phobson commented 1 year ago

Looking forward to it @ncclementi

TomNicholas commented 1 year ago

Hi, I would like to give a talk but to be honest we are having a lot of deployment headaches with using dask on the LEAP hub right now, making it harder for me to give a nice demo.

To be clear I'm very pleased with the improvement in performance of the distributed scheduler after the changes we mentioned in the blog post! It's so much better at keeping memory usage down on large task graphs, but only once it actually starts running...

The problems we're having (@jbusecke and I) at the moment are to do with actually getting to the point of having a working cluster with many workers. In particular I'm finding:

a) dask_gateway will spin up, but then either never give me any workers or say it has given me workers but they never start doing any computation. I'm not really sure what to say in an issue for these kinds of problems, because no error is given.

b) I can't use a LocalCluster in a notebook at all due to this issue.

I can manage to run my big calculation using a LocalCluster started from a python script. That actually appears to work, which is amazing in itself, because it means our workload is truly streaming when before the changes to the distributed scheduler it would have completely blown up. I would love to give a demo talking about the details of this.

However (at least on the LEAP hub right now), only being able to run a LocalCluster from a python script rather than a notebook is very limited in scale compared to using distributed (I previously attacked this problem with hundreds of nodes, not just one), will take hours to complete, and would make for a very janky demo :confused:


The code I'm currently using for our calculation is here if you're interested.

mrocklin commented 1 year ago

Hey @TomNicholas

If you're interested, you have an invitation to the dask team at Coiled. If you make a cluster like

import coiled
cluster = coiled.Cluster(account="dask", ...)

it might just solve your problems. This is backed by an AWS account we use for experimentation. Feel free to go wild. If you want to target a specific AWS region you can use backend_options={"region_name": "..."}.

(sorry for the inadvertent advertisement, no worries if you want to pass)

mrocklin commented 1 year ago

Invitation was sent to your columbia.edu address

brian-methodical commented 1 year ago

I would welcome a coiled demo.

I reached out to my team about giving a demo on some Dask use cases in the wild, but they need more notice.

TomNicholas commented 1 year ago

Thanks @mrocklin.

I actually managed to get my whole workload to run completely on just that LocalCluster just now, which I'm very pleased about!

It took about 2 hours with 3 workers - with a distributed cluster with nodes the same size as the LEAP nodes (52GB) I would hopefully be able to do it in just a couple of minutes! At that point processing the (huge) task graph and spinning up the workers would probably take longer than the actual calculation.
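As a rough back-of-the-envelope check on that estimate, the sketch below assumes perfect linear scaling (same-size workers, no scheduling or start-up overhead), which a real cluster will not achieve. The function names are hypothetical and only illustrate the arithmetic; the 2 hours and 3 workers come from the run above, while the 180-worker figure is an illustrative assumption.

```python
import math


def ideal_runtime_minutes(measured_minutes, measured_workers, target_workers):
    """Runtime on target_workers if the work divided perfectly evenly (an idealisation)."""
    worker_minutes = measured_minutes * measured_workers  # total work in worker-minutes
    return worker_minutes / target_workers


def workers_for_target(measured_minutes, measured_workers, target_minutes):
    """Smallest worker count that meets target_minutes under the same idealisation."""
    return math.ceil(measured_minutes * measured_workers / target_minutes)


if __name__ == "__main__":
    # Measured: about 2 hours (120 minutes) on 3 workers.
    print(ideal_runtime_minutes(120, 3, 180))  # assumed 180-worker cluster
    print(workers_for_target(120, 3, 2))       # workers needed for a 2-minute run
```

Under these assumptions a two-minute run needs on the order of 180 workers, which is in line with the "hundreds of nodes" scale mentioned earlier. Graph processing and worker start-up are not modelled, and as noted above they would probably dominate at that point.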

We could try it on coiled for the purposes of the demo day, or just to see if it scales, but we really need to get these deployment issues fixed for LEAP anyway.

@ncclementi do you want me to talk about any of this? A live demo wouldn't really make sense for the above reasons, but I am happy to talk about our use case in general.

mrocklin commented 1 year ago

I guess I'm still in advertising mode (sorry!)

rjzamora commented 1 year ago

Since they mentioned an interest offline, I nominate @daxiongshu to present a short demo on accelerated Jaccard similarity using RAPIDS and Dask :)

ncclementi commented 1 year ago

@ncclementi do you want me to talk about any of this? A live demo wouldn't really make sense for the above reasons, but I am happy to talk about our use case in general.

@TomNicholas how about we leave you in the queue for the March session on March 16th? Hopefully, by then you will have the LEAP issues sorted out and we can get the most out of a demo. If you agree, I'll be pinging you next month :)

ncclementi commented 1 year ago

@brian-methodical

I would welcome a coiled demo.

Is there anything specific you would be interested to see?

I reached out to my team about giving a demo on some Dask use cases in the wild, but they need more notice.

Makes sense. Can we get some folks for next month's session on March 16th? That is a bit more than a month from now :)

ncclementi commented 1 year ago

@mgrover1 and I exchanged a couple of messages via Twitter. He can't make it to this session but will for the next one on March 16th. I am leaving this as a reference for me to ping Max in the issue for the next demo day.

jacobtomlinson commented 1 year ago

I'd love @bstadlbauer to give a demo on the new Dask integration in Flyte. But due to time constraints would need to demo in the first half hour. Would that be ok?

ncclementi commented 1 year ago

I'd love @bstadlbauer to give a demo on the new Dask integration in Flyte. But due to time constraints would need to demo in the first half hour. Would that be ok?

You got this, I'll put him first : )

bstadlbauer commented 1 year ago

Thank you very much @ncclementi!

rjzamora commented 1 year ago

Notebook from configurable-backend demo can be found here: https://gist.github.com/rjzamora/b280e4c8d8b71366925585c307e274e4

ncclementi commented 1 year ago

Links to content from all the talks

jacobtomlinson commented 1 year ago

Did the recording of this get published anywhere? I want to share it with someone.

mrocklin commented 1 year ago

Danny is on PTO this week and will likely handle it when he gets back. Naty is also on PTO starting yesterday I think. This should be handled next week.

ncclementi commented 1 year ago

Folks, here is the recording for the most recent Dask Demo Day! https://www.youtube.com/watch?v=R0Hdnhey0pc

cc: @jacobtomlinson

ncclementi commented 1 year ago

Closing this one, since the video has been out for a while and we have an upcoming session this week.