
Examples using Dask and Coiled

Use Coiled Functions in arxiv matplotlib example #34

Closed: jrbourbeau closed this 9 months ago

jrbourbeau commented 9 months ago

Closes https://github.com/coiled/examples/issues/33

jrbourbeau commented 9 months ago

Note: the major change here is that I'm currently only processing the first 1000 directories. This takes ~3.5 minutes with Coiled Functions scaling up to 100 workers (the default adaptive limit). Processing all 6000+ directories is probably too slow for this example. Some options would be to allow adaptive scaling to more VMs or to use the new threads_per_worker kwarg.
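A rough back-of-envelope sketch of what the numbers above imply for the full run. It assumes, optimistically, that wall time scales linearly with the number of directories and inversely with the number of concurrent task slots (workers × threads per worker); real I/O-bound workloads will deviate from this. The function names are hypothetical, chosen for illustration only.

```python
import math

# Figures reported in this thread: ~1000 directories in ~3.5 minutes on 100 workers.
OBSERVED_DIRS = 1000
OBSERVED_MINUTES = 3.5
OBSERVED_WORKERS = 100


def estimate_minutes(n_dirs, workers, threads_per_worker=1):
    """Estimated wall time, assuming perfectly linear scaling (an optimistic assumption)."""
    slots = workers * threads_per_worker
    return n_dirs * OBSERVED_MINUTES * OBSERVED_WORKERS / (OBSERVED_DIRS * slots)


def workers_needed(n_dirs, target_minutes, threads_per_worker=1):
    """Workers required to hit a target wall time under the same linear-scaling assumption."""
    raw = n_dirs * OBSERVED_MINUTES * OBSERVED_WORKERS / (
        OBSERVED_DIRS * target_minutes * threads_per_worker
    )
    return math.ceil(raw)


print(estimate_minutes(6000, 100))   # ~21 minutes at the current 100-worker limit
print(workers_needed(6000, 3.5))     # workers to keep the full run near 3.5 minutes
print(workers_needed(6000, 3.5, 4))  # fewer VMs if each runs 4 threads
```

Under these assumptions, 6000 directories would take about six times as long at the current limit, which is why raising the adaptive maximum or running more threads per worker are the two levers discussed in this thread.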

dchudz commented 9 months ago

> Some options would be to allow ourselves to adaptive scale to more VMs or to use the new threads_per_worker kwarg

I'd go with one of those options. IMO the example becomes a lot less compelling if we restrict its size to what we can easily handle. "Sorry, too big" isn't really the look I want for Coiled.

mrocklin commented 9 months ago

Let's change the default adaptive maximum for functions.

dchudz commented 9 months ago

sounds good

ntabris commented 9 months ago

What do people think about also changing the example so that workload runs on single-core ARM workers? That's the best way I've found to run this.

(If we do this, I propose we also tweak something so that the scheduler isn't a single-core machine in this case.)

jrbourbeau commented 9 months ago

Sounds good -- I've found

arm=True
cpu=1
spot_policy="spot_with_fallback"

to be good settings for churning through lots of small files.
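A minimal sketch of how the settings above might be applied, assuming they are passed as keyword arguments to the `coiled.function` decorator and that the resulting function exposes a `.map` method (check the Coiled Functions docs for your version). The `run_batch` helper and `SMALL_FILE_SETTINGS` name are hypothetical; the local branch uses a standard-library thread pool as a stand-in so the fan-out pattern can be tried without a Coiled account.

```python
from concurrent.futures import ThreadPoolExecutor

# Settings reported in this thread for churning through lots of small files.
SMALL_FILE_SETTINGS = {
    "arm": True,                          # ARM instance types
    "cpu": 1,                             # single-core workers
    "spot_policy": "spot_with_fallback",  # spot VMs, falling back to on-demand
}


def run_batch(func, items, use_coiled=False, max_local_workers=8):
    """Map `func` over `items`, on Coiled or locally (hypothetical helper)."""
    if use_coiled:
        # Assumes the coiled package is installed and an account is configured.
        import coiled

        remote = coiled.function(**SMALL_FILE_SETTINGS)(func)
        return list(remote.map(items))
    # Local stand-in for quick testing: same fan-out pattern using threads.
    with ThreadPoolExecutor(max_workers=max_local_workers) as pool:
        return list(pool.map(func, items))


# Usage: run locally first, then flip use_coiled=True for the real run.
print(run_batch(len, ["a", "bb", "ccc"]))
```

Keeping the settings in one dict makes it easy to try variants (for example, more cores per VM with more threads per worker) without touching the processing function.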

jrbourbeau commented 9 months ago

I'm going to merge this in -- happy to follow-up if folks have additional comments though