mrocklin closed this 1 year ago
Just adding evidence that we're not really using the GPU here.
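One way to gather this kind of evidence is to sample GPU utilization on the workers while the notebook runs. This is a minimal sketch, assuming the NVIDIA driver and `nvidia-smi` are available on each worker; `query_gpu_utilization` and `parse_utilization` are hypothetical helpers, not part of Coiled or Dask:

```python
import subprocess


def parse_utilization(csv_text: str) -> list[int]:
    """Parse output of
    `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits`
    (one integer percentage per GPU, one GPU per line) into a list of ints."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def query_gpu_utilization() -> list[int]:
    """Return the current utilization (0-100) of each GPU visible to this process.

    Hypothetical helper: raises FileNotFoundError if nvidia-smi is missing and
    CalledProcessError if it exits with an error.
    """
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_utilization(out)
```

On a Dask cluster this could be run on every worker with `client.run(query_gpu_utilization)`. Readings that stay near 0 while training is in progress would support the "not really using the GPU" observation.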
Also pinging @mccarty in case he knows of nicer examples than this, or of anyone who could answer that question well. It would be really nice to have something that shows off the cost benefits of GPUs here.
Addressing the "I only got 6 of 10 workers" issue: I've found that GPU availability is better in us-west-2. Just tried this and got all the requested workers...
import coiled

cluster = coiled.Cluster(
    worker_vm_types=["g4dn.xlarge"],
    n_workers=10,
    account="dask-engineering",
    backend_options={"region_name": "us-west-2"},
)
(I tried to get 30 g4dn.xlarge in us-west-2 and got 16, so still pretty constrained, but I think less so than us-east regions.)
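If you wanted to automate that observation, one option is to try regions in order and accept the first one that provides enough of the requested workers. This is a minimal sketch under stated assumptions: `pick_region` and the `launch` callable are hypothetical, and a real launcher (for example one wrapping `coiled.Cluster(..., backend_options={"region_name": region})`) would need to shut down any under-provisioned cluster before moving on. The launcher is injected so the selection logic can be shown without cloud access.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical launcher: takes (region, n_workers), starts a cluster there,
# and returns how many workers actually came up.
Launcher = Callable[[str, int], int]


def pick_region(
    regions: Sequence[str],
    n_workers: int,
    launch: Launcher,
    min_fraction: float = 0.8,
) -> Optional[Tuple[str, int]]:
    """Try regions in order and return (region, workers_obtained) for the first
    region that provides at least `min_fraction` of the requested workers.

    Returns None if no region meets the threshold.
    """
    for region in regions:
        obtained = launch(region, n_workers)
        if obtained >= min_fraction * n_workers:
            return region, obtained
    return None


if __name__ == "__main__":
    # Made-up capacities for illustration only; not measured availability.
    capacity = {"us-east-1": 6, "us-west-2": 16}

    def fake_launch(region: str, n: int) -> int:
        return min(n, capacity[region])

    print(pick_region(["us-east-1", "us-west-2"], 10, fake_launch))  # ('us-west-2', 10)
    print(pick_region(["us-east-1", "us-west-2"], 30, fake_launch))  # None
```

The threshold is a design choice: requiring every worker can make the search fail in constrained regions, while a fractional cutoff accepts a slightly smaller cluster rather than none.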
Sounds good. Thanks Nat.
On Mon, Apr 10, 2023 at 3:28 PM Nat Tabris wrote:
— Reply to this email directly, view it on GitHub https://github.com/coiled/examples/pull/5#issuecomment-1502280385, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACKZTCM7HLU5JCWJTD4RADXARUPVANCNFSM6AAAAAAWZJLYCM . You are receiving this because you authored the thread.Message ID: @.***>
--
Matthew Rocklin CEO
dask-cuda doesn't work with dask 2023.03 or pandas 2, and I ran into other issues as well. I eventually abandoned it and just specified nthreads=1.
Interested in if things broke here at the environment solve level or later on? I was able to get the RAPIDS environment you showed in the Loom solved locally (I switched over to the 23.04 nightlies so we could use an unpinned Dask), though I imagine that pulling in cuDF would introduce a pandas<2 constraint.
Also pinging @mccarty in case he knows of nicer examples than this
cc @mmccarty
Interested in if things broke here at the environment solve level or later on?
Later on. For example, it would bring in library versions, like pandas 2.0, that didn't work when I went to run things. This happened a few times, so I eventually just moved on.
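Because these failures only surfaced at run time, one option is to fail fast right after the environment is built. This is a minimal sketch, assuming the constraint discussed in this thread (pandas 2 breaking the GPU stack); `check_pandas_below` and `major_version` are hypothetical helpers, and the version parse is deliberately simple rather than a full PEP 440 comparison:

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string: str) -> int:
    """Return the leading integer of a version string, e.g. "2.0.1" -> 2."""
    head = version_string.split(".", 1)[0]
    digits = ""
    for ch in head:
        if not ch.isdigit():
            break
        digits += ch
    if not digits:
        raise ValueError(f"cannot parse major version from {version_string!r}")
    return int(digits)


def check_pandas_below(max_major: int = 2) -> str:
    """Raise RuntimeError if pandas is missing or its major version is >= max_major.

    Returns the installed pandas version string when the check passes.
    """
    try:
        installed = version("pandas")
    except PackageNotFoundError as exc:
        raise RuntimeError("pandas is not installed") from exc
    if major_version(installed) >= max_major:
        raise RuntimeError(
            f"pandas {installed} found, but this environment needs pandas<{max_major}"
        )
    return installed
```

Since client and worker environments can differ, a check like this would be most useful when run on the workers as well, for example via `client.run(check_pandas_below)`.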
It makes me sad that our product didn't make it easier to know that AWS availability was the issue. But I'm guessing that a user who cared how many instances they got might have gotten further.
Matt, this is an example where a user might care about that infrastructure details page that you always say only platform engineers care about. I think the info isn't really exposed elsewhere, though maybe it should be.
On Mon, Apr 10, 2023, 4:43 PM Matthew Rocklin wrote:
Hey @mrocklin, sorry I'm late to the party here. I'll play around with this example and see what can be done to show off GPUs more effectively. Also, note that pandas 2 support is in the works.
Thanks Mike!
On Wed, Apr 12, 2023 at 8:42 AM Mike McCarty wrote:
I ran into a bunch of issues with environments and config. What's here works, but it's not faster than CPUs, mostly because data loading is more expensive than training in this example.
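The reason the GPU version isn't faster is essentially Amdahl's law: accelerating only the training step leaves data-loading time unchanged. This is a minimal sketch with made-up timings (no measurements from this notebook are implied); `overall_speedup` is a hypothetical helper:

```python
def overall_speedup(load_s: float, train_s: float, train_speedup: float) -> float:
    """End-to-end speedup when only the training phase is accelerated.

    load_s: seconds spent loading data (assumed unchanged on GPU)
    train_s: seconds spent training on CPU
    train_speedup: factor by which the GPU speeds up training alone
    """
    if train_speedup <= 0:
        raise ValueError("train_speedup must be positive")
    before = load_s + train_s
    after = load_s + train_s / train_speedup
    return before / after


# Hypothetical timings: loading dominates, so even 10x faster training
# barely changes the total runtime.
print(round(overall_speedup(load_s=90.0, train_s=10.0, train_speedup=10.0), 3))  # 1.099
```

Since GPU instances typically cost more per hour than comparable CPU instances, an end-to-end speedup this close to 1 also means a higher cost per run. Showing off the cost benefits would need an example where training dominates the runtime.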
Loom video with some thoughts: https://www.loom.com/share/0c38fdb3bd334756b49df80d301102ea
Some issues:
nthreads=1
cc @jrbourbeau @ntabris @jacobtomlinson