coiled / feedback

A place to provide Coiled feedback

Workers get more CPU cores than expected #59

Closed mrocklin closed 4 years ago

mrocklin commented 4 years ago

Interestingly, Dask workers can get more CPU cores than expected. Here is an example.

import coiled
coiled.create_cluster_configuration(name="test-cpu-count", worker_cpu=2, worker_memory="16 GiB", software="coiled/default")
cluster = coiled.Cluster(configuration="test-cpu-count")
cluster
coiled.Cluster('tls://ec2-18-222-107-204.us-east-2.compute.amazonaws.com:8786', workers=4, threads=16, memory=68.72 GB)

So even though we ask for two CPUs per worker, we appear to get four (16 threads across 4 workers). What gives?
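
To make the mismatch concrete, the totals in the repr above can be divided back out per worker. This is only a sketch: it assumes the repr reports cluster-wide totals in decimal gigabytes (1 GB = 10^9 bytes), and `per_worker` is a hypothetical helper named for illustration.

```python
GIB = 2**30  # bytes in one gibibyte


def per_worker(workers: int, threads: int, memory_gb: float) -> tuple[float, float]:
    """Return (threads per worker, memory per worker in GiB) from cluster totals."""
    threads_each = threads / workers
    # Convert the decimal-GB total to bytes, split it evenly, then express in GiB.
    memory_each_gib = memory_gb * 1e9 / workers / GIB
    return threads_each, round(memory_each_gib, 2)


# Totals copied from the cluster repr above.
threads_each, memory_each = per_worker(workers=4, threads=16, memory_gb=68.72)
print(threads_each, memory_each)
```

Under those assumptions each worker ends up with four threads and about 16 GiB, so the memory matches the request while the thread count is double the requested `worker_cpu=2`.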

When I look internally at the AWS ECS task definition, I see that we're registering the CPU count correctly:

[Screenshot: ECS task definition]

My guess here is that AWS just gives us a VM with four cores anyway, based on how much memory we have requested, and that the dask-worker command sees those four cores and uses them. It's not clear to me what will happen if we start to use lots of computation on those four cores. I wouldn't be surprised if everything just works and we get more than we paid for, but I also wouldn't be surprised if we get throttled.

Probably the polite thing for us to do here is to specify the number of threads explicitly in the Dask worker command. cc @jrbourbeau ?
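
A minimal sketch of what that could look like, assuming the launcher already knows the requested `worker_cpu` value. The `dask-worker` CLI does accept `--nthreads`, but `build_worker_command` and the scheduler address are hypothetical and not Coiled's actual code.

```python
import shlex


def build_worker_command(scheduler_address: str, worker_cpu: int) -> list[str]:
    """Build a dask-worker command that pins the thread count.

    Without --nthreads the worker sizes its thread pool from the cores it
    can see, which may be more than the number of CPUs requested.
    """
    if worker_cpu < 1:
        raise ValueError("worker_cpu must be at least 1")
    return ["dask-worker", scheduler_address, "--nthreads", str(worker_cpu)]


# Hypothetical scheduler address, for illustration only.
command = build_worker_command("tls://scheduler.example:8786", worker_cpu=2)
print(shlex.join(command))
```

Passing the value explicitly keeps the worker's thread pool tied to the requested CPU count, whatever size of VM the task happens to land on.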

We're ignoring worker kwargs?

Also, interestingly, we seem to be ignoring the nthreads command if provided explicitly.

coiled.create_cluster_configuration(name="test-cpu-count", worker_cpu=2, worker_memory="16 GiB", software="coiled/default", worker_options={"nthreads": 1})
cluster = coiled.Cluster(configuration="test-cpu-count")
cluster
coiled.Cluster('tls://ec2-3-12-197-11.us-east-2.compute.amazonaws.com:8786', workers=4, threads=16, memory=68.72 GB)

jrbourbeau commented 4 years ago

Probably the polite thing for us to do here is to specify the number of threads explicitly in the Dask worker command

Yeah, that seems like a reasonable thing to do. I'll make an update accordingly 👍

Also, interestingly, we seem to be ignoring the nthreads command if provided explicitly.

Hrm, I'm not able to reproduce:

In [3]: coiled.create_cluster_configuration(name="test-cpu-count", worker_cpu=2, worker_memory="16 GiB", software="coiled/default", worker_options={"nthreads": 1})

In [4]: cluster = coiled.Cluster(configuration="test-cpu-count")
Creating Cluster. This takes about a minute ...
Checking environment images
Valid environment image found

In [5]: cluster
Out[5]: coiled.Cluster('tls://ec2-18-221-90-197.us-east-2.compute.amazonaws.com:8786', workers=4, threads=4, memory=17.18 GB)

I wonder what's different between our setups? I've tried with coiled==0.0.22 and the current dev version of coiled with beta.coiled.io and gotten the same result as above.
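
One difference visible in the two outputs is the memory total: 17.18 GB here versus 68.72 GB in the original report. A possible explanation, offered as an assumption and not a confirmed cause, is that a Dask worker's automatic memory limit is scaled by the ratio of its threads to the cores it sees. The sketch below only does that arithmetic in plain Python; `auto_memory_limit` is a hypothetical helper, and the four-core, 16 GiB machine is inferred from the first report.

```python
GIB = 2**30  # bytes in one gibibyte


def auto_memory_limit(total_memory: int, nthreads: int, visible_cores: int) -> int:
    """Scale total memory by threads/cores, capped at the total (assumed rule)."""
    return int(total_memory * min(1, nthreads / visible_cores))


workers = 4
for nthreads in (4, 1):
    # Assumed machine: 16 GiB of memory and 4 visible cores per worker.
    limit = auto_memory_limit(16 * GIB, nthreads, visible_cores=4)
    total_gb = workers * limit / 1e9  # decimal GB, as in the cluster repr
    print(f"nthreads={nthreads}: {total_gb:.2f} GB across {workers} workers")
```

If that assumption holds, both reprs are consistent with the same four-core machines, and the only difference between the setups is whether `nthreads=1` actually reached the worker.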

necaris commented 4 years ago

My guess here is that AWS just gives us a VM with four cores anyway, based on how much memory we have requested, and that the dask-worker command sees those four cores and uses them.

It's also possible that when we ask for more cores we're getting a more powerful VM with more hyperthreads? Either way, explicitly asking for a number of threads makes sense 😄

dantheman39 commented 4 years ago

Fixed by @jrbourbeau in the above PR.