coiled / feedback

A place to provide Coiled feedback

ClusterCreationError: Only 0 workers ready (was waiting for at least 1). (cluster_id: 31165) #158

Closed: SultanOrazbayev closed this issue 2 years ago

SultanOrazbayev commented 2 years ago

To run a memory-intensive workload, I am trying to launch workers on a specific instance type:

from coiled import Cluster
from distributed import Client

cluster = Cluster(
    software="taxi-xgboost", n_workers=2, worker_class="m6i.4xlarge",
)
client = Client(cluster)

The cluster appears to create the workers and start the software environment, but then fails with this message:

---------------------------------------------------------------------------
ClusterCreationError                      Traceback (most recent call last)
Input In [1], in <cell line: 4>()
      1 from coiled import Cluster
      2 from distributed import Client
----> 4 cluster = Cluster(
      5     software="taxi-xgboost", n_workers=2, worker_class="m6i.4xlarge",
      6 )
      7 client = Client(cluster)

File ~/miniconda3/envs/coiled_taxi/lib/python3.9/site-packages/coiled/_beta/cluster.py:369, in ClusterBeta.__init__(self, name, software, n_workers, worker_class, worker_options, worker_vm_types, worker_cpu, worker_memory, worker_gpu, worker_gpu_type, scheduler_class, scheduler_options, scheduler_vm_types, scheduler_cpu, scheduler_memory, asynchronous, cloud, account, shutdown_on_close, use_scheduler_public_ip, credentials, timeout, environ, backend_options, show_widget, configure_logging, wait_for_workers)
    367     if self.cluster_id:
    368         log_cluster_debug_info(self.cluster_id, self.account)
--> 369     raise e.with_traceback(None)
    370 except KeyboardInterrupt:
    371     if self.cluster_id is not None:

ClusterCreationError: Only 0 workers ready (was waiting for at least 1).  (cluster_id: 31165)
SultanOrazbayev commented 2 years ago

I tried a different worker type but see similar behavior:

from coiled import Cluster
from distributed import Client

cluster = Cluster(
    software="taxi-xgboost", n_workers=2, worker_class="m6id.2xlarge",
)
client = Client(cluster)
SultanOrazbayev commented 2 years ago

To explain why I'm requesting a specific worker class: initially I tried simply increasing the worker_memory kwarg:

from coiled import Cluster
from distributed import Client

cluster = Cluster(
    software="taxi-xgboost", n_workers=5, worker_cpu=[2, 8], worker_memory="64GB"
)
client = Client(cluster)

But it failed with this message:

InstanceTypeError: Unable to find instance types that match the specification: 
     Cores: [2, 8]  Memory: 128GB 
You can try selecting a range for the cpu or memory, for example: `cpu=[2, 8]` 
You might want to pick these instances that match your memory requirements, but have different core count. 
['x2iedn.xlarge', 'x1e.xlarge', 'r5a.4xlarge', 'r6i.4xlarge', 'r5.4xlarge', 'r5ad.4xlarge', 'r4.4xlarge', 'g3.4xlarge', 'r5d.4xlarge', 'r5n.4xlarge', 'r5b.4xlarge', 'i3.4xlarge', 'r3.4xlarge', 'r5dn.4xlarge', 'i4i.4xlarge', 'm5a.8xlarge', 'm6a.8xlarge', 'm5.8xlarge', 'm6i.8xlarge', 'm5ad.8xlarge', 'f1.2xlarge', 'm5d.8xlarge', 'h1.8xlarge', 'm6id.8xlarge', 'm5n.8xlarge', 'd3.4xlarge', 'g4dn.8xlarge', 'm5dn.8xlarge', 'g5.8xlarge', 'c6a.16xlarge', 'c5a.16xlarge', 'c6i.16xlarge', 'c5ad.16xlarge', 'd2.4xlarge', 'c6id.16xlarge', 'i2.4xlarge', 'd3en.8xlarge'] 
You can use the `scheduler_vm_types=[]` or `worker_vm_types=[]` keyword argument to specify instance types.
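The error message describes a two-step lookup: first find instance types that satisfy both the core range and the memory request, and if none do, list the types that satisfy the memory request with any core count. The toy model below sketches that idea. It is NOT Coiled's actual selection logic: the small instance table (vCPUs and GiB taken from AWS's published specs), the `match`/`suggest_for_memory` helpers, and inclusive range matching are all assumptions for illustration, and the GB vs GiB distinction is not modeled.

```python
# Toy model of matching instance types against CPU and memory ranges.
# NOT Coiled's real algorithm; the table and matching rules are assumptions.

# (vCPUs, memory in GiB) for a few AWS instance types.
INSTANCE_SPECS = {
    "t3.medium": (2, 4),
    "m6id.2xlarge": (8, 32),
    "r6i.2xlarge": (8, 64),
    "m6i.4xlarge": (16, 64),
    "r6i.4xlarge": (16, 128),
    "m6i.8xlarge": (32, 128),
}


def match(cpu_range, memory_range):
    """Return instance types whose vCPUs and memory fall inside both inclusive ranges."""
    cpu_lo, cpu_hi = cpu_range
    mem_lo, mem_hi = memory_range
    return sorted(
        name
        for name, (cpus, mem) in INSTANCE_SPECS.items()
        if cpu_lo <= cpus <= cpu_hi and mem_lo <= mem <= mem_hi
    )


def suggest_for_memory(memory_range):
    """Fallback list: types with the requested memory, whatever their core count."""
    return match((0, float("inf")), memory_range)


print(match((2, 8), (128, 128)))           # []
print(suggest_for_memory((128, 128)))      # ['m6i.8xlarge', 'r6i.4xlarge']
print(match((0, float("inf")), (32, 64)))  # ['m6i.4xlarge', 'm6id.2xlarge', 'r6i.2xlarge']
```

In this toy table, 2 to 8 cores with exactly 128 GiB matches nothing, so only the fallback list is useful; dropping the core constraint and giving memory as a range returns several candidates.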
SultanOrazbayev commented 2 years ago

Attachment: Screenshot 2022-06-01 at 15.38.22.pdf. @rrpelgrim noted that, despite requesting m6id, the cluster reports its instances as t3.medium.

marcosmoyano commented 2 years ago

@SultanOrazbayev I think you are looking for worker_vm_types. Ref: https://docs.coiled.io/user_guide/tutorials/select_instance_types.html?highlight=worker_vm_types
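A minimal corrected version of the original snippet, following this suggestion. It assumes `worker_vm_types` takes a list of instance type names, as the InstanceTypeError message hints (`worker_vm_types=[]`); `worker_class` appears to name the Dask worker class rather than a VM type (that is how the Coiled API reference describes it), which fits the t3.medium workers reported above. The helper name `launch_with_vm_types` is hypothetical, and `coiled`/`distributed` are imported inside it so nothing is launched until it is called with a Coiled account configured.

```python
def launch_with_vm_types(vm_types=("m6i.4xlarge",), n_workers=2):
    """Start a Coiled cluster whose workers run on the given VM types.

    Requires the coiled and distributed packages and a configured Coiled
    account; the imports live here so defining the function launches nothing.
    """
    from coiled import Cluster
    from distributed import Client

    cluster = Cluster(
        software="taxi-xgboost",
        n_workers=n_workers,
        # Instance types go in worker_vm_types, not worker_class.
        worker_vm_types=list(vm_types),
    )
    return cluster, Client(cluster)


# Usage (starts real cloud VMs):
# cluster, client = launch_with_vm_types(["m6id.2xlarge"])
```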

SultanOrazbayev commented 2 years ago

Ah, I see, thank you! To avoid specifying worker_vm_types, is there a combination of worker_cpu and worker_memory that would get me a 32GB or 64GB machine?

FabioRosado commented 2 years ago

Yes, you can do what you did above:

If you provide only worker_memory=["32GiB", "64GiB"], Coiled will choose the instances that match that range. You received the error above because no instance type matched your requirements (cpu=[2, 8], memory=64GB); that's why the error message lists all the instances that meet your memory requirement but may have a different core count.
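A sketch of this suggestion, reusing the software environment from the thread. It assumes, as stated in the comment above, that `worker_memory` accepts a `[min, max]` list; `launch_with_memory_range` is a hypothetical helper that imports `coiled` lazily so nothing starts until it is called.

```python
def launch_with_memory_range(low="32GiB", high="64GiB", n_workers=5):
    """Start a Coiled cluster with worker memory anywhere in [low, high].

    Requires the coiled and distributed packages and a configured Coiled account.
    """
    from coiled import Cluster
    from distributed import Client

    cluster = Cluster(
        software="taxi-xgboost",
        n_workers=n_workers,
        # worker_cpu is left unset so Coiled can choose among instances
        # whose memory falls inside the requested range.
        worker_memory=[low, high],
    )
    return cluster, Client(cluster)


# Usage (starts real cloud VMs):
# cluster, client = launch_with_memory_range()
```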

SultanOrazbayev commented 2 years ago

Got it, thank you very much, @marcosmoyano @FabioRosado!

I will close this issue, but passing a list to worker_memory might be worth mentioning at the end of the docs here: https://docs.coiled.io/user_guide/tutorials/select_instance_types.html

FabioRosado commented 2 years ago

@scharlottej13, do you have time for this? Otherwise, I'm happy to do it.

scharlottej13 commented 2 years ago

Thanks @FabioRosado, I'm happy to add this to the docs!