Closed rsignell-usgs closed 1 year ago
---------------------------------------------------------------------------
ClusterCreationError Traceback (most recent call last)
Input In [4], in <cell line: 1>()
----> 1 cluster = coiled.Cluster(use_magic=True)
File /home/conda/users/af5bdbe0a7790d697b1783344c0610cbf9663446e8be97297bef4d120b77968e-20220715-153627-084835-84-pangeo/lib/python3.9/site-packages/coiled/_beta/cluster.py:353, in ClusterBeta.__init__(self, name, software, n_workers, worker_class, worker_options, worker_vm_types, worker_cpu, worker_memory, worker_disk_size, worker_gpu, worker_gpu_type, scheduler_class, scheduler_options, scheduler_vm_types, scheduler_cpu, scheduler_memory, asynchronous, cloud, account, shutdown_on_close, use_scheduler_public_ip, credentials, timeout, environ, tags, backend_options, show_widget, configure_logging, wait_for_workers, use_magic)
351 if self.cluster_id:
352 log_cluster_debug_info(self.cluster_id, self.account)
--> 353 raise e.with_traceback(None)
354 except KeyboardInterrupt as e:
355 error = e
ClusterCreationError: Cluster status is error (reason: Scheduler Stopped -> Process never phoned home) (cluster_id: 40300)
Looks like nwis was the culprit for the instances not phoning home:
Jul 15 16:37:45 ip-10-11-13-222 cloud-init[1266]: ERROR: Could not find a version that satisfies the requirement nwis==0.0.* (from versions: 0.1.0, 0.1.1)
Jul 15 16:37:45 ip-10-11-13-222 cloud-init[1266]: ERROR: No matching distribution found for nwis==0.0.*
Jul 15 16:37:45 ip-10-11-13-222 cloud-init[1266]: Traceback (most recent call last):
Jul 15 16:37:45 ip-10-11-13-222 cloud-init[1266]: File "/opt/bootstrap_env.py", line 37, in <module>
Jul 15 16:37:45 ip-10-11-13-222 cloud-init[1266]: main()
Jul 15 16:37:45 ip-10-11-13-222 cloud-init[1266]: File "/opt/bootstrap_env.py", line 33, in main
Jul 15 16:37:45 ip-10-11-13-222 cloud-init[1266]: subprocess.check_call(cmd)
Jul 15 16:37:45 ip-10-11-13-222 cloud-init[1266]: File "/opt/conda/envs/coiled/lib/python3.9/subprocess.py", line 373, in check_call
Jul 15 16:37:45 ip-10-11-13-222 cloud-init[1266]: raise CalledProcessError(retcode, cmd)
Jul 15 16:37:45 ip-10-11-13-222 cloud-init[1266]: subprocess.CalledProcessError: Command '['micromamba', 'run', '-n', 'base', 'pip', 'install', '--pre', '--no-dependencies', 'nwis==0.0.*', 'zeep==4.1.*', 'alabaster==0.7.*', 'pamela==1.0.*', 'textwrap3==0.9.*', 'click-plugins==1.1.*', 'funcsigs==1.0.*', 'eofs==1.4.*', 'pyepsg==0.4.*', 'requests-file==1.5.*', 'kbatch==0.4.*', 'noaa-coops==0.1.*', 'httpx==0.23.*', 'async-generator==1.10.*', 'ansiwrap==0.8.*', 'kubernetes==24.2.*', 'datashape==0.5.*', 'certipy==0.1.*', 'properscoring==0.1.*', 'rfc3986==1.5.*', 'httpcore==0.15.*', 'h11==0.12.*', 'requests-toolbelt==0.9.*', 'python-dateutil==2.8.*', 'sphinxcontrib-jsmath==1.0.*']' returned non-zero exit status 1.
OK, this looks like a case where the magic software environments failed in a way that was hard to interpret.
I can't tell why it's trying to install nwis==0.0.*. I tried with a local pip install nwis and it correctly identified the desired version as nwis==0.1.*.
@rsignell-usgs do you know if there was anything unusual about how nwis was installed locally?
Currently it just drops packages whose version cannot be parsed. The packages being dropped here should be fine, though: they look like dependencies of your dependencies, so they will be pulled in anyway.
As @ntabris identified, the nwis package having an odd version number is the issue here.
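To make the behaviour described above concrete, here is a minimal sketch of turning locally installed versions into minor-version wildcard pins (the name==X.Y.* shape seen in the failing pip command) while collecting the packages whose version cannot be parsed, so they can be reported instead of silently dropped. This is not Coiled's actual implementation: build_pins, report_dropped, and the simple "major.minor" parsing rule are all assumptions made for illustration.

```python
import re

# Assumption: only versions shaped like "major.minor[.anything]" count as
# parseable. Real tooling would follow the full Python version specification.
_SIMPLE_VERSION = re.compile(r"^(\d+)\.(\d+)(?:\.\S*)?$")


def build_pins(installed):
    """Split a {name: version} mapping into wildcard pins and dropped packages.

    Hypothetical helper: returns (pins, dropped), where pins is a list of
    "name==major.minor.*" strings and dropped is a list of (name, version)
    tuples for versions that did not match the simple pattern above.
    """
    pins, dropped = [], []
    for name, version in sorted(installed.items()):
        match = _SIMPLE_VERSION.match(version)
        if match:
            major, minor = match.groups()
            pins.append(f"{name}=={major}.{minor}.*")
        else:
            dropped.append((name, version))
    return pins, dropped


def report_dropped(dropped):
    """Build one human-readable message per dropped package."""
    return [
        f"skipping {name}: cannot parse version {version!r}"
        for name, version in dropped
    ]
```

With this sketch, a locally installed nwis 0.1.1 would become nwis==0.1.*, while a package reporting an unparseable version would end up in the dropped list, where it can be surfaced to the user.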
From my perspective as a user it wasn't clear that this was what was happening. My sense is that communicating dropped packages is going to get interesting. Lots of times I don't care at all about these packages. This time it was important to communicate. I'm not sure when it's good or bad to be vocal, but my sense is that it'll be important to figure out.
Yup. There's some UX work to do there!
I'm really confused about why it was trying to do something with nwis. It doesn't seem to be in the environment I was using.
Does it show up in the output of pip list?
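If it is easier to check from inside the notebook than from a shell, the same question can be answered with the standard library. This is a small sketch, assuming Python 3.8 or later; installed_version and normalize are hypothetical helper names, not part of pip or coiled.

```python
import re
from importlib import metadata


def normalize(name):
    """Normalize a project name so that '-', '_' and '.' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def installed_version(package):
    """Return the installed version of `package`, or None if it is absent.

    Scans every distribution visible to the current interpreter, which is
    roughly the set of packages that pip list reports for this environment.
    """
    wanted = normalize(package)
    for dist in metadata.distributions():
        dist_name = dist.metadata.get("Name")
        if dist_name and normalize(dist_name) == wanted:
            return dist.version
    return None
```

Calling installed_version("nwis") in the environment used to create the cluster returns a version string if the package is installed there, and None otherwise.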
Doh! Thanks @shughes-uk! Indeed! I removed it and will try again.
Any luck?
@rsignell-usgs curious if you've had a chance to try a more recent version of coiled where this option is called package_sync?
This should be long resolved
cc @shughes-uk