coiled / feedback

A place to provide Coiled feedback

Packages that should not be omitted are omitted with use_magic=True #177

Closed rsignell-usgs closed 1 year ago

rsignell-usgs commented 2 years ago
cluster = coiled.Cluster(use_magic=True)
Package openssl is not available for linux-64 and has been omitted
Package tzcode is not available for linux-64 and has been omitted
Package nitro is not available for linux-64 and has been omitted
Package jpeg is not available for linux-64 and has been omitted
Package x264 is not available for linux-64 and has been omitted

cc @shughes-uk

rsignell-usgs commented 2 years ago
---------------------------------------------------------------------------
ClusterCreationError                      Traceback (most recent call last)
Input In [4], in <cell line: 1>()
----> 1 cluster = coiled.Cluster(use_magic=True)

File /home/conda/users/af5bdbe0a7790d697b1783344c0610cbf9663446e8be97297bef4d120b77968e-20220715-153627-084835-84-pangeo/lib/python3.9/site-packages/coiled/_beta/cluster.py:353, in ClusterBeta.__init__(self, name, software, n_workers, worker_class, worker_options, worker_vm_types, worker_cpu, worker_memory, worker_disk_size, worker_gpu, worker_gpu_type, scheduler_class, scheduler_options, scheduler_vm_types, scheduler_cpu, scheduler_memory, asynchronous, cloud, account, shutdown_on_close, use_scheduler_public_ip, credentials, timeout, environ, tags, backend_options, show_widget, configure_logging, wait_for_workers, use_magic)
    351     if self.cluster_id:
    352         log_cluster_debug_info(self.cluster_id, self.account)
--> 353     raise e.with_traceback(None)
    354 except KeyboardInterrupt as e:
    355     error = e

ClusterCreationError: Cluster status is error (reason: Scheduler Stopped -> Process never phoned home) (cluster_id: 40300)

https://cloud.coiled.io/rsignell/clusters/40300/details

ntabris commented 2 years ago

Looks like nwis was the culprit for the instances not phoning home:

Jul 15 16:37:45 ip-10-11-13-222 cloud-init[1266]: ERROR: Could not find a version that satisfies the requirement nwis==0.0.* (from versions: 0.1.0, 0.1.1)
Jul 15 16:37:45 ip-10-11-13-222 cloud-init[1266]: ERROR: No matching distribution found for nwis==0.0.*
Jul 15 16:37:45 ip-10-11-13-222 cloud-init[1266]: Traceback (most recent call last):
Jul 15 16:37:45 ip-10-11-13-222 cloud-init[1266]:   File "/opt/bootstrap_env.py", line 37, in <module>
Jul 15 16:37:45 ip-10-11-13-222 cloud-init[1266]:     main()
Jul 15 16:37:45 ip-10-11-13-222 cloud-init[1266]:   File "/opt/bootstrap_env.py", line 33, in main
Jul 15 16:37:45 ip-10-11-13-222 cloud-init[1266]:     subprocess.check_call(cmd)
Jul 15 16:37:45 ip-10-11-13-222 cloud-init[1266]:   File "/opt/conda/envs/coiled/lib/python3.9/subprocess.py", line 373, in check_call
Jul 15 16:37:45 ip-10-11-13-222 cloud-init[1266]:     raise CalledProcessError(retcode, cmd)
Jul 15 16:37:45 ip-10-11-13-222 cloud-init[1266]: subprocess.CalledProcessError: Command '['micromamba', 'run', '-n', 'base', 'pip', 'install', '--pre', '--no-dependencies', 'nwis==0.0.*', 'zeep==4.1.*', 'alabaster==0.7.*', 'pamela==1.0.*', 'textwrap3==0.9.*', 'click-plugins==1.1.*', 'funcsigs==1.0.*', 'eofs==1.4.*', 'pyepsg==0.4.*', 'requests-file==1.5.*', 'kbatch==0.4.*', 'noaa-coops==0.1.*', 'httpx==0.23.*', 'async-generator==1.10.*', 'ansiwrap==0.8.*', 'kubernetes==24.2.*', 'datashape==0.5.*', 'certipy==0.1.*', 'properscoring==0.1.*', 'rfc3986==1.5.*', 'httpcore==0.15.*', 'h11==0.12.*', 'requests-toolbelt==0.9.*', 'python-dateutil==2.8.*', 'sphinxcontrib-jsmath==1.0.*']' returned non-zero exit status 1.

mrocklin commented 2 years ago

OK, this looks like a case where the magic software environments failed in a way that was hard to interpret.

ntabris commented 2 years ago

I can't tell why it's trying to install nwis==0.0.*. A local pip install nwis correctly identified the desired version as nwis==0.1.*.
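A minimal sketch of why that pin cannot be satisfied. It assumes, hypothetically, that the pin is built from the locally installed version by keeping its first two components, which is the shape of the X.Y.* pins in the bootstrap command. The helper names and the local version 0.0.0 are illustrative assumptions, not Coiled's actual code, and the matching is a simplified form of pip's ==X.Y.* prefix matching.

```python
# Hypothetical sketch: how an "X.Y.*" pin relates to versions on the index.
# Assumption: the pin keeps the first two components of the local version,
# matching the shape of the pins in the bootstrap log; this is not Coiled's code.


def prefix_pin(name: str, local_version: str) -> str:
    """Build a pin such as 'nwis==0.0.*' from a locally installed version."""
    parts = local_version.split(".")
    if len(parts) < 2:
        raise ValueError(f"expected at least two components in {local_version!r}")
    return f"{name}=={parts[0]}.{parts[1]}.*"


def matches_prefix_pin(pin: str, candidate: str) -> bool:
    """Simplified '==X.Y.*' check: the candidate's leading components must equal X.Y."""
    prefix = pin.split("==", 1)[1]
    if prefix.endswith(".*"):
        prefix = prefix[:-2]
    wanted = prefix.split(".")
    return candidate.split(".")[: len(wanted)] == wanted


if __name__ == "__main__":
    available = ["0.1.0", "0.1.1"]  # versions listed in the pip error
    pin = prefix_pin("nwis", "0.0.0")  # hypothetical odd local version
    print(pin)  # nwis==0.0.*
    print([v for v in available if matches_prefix_pin(pin, v)])  # []
```

With only 0.1.0 and 0.1.1 on the index, an odd local version like 0.0.0 yields a pin that nothing satisfies, which is the same "No matching distribution found" outcome seen in the log.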

@rsignell-usgs do you know if there was anything unusual about how nwis was installed locally?

shughes-uk commented 2 years ago

Currently it just drops packages whose version cannot be parsed. The packages dropped here should be fine, though: they look like dependencies of your dependencies, so they will be pulled in anyway.
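A minimal sketch of the drop-if-unparseable behavior described above, with the omissions surfaced as a single warning. The regular expression is a simplified subset of PEP 440 chosen for illustration (an assumption; the parsing rules Coiled actually uses aren't shown in this thread), and the versions in the usage lines are hypothetical.

```python
from __future__ import annotations

import re
import warnings

# Simplified subset of PEP 440 (epoch, release, pre/post/dev suffixes).
# Assumption for illustration; the real rules used by Coiled are not shown here.
_VERSION_RE = re.compile(r"(\d+!)?\d+(\.\d+)*((a|b|rc)\d+)?(\.post\d+)?(\.dev\d+)?")


def split_by_parseable_version(
    packages: dict[str, str],
) -> tuple[dict[str, str], list[str]]:
    """Split {name: version} into packages to keep and names to drop."""
    kept: dict[str, str] = {}
    dropped: list[str] = []
    for name, version in packages.items():
        if _VERSION_RE.fullmatch(version):
            kept[name] = version
        else:
            dropped.append(name)
    if dropped:
        # Report omissions rather than dropping them silently.
        warnings.warn(
            "Omitted packages with unparseable versions: " + ", ".join(sorted(dropped))
        )
    return kept, dropped


if __name__ == "__main__":
    # Hypothetical versions, for illustration only.
    kept, dropped = split_by_parseable_version({"httpx": "0.23.0", "jpeg": "9e"})
    print(kept)     # {'httpx': '0.23.0'}
    print(dropped)  # ['jpeg']
```

Whether a warning like this should fire for every dropped package, or only for ones the user installed directly, is exactly the open UX question in this thread; the sketch simply always reports them.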

As @ntabris identified, the nwis package's odd version number is the issue here.

mrocklin commented 2 years ago

From my perspective as a user, it wasn't clear that this was what was happening. Communicating dropped packages is going to be tricky: most of the time I don't care at all about these packages, but this time it was important to tell me. I'm not sure when it's good or bad to be vocal, but I think it'll be important to figure out.

shughes-uk commented 2 years ago

Yup. There's some UX work to do there!

rsignell-usgs commented 2 years ago

I'm really confused about why it was trying to do something with nwis. It doesn't seem to be in the environment I was using.

shughes-uk commented 2 years ago

Does it show up in the output of pip list?
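The same check can be done from Python, for example inside a notebook. A minimal sketch using the standard library's importlib.metadata, which reads the metadata of installed distributions much as pip list does; the helper names are hypothetical.

```python
from __future__ import annotations

from importlib import metadata


def installed_version(dist_name: str) -> str | None:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def pip_list_like() -> list[tuple[str, str]]:
    """(name, version) pairs for installed distributions, sorted by name."""
    pairs = []
    for dist in metadata.distributions():
        try:
            name, version = dist.metadata["Name"], dist.version
        except KeyError:  # metadata missing a required field
            continue
        if name and version:  # skip distributions with broken metadata
            pairs.append((name, version))
    return sorted(pairs, key=lambda p: p[0].lower())


if __name__ == "__main__":
    print(installed_version("nwis"))  # a version string, or None if not installed
    for name, version in pip_list_like():
        print(name, version)
```

A package can sit in the environment without being listed in the environment file (for example after an ad hoc pip install), which is why checking what is actually installed is more reliable than reading the spec.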

rsignell-usgs commented 2 years ago

Doh! Thanks, @shughes-uk! Indeed it does. I removed it and will try again.

mrocklin commented 2 years ago

Any luck?

phobson commented 2 years ago

@rsignell-usgs curious if you've had a chance to try a more recent version of coiled where this option is called package_sync?

shughes-uk commented 1 year ago

This should have been resolved long ago.