-
I've encountered this before and saw it again in the `test_tpu_vm` smoke test:
```
...
+ sky logs test-tpu-vm-zongheng-4edc-eb 1 --status
Job 1 SUCCEEDED
+ sky stop -y test-tpu-vm-zongheng-4edc-eb
Stoppin…
```
-
Hi team,
How should we cite this wonderful tool in our manuscripts? Via Zenodo, or is there a paper?
Thanks!
-
```
> sky launch -y -c test-huggingface-9ce1ce58-61 examples/huggingface_glue_imdb_app.yaml
Traceback (most recent call last):
File "/Users/zhwu/miniconda3/envs/sky-dev/bin/sky", line 33, i…
```
-
While debugging, I tried to start a multi-node GCP cluster and encountered the following error, which may indicate that GCP enforces a per-minute API request quota.
```
Traceback (most recen…
```
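
If the error is indeed a per-minute quota, a common mitigation is to retry the failing API call with exponential backoff and jitter. The sketch below is illustrative only: `call_with_backoff` and `RateLimitError` are hypothetical names (not part of Sky or the Google API client), and a real version would need to detect the specific rate-limit error that GCP returns.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a 'rate limit exceeded' API error."""


def call_with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=32.0,
                      sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # Out of retries: surface the original error.
            # Delay doubles each attempt (1s, 2s, 4s, ...), capped at max_delay,
            # plus random jitter so parallel node launches don't retry in lockstep.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay + random.uniform(0, delay / 2))
```

Jitter matters for multi-node launches: without it, every node would retry at the same moment and hit the quota again.
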
-
By default, we attach accelerators to an `n1-highmem-8` instance, but this does not work for A100s, which on GCP are only offered on A2 machine types. We should add this host-instance information to our optimizer or service catalog.
`sky gpunode -…
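
One option is to let the catalog choose a host machine type per accelerator instead of always using `n1-highmem-8`. A minimal sketch, assuming a hypothetical `default_host_instance` helper; the A2 machine type names below are GCP's 40 GB A100 offerings and should be checked against the current GCP catalog.

```python
# Hypothetical catalog entries: each GCP A2 machine type bundles a fixed
# number of A100 GPUs, so the GPU count determines the host type.
_A100_HOSTS = {
    1: "a2-highgpu-1g",
    2: "a2-highgpu-2g",
    4: "a2-highgpu-4g",
    8: "a2-highgpu-8g",
    16: "a2-megagpu-16g",
}

# Current default host for other accelerators (from the issue text).
DEFAULT_HOST = "n1-highmem-8"


def default_host_instance(accelerator: str, count: int) -> str:
    """Return the host machine type to attach `count` accelerators to."""
    if accelerator == "A100":
        try:
            return _A100_HOSTS[count]
        except KeyError:
            raise ValueError(f"No A2 machine type offers {count}x A100") from None
    return DEFAULT_HOST
```
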
-
When provisioning a spot 8x A100 instance that was unavailable, Sky failed immediately instead of failing over to other regions.
The reason is that Sky only handles the GCP return code `ZONE_RESOURCE_POOL_EXHAUSTED` but no…
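
A sketch of region failover that keys off a single set of retryable error codes, so handling an additional capacity error becomes a one-line change. `ProvisionError`, `provision_with_failover`, and the `try_region` callback are hypothetical; only `ZONE_RESOURCE_POOL_EXHAUSTED` comes from this issue, and other codes would need to be added after checking which ones GCP actually returns.

```python
from typing import Callable, Iterable, Optional

# Error codes meaning "no capacity here, try elsewhere". Only the code named
# in this issue is listed; further capacity-style codes belong in this set.
FAILOVER_CODES = {"ZONE_RESOURCE_POOL_EXHAUSTED"}


class ProvisionError(Exception):
    """Hypothetical provisioning error carrying the cloud's error code."""

    def __init__(self, code: str):
        super().__init__(code)
        self.code = code


def provision_with_failover(regions: Iterable[str],
                            try_region: Callable[[str], str]) -> str:
    """Try each region in order; move on only for capacity-style errors."""
    last_error: Optional[ProvisionError] = None
    for region in regions:
        try:
            return try_region(region)
        except ProvisionError as e:
            if e.code not in FAILOVER_CODES:
                raise  # Unrecognized error: fail fast rather than hide it.
            last_error = e
    raise RuntimeError("All regions exhausted") from last_error
```
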
-
Both the `HTTP-SECURE-JSON` and `HTTPS-SECURE-JSON` interface names appear in the code.
Service registry requests should use only the interface names listed in the `service_interface` table.
https://github.com…
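
A simple guard is to check requested interface names against the registered set at startup or in a test, which would flag whichever spelling is missing from `service_interface`. This is a hypothetical sketch: `check_interfaces` is an illustrative name, and loading the rows from the table is left out because its schema isn't shown here.

```python
from typing import Iterable, List


def check_interfaces(requested: Iterable[str], registered: Iterable[str]) -> List[str]:
    """Return the requested interface names that are not registered.

    `registered` would be the names stored in the `service_interface` table.
    """
    known = set(registered)
    return sorted(name for name in requested if name not in known)
```
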
-
We need to catch errors from each `step.run()`.
https://github.com/concretevitamin/sky-experiments/blob/3e9bac359da41187060b348be48a6400704f25aa/prototype/sky/execution.py#L169
Apparently `ray up`…
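
A minimal sketch of wrapping each `step.run()` so that a failure stops the plan and reports which step failed. It assumes steps are objects with a `run()` method that raises on failure; `run_steps` and `StepFailedError` are hypothetical names. A step that signals failure only through a return code would need its `run()` to check that code and raise.

```python
class StepFailedError(RuntimeError):
    """Raised when a step fails; records which step it was."""

    def __init__(self, index, step, cause):
        super().__init__(f"Step {index} ({step!r}) failed: {cause}")
        self.index = index
        self.step = step


def run_steps(steps):
    """Run steps in order, stopping at the first failure.

    Later steps (e.g. running a task) typically depend on earlier ones
    (e.g. provisioning), so continuing after a failure would be wasted work.
    """
    for i, step in enumerate(steps):
        try:
            step.run()
        except Exception as e:
            # Chain the original exception so its traceback is preserved.
            raise StepFailedError(i, step, e) from e
```
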
-
### Search before asking
- [X] I searched the [issues](https://github.com/ray-project/ray/issues) and found no similar issues.
### Ray Component
Ray Clusters
### Issue Severity
Medium: It contri…
-
`multi_echo.py` used the cluster name `multi_echo`. When forced to run on GCP, it failed with:
> googleapiclient.errors.HttpError:

Replacing the name with `multi-echo` worked. It must be reg…
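
GCE instance names must be 1 to 63 characters long and match `[a-z]([-a-z0-9]*[a-z0-9])?`, so underscores are rejected. A sketch of a client-side check that would catch this before calling the API; `is_valid_gcp_name` and `suggest_gcp_name` are hypothetical helpers. Because the cluster name may be embedded in longer instance names, the practical length limit for a cluster name is likely shorter than 63.

```python
import re

# GCE resource naming rule: a lowercase letter first, then lowercase letters,
# digits, or hyphens, not ending in a hyphen, at most 63 characters total.
_GCP_NAME_RE = re.compile(r"[a-z]([-a-z0-9]{0,61}[a-z0-9])?")


def is_valid_gcp_name(name: str) -> bool:
    return _GCP_NAME_RE.fullmatch(name) is not None


def suggest_gcp_name(name: str) -> str:
    """Lowercase and replace disallowed characters with '-'.

    Does not fix leading digits, trailing hyphens, or over-long names.
    """
    return re.sub(r"[^a-z0-9-]", "-", name.lower())
```

For example, `suggest_gcp_name("multi_echo")` yields `multi-echo`, the name that worked above.
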