-
I might be off-topic, since this project states that it is for NVIDIA and HPC-focused jobs.
I was just wondering if it was possible (not asking you guys to do it, but at least tell me if it is) to mo…
-
### 1. Quick Debug Information
* Kubernetes Version: v1.28
* GPU Operator Version: v24.6.1
### 2. Issue description
The Kubernetes cluster has two worker nodes and each contains four A100 GPUs…
-
**Which component are you using?**:
cluster-autoscaler
**Is your feature request designed to solve a problem? If so describe the problem this feature should solve.**:
The problem …
-
### Describe the bug
After creating a node group using a custom AMI (a new feature as of 2024.9.1), I have not been able to successfully launch a JupyterLab Pod on the new image.
Note that the ami_…
-
### What happened?
Pod creation failed:
```
status:
message: 'Pod was rejected: Allocate failed due to no healthy devices present; cannot allocate unhealthy devices nvidia.com/gpu, which i…
-
### What Happened?
I am trying to create a minikube cluster with an NVIDIA GPU using the [docker driver](https://minikube.sigs.k8s.io/docs/drivers/docker/). I have followed all the instructions mentioned in…
-
Hello,
It would be nice to be able to set the CronJob name or provide a `fullnameOverride` to prevent this error in the Helm chart when persistence is enabled 👍
forbidden,CronJob.batch "in-cluster-kube-image-keeper-registry…
-
**Is your feature request related to a problem? Please describe.**
We have a huge remote cluster with 7500 CPUs, 70 TB of RAM, and 500 GPUs. Using Liqo, we have paired the remote cluster to the local clu…
-
Hello,
I am looking for a way to run some machine learning inference within a Kubernetes cluster on Windows. MicroK8s seemed to be a good solution, as I saw that there was a GPU add-on. I did some …
-
`sky local up` failed.
_Version & Commit info:_
* `sky -v`: 0.6.1
* `sky -c`: 0.6.1
This is my `local_up.log`:
```
No kind clusters found.
Generating /tmp/skypilot-kind.yaml
Creating cl…