-
**1. Requester Information:**
- PI's Full Name: Marouane Temimi
- PI's Affiliated Institute: Stevens Institute of Technology
- PI's Affiliated Email Address: mtemimi@stevens.edu
**2. Project I…
-
As part of our Project Pythia grant (see https://github.com/2i2c-org/meta/issues/769 for more information), we keep an eye on how we can better support running infrastructure on [Jetstream2](https://jetst…
-
### What happened?
With the metricsServer integration enabled, `kubectl top node` does not work properly in vCluster.
### What did you expect to happen?
`kubectl top node` works normally.
### H…
-
## Description
We want to offer users better support with deploying Kedro.
* Production deployment is important to drive value; we're at least a couple years past when people were happy just running …
-
We should consider adding support for AMD GPUs, which have been tested to be efficient for ML workloads.
References:
https://www.amd.com/en/technologies/deep-machine-learning
https://www.lamini.…
-
We are integrating OAP MLlib into Cloud. We encountered an executor restart issue, although the result was still output, when running the HiBench KMeans workload in an AWS EKS environment, with HiBench small sca…
-
There have been some conversations going on around how to improve communication among the dev team, to be able to prioritize work and actually plan for and execute larger plans. The idea is to *reduce…
-
### Search before asking
- [X] I had searched in the [issues](https://github.com/ray-project/kuberay/issues) and found no similar feature requirement.
### Description
Ray jobs automatically templa…
-
We need to be able to run tests on pull requests and periodically, and to provide an automated build and promotion process for images.
/assign @figo @codenrhoden
-
**Environment**:
- 4-node GH200 Cluster (Vanilla Kubernetes 1.31)
- Bluefield-3 DPU in each node & SN3700 Switch (Cumulus Linux)
- GPU Operator Deployed
Attempting to deploy Network Operator v…