-
Can I use Tevatron to train models in a multi-node, multi-card environment?
If so, could you please provide example scripts that show how to launch the job? Thank you.
-
Tangential to #3729
`kind export logs` results in ssh handshake errors when running under the podman remote client (i.e. podman on Windows and macOS). The easiest way to reproduce is to create a 4-n…
-
In a multi-tenant cluster, Restore's distSQL processors are assigned to SQL instances using the `sqlInstanceID`. Currently, the `splitAndScatterProcessor` routes a scattered range to a sql instance r…
-
I have been testing transactions in a small multi-region cluster and noticed that, in such a setup, transactions take a substantial amount of time to complete.
The simplest way to illustrate the problem is to start up a…
-
**Rancher Server Setup**
- Rancher version: v2.7.1
- Installation option (Docker install/Helm Chart): Docker
**Information about the Cluster**
- Kubernetes version: v1.24.10-rancher4-1
- Cluste…
-
Looking at getInstanceByName https://github.com/kubernetes/kubernetes/blob/release-1.8/pkg/cloudprovider/providers/gce/gce_instances.go#L461, we do a very inefficient search for instances by name. Th…
-
I have three Harvester clusters in three data centers. I would like to be able to create an RKE2 cluster spanning all of them from the master Rancher cluster.
This is possible if you select the harveste…
-
Some node labels in the config file (such as node-role.kubernetes.io/worker) cause `kind create cluster` to fail with an unfriendly error.
I attempted to use this feature to set a mea…
-
### What happened + What you expected to happen
# What happened
I ran an experiment with 2 T4 GPUs on GCP using `PB2` for 500 iterations. Near the middle of the experiment, almost all tria…
-
This is not exactly a bug report or a feature request, but more of a discussion.
We will have a fairly large set of databases in a multi-tenant setup. Let's say, for the sake of argument, that we wil…