-
Bulk jobs interact with many tunable cluster settings. For some of these, public and/or internal tuning advice exists. This documentation may be outdated and should be audited and updated. Further, som…
-
https://github.com/facebookresearch/encodec/blob/0e2d0aed29362c8e8f52494baf3e6f99056b214f/encodec/quantization/core_vq.py#LL220C18-L220C18
I have found that expiration handling for codebook did not…
-
The Slurm_lapply function fails, reporting job ids of NA, when run on a federated SLURM cluster. In this case the parallel slurm jobs were successfully started, but the parent process failed to parse …
-
A (somewhat) related question regarding the testCV function graphical results. I don't understand what I'm looking at, quite frankly.
I run it as such:
cvs
-
### Description
**What problem are you trying to solve?**
We are running a custom Karpenter implementation with k3s.
We would like to extend to have one Karpenter handling multi region support i…
-
### What happened + What you expected to happen
I wanted to restore checkpoints created with ray v2.34.0 with ray v2.35.0, which errors with
```
>>> from ray.rllib.algorithms import Algorithm
…
```
-
I tried to deploy the simple app "https://github.com/rancher/fleet-examples/tree/master/simple" to one of my clusters (running on GKE). Unfortunately, it always gets stuck in "waitApplied", and it does not…
wofr updated
5 months ago
-
**Is your feature request related to a problem? Please describe.**
For IVF-Flat and IVF-PQ index building, large datasets are provided in host memory or as an `mmap`-ed file. After the cluster centers ar…
-
Hello and thanks for the great software!
I run Sargasso on an HPC which uses the Slurm job scheduler.
I notice that when I run a batch job using sbatch, the jobs exit prematurely.
When I run in int…
-
#### Problem Description
If `default` is used as a single partition name in `slurm.yaml` (under `elastic_partitions:`), the `slurmctld` controller fails to start. `/var/log/slurm/slurmctld.log` sug…