-
Hello! I am training the first two knowledge distillation stages of Mamba 2 on one DGX-H100x8 node, and I am seeing training times of ~8 hours for the first stage and ~13 hours for the second stag…
-
Hi,
I'm deploying this to OpenShift 3.2, but when I run the Jenkins build the console output is:
Started by user anonymous
[Pipeline] Allocate node : Start
Still waiting to schedule task
Waiting f…
-
Hello, we are kaist_8. We found an issue when running the baseline.
Here is the error that we are facing now:
```NVIDIA Release 19.05 (build 6390160)
TensorFlow Version 1.13.1
Container image …
-
## 🚀 Feature
Support Python sub-interpreters and maintain all state of the torch library.
## Motivation
As #10950 demonstrates, the current ``torch`` library cannot live on multiple sub-inte…
-
We are beginning a migration to a new approach to Magick. This includes a full rewrite of our core agent library, with a design consideration for developer usage and consumption. We are wrapping thi…
-
**Description**
I was using a swarm cluster with 2 nodes, a Linux node and a Windows node, to create a service on each node, which would communicate through the swarm overlay network. On the first try…
-
We have multiple k8s clusters. Is it possible to use a single Coroot instance for all of them, or is there another way to use Coroot across clusters?
-
We have performed a number of performance tests of a sampling of our
executables, and we have found a couple of salient features:
- When scaling from 1 core to a full node, our parallel efficiency (…
-
Very nice project, and I appreciate your contribution!
I have seen the DeepSpeed config and I want to confirm the current training strategy. For LLaMA-13B, the training uses ZeRO-3 optimization, check…
-
[Please check the FAQ](https://github.com/FoxxMD/multi-scrobbler/blob/master/docsite/docs/FAQ.md) before submitting a bug report.
**Describe the bug**
YouTube Music stopped authenticating as a sou…