-
I have been asked by my employer to install box64 for our developers to use. I do not use it myself.
When doing so I discovered that its build system attempts to install some x86 libraries that it …
-
## 🐛 Bug
When I run two-worker multiprocess training with `xmp.spawn`, where one rank waits at a rendezvous while the other hits an assertion, I see a hang at the assertion:
```
(pt25dev_env) ubun…
-
I get the following error when running the bash script; any hints on how to solve this issue?
`bash pretrain_tiny.sh `
```
[2024-08-22 09:38:07,910] torch.distributed.elastic.multiprocessing.redirects: [WA…
-
This issue proposes the development of a decentralized version of Apache Answer. The goal is to leverage distributed ledger technologies and peer-to-peer systems to create a knowledge-sharing platform…
-
### Your current environment
```text
The output of `python collect_env.py`
```
### How would you like to use vllm
I am trying to deploy a Qwen2-72b model in k8s, with 4 GPUs in one node. According…
-
Thank you for the interesting work.
Can you describe the required computer specs for running the inference?
I tried to run the inference, but I kept getting device ordinal error.
`LOCAL_RANK` …
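
For context, a device ordinal error of this kind commonly appears when `LOCAL_RANK` is greater than or equal to the number of visible GPUs. Below is a minimal sketch of a sanity check, not the project's own code: `resolve_local_rank` is a hypothetical helper, and defaulting to `0` when the variable is unset is an assumption.

```python
import os


def resolve_local_rank(device_count, env=None):
    """Return LOCAL_RANK as an int, or raise if no device has that index.

    Hypothetical helper: `device_count` is supplied by the caller, and an
    unset LOCAL_RANK is assumed to mean a single-process run (rank 0).
    """
    env = os.environ if env is None else env
    local_rank = int(env.get("LOCAL_RANK", "0"))
    if not 0 <= local_rank < device_count:
        raise ValueError(
            f"LOCAL_RANK={local_rank} is out of range for "
            f"{device_count} visible device(s)"
        )
    return local_rank
```

With PyTorch installed, the count could be taken from `torch.cuda.device_count()`, so a launch with more processes per node than GPUs fails early with a readable message.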
-
**Describe the issue**:
While experimenting with `dd.read_parquet(..., filesystem="arrow")`, I noticed that I get a strange error whenever `distributed` hasn't been imported beforehand. I'm not sur…
-
## Background information
### What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)
v5.0.3
### Describe how Open MPI was installed (e.g., from a sourc…
-
Hi,
it seems that there is a minor performance issue when initializing PyTorch distributed with a shared file (https://pytorch.org/docs/stable/distributed.html#shared-file-system-initialization).
…
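
To make the setup concrete, here is a rough sketch of how the time spent in shared-file initialization could be measured. It is not the code from this report: `file_init_method` and `timed_file_init` are hypothetical helpers, the `gloo` backend is an assumption, and the path handling assumes a Unix-like filesystem.

```python
import os
import time


def file_init_method(path):
    """Build a file:// init_method URL from a filesystem path (hypothetical helper)."""
    return "file://" + os.path.abspath(path)


def timed_file_init(path, rank, world_size):
    """Return the seconds spent inside init_process_group with a shared file."""
    # Imported lazily so the URL helper can be used without PyTorch installed.
    import torch.distributed as dist

    start = time.perf_counter()
    dist.init_process_group(
        backend="gloo",  # assumption: a CPU-only backend is enough for timing
        init_method=file_init_method(path),
        rank=rank,
        world_size=world_size,
    )
    elapsed = time.perf_counter() - start
    dist.destroy_process_group()
    return elapsed
```

Each rank would call `timed_file_init` with the same path on the shared filesystem and its own `rank`. The file should not exist, or should be empty, before the first call, since PyTorch does not clean it up between runs.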