-
XGBoost for Python depends on `nvidia-nccl-cu12`, which is built for CUDA 12. I have a PyTorch 2.4.0 installation for CUDA 11.8, but when I use distributed mode, PyTorch picks up the one installed by XGB…
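One way to confirm which NCCL build is in play is to compare the NCCL version PyTorch reports against the `nvidia-nccl-cu12` wheel pulled in by XGBoost. A minimal sketch, assuming both packages live in the same environment (the exact return shape of `torch.cuda.nccl.version()` varies between releases):

```python
# Sketch: compare PyTorch's NCCL version with the nvidia-nccl-cu12 wheel
# installed as an XGBoost dependency. Output formats may differ by release.
from importlib import metadata

import torch

# NCCL version PyTorch was built against (a tuple on recent releases, e.g. (2, 20, 5))
print("PyTorch NCCL:", torch.cuda.nccl.version())

try:
    # Version of the standalone CUDA 12 NCCL wheel that XGBoost depends on
    print("nvidia-nccl-cu12:", metadata.version("nvidia-nccl-cu12"))
except metadata.PackageNotFoundError:
    print("nvidia-nccl-cu12 is not installed")
```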
-
### 🐛 Describe the bug
```python
import torch
import torch.distributed

# Default process group uses the MPI backend; creating the NCCL sub-group is what fails.
torch.distributed.init_process_group(backend="mpi")
nccl_group = torch.distributed.new_group(backend="nccl")
```
```
[rank0]: Traceback (most r…
```
-
Hi,
I tried to run your code by following the README instructions.
When I tried the main experiment [ColBERT Retrieval] and ran the script:
python src/main.py configs/nq_tables/colbert.jso…
-
### 🐛 Describe the bug
When running
`python test/distributed/test_c10d_nccl.py -k test_nan_assert_float16` on an H100x2 platform,
the current nightly (and likely the v2.5.0 RC) is producing the follo…
-
I run: `python train.py -c configs\ljs_base.json -m ljs_base`
output:
```
DEBUG:numba.core.byteflow:bytecode dump:
> 0 NOP(arg=None, lineno=1051)
        2 LOAD_FAST(arg=0, l…
```
-
We need to add NCCL support as a backend/implementation of the Communicator abstraction, which will provide all the functionality required for synchronous distributed SameDiff training.
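Purely as an illustration of the kind of abstraction meant here (not SameDiff's actual API), a hypothetical `Communicator` interface with an NCCL-backed implementation could look roughly like this, sketched in Python/PyTorch terms:

```python
# Hypothetical sketch only: Communicator and NcclCommunicator below are
# illustrative names, not SameDiff's real API. The point is the shape of a
# backend abstraction for synchronous training (all-reduce of gradients).
from abc import ABC, abstractmethod

import torch
import torch.distributed as dist


class Communicator(ABC):
    @abstractmethod
    def all_reduce(self, tensor: torch.Tensor) -> torch.Tensor:
        """Sum the tensor across all workers and return the result on each."""


class NcclCommunicator(Communicator):
    def __init__(self) -> None:
        # Assumes the process group was initialized with the NCCL backend,
        # e.g. dist.init_process_group(backend="nccl").
        assert dist.is_initialized()

    def all_reduce(self, tensor: torch.Tensor) -> torch.Tensor:
        dist.all_reduce(tensor, op=dist.ReduceOp.SUM)
        return tensor
```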
-
Is this a bug?
Training using NCCL with 2 GPUs (a 1080 and a 1060) and a LevelDB data layer?
When using a single GPU this does not happen.
It naively appears to me that the LevelDB is trying to be open…
-
{
"platform":"",
"hub-mirror": [
    "ghcr.io/coreweave/nccl-tests:12.2.2-cudnn8-devel-ubuntu22.04-nccl2.19.3-1-85f9143"
]
}
-
### Description
When the same GPU tensor is sent to multiple readers, we should use ncclBroadcast under the hood to reduce transfer time.
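Conceptually, instead of copying the tensor point-to-point to each reader, a single collective broadcast serves all of them at once. A rough sketch of the idea, using `torch.distributed.broadcast` as a stand-in for the proposed ncclBroadcast path (process-group setup, rank assignment, and the `share_with_readers` helper are assumptions for illustration):

```python
# Rough sketch of the idea: one broadcast from the owner rank instead of a
# separate copy per reader. torch.distributed.broadcast stands in for the
# proposed ncclBroadcast-based path; the process group setup is assumed.
import torch
import torch.distributed as dist


def share_with_readers(tensor: torch.Tensor, owner_rank: int = 0) -> torch.Tensor:
    # On the owner rank, `tensor` holds the data; on reader ranks it is a
    # buffer of the same shape/dtype that gets filled in place.
    dist.broadcast(tensor, src=owner_rank)
    return tensor
```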
### Use case
_No response_