-
For testing purposes, I tried deploying Longhorn into a multi-node `kind` cluster.
Longhorn started crash-looping because `iscsi` isn't available.
I'm a bit confused, since the docs only say:
> L…
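One way to confirm that diagnosis is to check each `kind` node container for the `iscsiadm` binary, which the open-iscsi package that Longhorn depends on provides. The following is a minimal sketch. It assumes the `kind` and `docker` CLIs are on `PATH`, that the cluster uses the default name `kind`, and that the node image ships `sh`. The function names are hypothetical, and the script only checks whether the binary exists; it does not check whether the iSCSI daemon is running.

```python
import subprocess
from typing import Callable, Iterable, List


def kind_nodes(cluster: str = "kind") -> List[str]:
    """Return the node container names printed by `kind get nodes`, one per line."""
    out = subprocess.run(
        ["kind", "get", "nodes", "--name", cluster],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


def has_iscsiadm(node: str) -> bool:
    """Probe one node container; `command -v` exits non-zero when iscsiadm is not on PATH."""
    result = subprocess.run(
        ["docker", "exec", node, "sh", "-c", "command -v iscsiadm"],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


def nodes_missing_iscsiadm(
    nodes: Iterable[str], probe: Callable[[str], bool] = has_iscsiadm
) -> List[str]:
    """Return the nodes for which the probe reports that iscsiadm is missing."""
    return [node for node in nodes if not probe(node)]
```

Calling `nodes_missing_iscsiadm(kind_nodes())` lists every node that lacks the binary. An empty list means each node at least has `iscsiadm` installed. The probe is passed in as a parameter so the filtering logic can be exercised without a running cluster.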
-
### Issue Type
Bug
### Source
binary
### Tensorflow Version
tf 2.5 or tf 2.8
### Custom Code
No
### OS Platform and Distribution
CentOS 7.2
### Mobile device
_No respon…
-
```
// Use CUDNN library
USE_CUDNN:BOOL=OFF
// Use MPI library
USE_MPI:BOOL=ON
// Use NCCL library
USE_NCCL:BOOL=OFF
// Download and compile SentencePiece
USE_SENTENCEPIECE:BOOL=OFF
/…
```
-
### 🚀 The feature, motivation and pitch
Currently, distributed inference (tensor parallelism, TP) in vLLM relies on Ray to orchestrate the GPU workers. I briefly checked the code, and it seems the core distributed communica…
-
## Overall Goal
For LibreCores CI, we want to give users the ability to easily hook their machines into the CI system to run CI _on their own project_.
## Why?
- Users have the required tools set up (…
-
Are there any plans to draw up a spec for the various communication elements (particularly service discovery), perhaps to allow implementations in other languages? A key aspect of microservices for m…
-
## Motivation
Extend the PyTorch C10D backend to allow dynamically loading non-built-in communication libraries, as a preparatory step toward integrating Intel CCL (aka MLSL) into PyTorch as another c10d backend fo…
-
- Content is:
  - [ ] Coding tutorial
  - [x] Informational content
  - [ ] Other (describe)
- Which blockchain protocol is this about?:
The article outlines best practices for securing blockchain a…
-
### Target SharePoint environment
SharePoint Online
### What SharePoint development model, framework, SDK or API is this about?
other (enter in the "Additional environment details" area below)
###…
-
Hello NCCL Developers,
I have a question regarding the topology connections between multiple machines in NCCL. Currently, when NCCL communicates across multiple machines, does each individual machine…