-
### System Info
### Running Xinference with Docker?
- [X] docker
- [ ] pip install
- [ ] installation from s…
-
#415
#417
-
Our company has a distributed file system, and we want to integrate GDS to accelerate AI training. How can we integrate GDS? Is there documentation we can refer to?
```[tasklist]
### Tasks
- […
-
**Describe the bug**
DeepSpeed tries to call `hostname -I`, which is not a valid flag for `hostname`. It should be `hostname -i`.
**To Reproduce**
…
-
@simone-silvestri can I convince you to rewrite this section with updated benchmarks, and include results for distributed systems?
https://github.com/CliMA/Oceananigans.jl?tab=readme-ov-file#perfor…
-
Hi Kevin,
We are trying to implement HA active/active. As you mentioned here https://github.com/jupyter/enterprise_gateway/issues/562#issuecomment-458203989 we are able to handle the scenario3" …
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/axolotl-ai-cloud/axolotl/labels/bug) and didn't find any similar reports.
### Exp…
-
### Description
Hi,
`jax.jit` on a function seems to fail when running in an OpenMPI environment. An MWE is shown below:
```python
# error.py
# Run as: mpirun -n 8 python error.py
import…
-
### Your current environment
```tex
The environment is the Docker environment of the latest vllm-0.5.4, and the command to run is: python3 api_server.py --port 10195 --model /data/models/Mistral-Large-Ins…
-
Hello,
I installed rtdetrv2_pytorch from its requirements file, but it didn't work with the command below:
`CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=9909 --nproc_per_node=4 tools/train.py -c path/…