-
Passing an fd for the file to be shared would be preferable to passing its path on the filesystem:
- process privileges can be more easily contained, since the DBus process would no longer require an…
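
To illustrate the general idea of handing over an open descriptor instead of a path, here is a minimal sketch using a plain Unix socket pair from the Python standard library (Python 3.9+, Unix only). It is not the D-Bus API: D-Bus has its own `UNIX_FD` type (signature `h`) for this. The function name `share_file_by_fd` is hypothetical, and both socket ends live in one process here purely for demonstration; in practice they would belong to different processes.

```python
import os
import socket


def share_file_by_fd(path: str) -> bytes:
    """Open `path`, pass the descriptor over a Unix socket, read it on the other end.

    Hypothetical helper for illustration only: the "receiver" never sees the
    path, only an already-open descriptor, so it needs no filesystem access
    to the file's location.
    """
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        fd = os.open(path, os.O_RDONLY)
        try:
            # At least one byte of ordinary data must accompany the descriptor.
            socket.send_fds(sender, [b"x"], [fd])
        finally:
            # The sender's copy can be closed; the in-flight copy stays valid.
            os.close(fd)

        # Receive up to 1 byte of data and up to 1 descriptor.
        _data, fds, _flags, _addr = socket.recv_fds(receiver, 1, 1)
        with os.fdopen(fds[0], "rb") as shared:
            return shared.read()
    finally:
        sender.close()
        receiver.close()
```

The receiving side only needs permission to use the descriptor it was given, which is the privilege-containment argument made above.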
-
When using deepspeed for distributed training and writing a hostfile, the first line is the master, right? The launch command is deepspeed --master, so do these two conflict?
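
For context, a DeepSpeed hostfile lists one `hostname slots=N` entry per line. The sketch below parses that format so the ordering is easy to inspect. The function name `parse_hostfile` and the hostnames `worker-1` and `worker-2` are hypothetical. Whether the launcher treats the first entry as the master when no master address is given explicitly is an assumption here and should be checked against the DeepSpeed documentation for your version.

```python
def parse_hostfile(text: str) -> dict[str, int]:
    """Parse 'hostname slots=N' lines into an ordered {hostname: slots} mapping.

    Hypothetical helper for illustration; it is not part of DeepSpeed.
    Blank lines and '#' comments are skipped.
    """
    hosts: dict[str, int] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        host, slots = line.split()
        key, _, value = slots.partition("=")
        if key != "slots":
            raise ValueError(f"expected 'slots=N', got {slots!r}")
        hosts[host] = int(value)
    return hosts


# Hypothetical two-node hostfile with 4 GPUs per node.
example = """\
worker-1 slots=4
worker-2 slots=4
"""
hosts = parse_hostfile(example)
first_host = next(iter(hosts))  # dicts preserve insertion order
```

Here `first_host` is the entry that appears first in the file, which is the one the question above refers to as the master.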
-
http://pi-star/admin/update_HostFile_DMRIds.php
![image](https://github.com/JTA-STAR/J-STAR/assets/22002824/b35adbb6-d67e-4be1-aa2c-53ca03ddd7cb)
-
Hi.
I installed the latest develop version of unifyfs using spack.
When I try to start the unifyfs server daemon using the following command:
``unifyfs start --share-dir=/home/[my user]/[some dir] …
-
Good afternoon all
SLURM was recently updated on HLRN (glogin). This happened on 1 Feb 2024 and the update was from v23.02.7 to v23.11.3.
Since the update, the following command
```python
esm_ru…
-
**Describe the bug**
> I want to use DeepSpeed-FastGen for Mixtral-Instruct 8x7B inference on multiple nodes. My deployment is as follows:
```
import mii
client = mii.serve("path of mixtral-instr…
-
### System Info
2 * 4 L40s load llama2-70B, 1 model: tensorrt_llm.
using image: nvcr.io/nvidia/tritonserver:23.11-trtllm-python-py3
### Who can help?
_No response_
### Information
- […
-
Hi, in your paper you mentioned that your model is trained with 16\*V100 GPUs. I assume this means that you trained your model on 2\*8 GPU nodes, right?
Could you share some insights about how…
-
I ran the training script in a multi-node environment: training/step1_supervised_finetuning/training_scripts/multi_node/run_66b.sh
But it seems that the nodes are not launched successfully and a warnin…
-
## Background information
Working on a project, one part of which runs multiple `prun` commands in parallel from multiple processes to launch multiple tasks, some of these commands with `--add-hostfi…