hostfile Search Results

1000+ results
for hostfile


NVIDIA/nccl-tests #236

2 Node Nccl Test don’t work

I have two servers, a Dell and a FusionServer, and nccl-tests doesn't work across them, but if all servers are the same model, nccl-tests works. My environment: ``` os: ubuntu 22.04 cuda: 12.4 NV driver: 550 ``` wh…

SdEnd updated 2 months ago
7
evanberkowitz/metaq #16

Feature Request: split hostfile for tasks in a job

MVAPICH requires the passing of a hostfile at the mpirun launch. If you are bundling tasks, currently, you give them all the complete hostlist for the job. Would it be possible to have METAQ split t…

walkloud updated 5 years ago
2
microsoft/DeepSpeed #2958

[REQUEST]How to deploy multi-nodes training without hostfile…

**Is your feature request related to a problem? Please describe.** A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] I can't get the IP for each node before…

SefaZeng updated 1 year ago
4
JTA-STAR/J-STAR #37

curl: (6) Could not resolve host: www.bi7jta.cn

http://pi-star/admin/update_HostFile_DMRIds.php ![image](https://github.com/JTA-STAR/J-STAR/assets/22002824/b35adbb6-d67e-4be1-aa2c-53ca03ddd7cb)

bi7jta updated 5 months ago
1
bol-van/zapret #427

Support needed for several different fake packets in a single …

Is it possible in zapret (I'm mainly interested in nfqws) to send several different, explicitly specified fakes in a particular sequence, similar to how the latest GoodbyeDPI lets you specify several …

nb557 updated 1 week ago
2
OvJat/DeepSpeedTutorial #2

Question

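When using deepspeed for distributed training and writing a hostfile, the first line is the master, right? And when running the command deepspeed --master, do these two conflict?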

elesun2018 updated 1 month ago
2
esm-tools/esm_tools #1212

Bug in distribution of processors at ECMWF atos

A bug in the esm-tool derived distribution of processors at ECMWF atos has been detected by Paul Dando from ECMWF after I reported extremely slow execution times of AWI-CM 3.1 on ECMWF-atos machine. …

tsemmler05 updated 1 month ago
21
ucsf-wynton/wynton-website-hpc #110

SGE: Add qrsh example how to launch multi-host subprocesses

A user said in an email: > My code doesn't actually use openMPI for communication (and none is needed for single gpu jobs), the only reason I use `mpirun ` is because it's the only way (afaik) to i…

HenrikBengtsson updated 1 year ago
3
NVIDIA/nccl #1480

The nsys profile will hang when NCCL_P2P_USE_CUDA_MEMCPY is …

I am using the Nsight system tool to observe the behavior of allreduce_perf on a server with 8 H800 gpus. I found that when the NCCL_P2P_USE_CUDA_MEMCPY function is enabled, the nsys profile command w…

PhdShi updated 3 days ago
5
triton-inference-server/tensorrtllm_backend #355

server fails in Stuck when using pipeline parallel in multi-…

### System Info 2 * 4 L40s load llama2-70B, 1 model: tensorrt_llm. using image: nvcr.io/nvidia/tritonserver:23.11-trtllm-python-py3 ### Who can help? _No response_ ### Information - […

hezeli123 updated 2 months ago
2
